Regular expression functions
All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:
- When using multi-line mode (enabled via the (?m) flag), only \n is recognized as a line terminator. Additionally, the (?d) flag is not supported and must not be used.
- Case-insensitive matching (enabled via the (?i) flag) is always performed in a Unicode-aware manner. However, context-sensitive and locale-sensitive matching is not supported. Additionally, the (?u) flag is not supported and must not be used.
- Surrogate pairs are not supported. For example, \uD800\uDC00 is not treated as U+10000 and must be specified as \x{10000}.
- Boundaries (\b) are incorrectly handled for a non-spacing mark without a base character.
- \Q and \E are not supported in character classes (such as [A-Z123]) and are instead treated as literals.
- Unicode character classes (\p{prop}) are supported with the following differences:
  - All underscores in names must be removed. For example, use OldItalic instead of Old_Italic.
  - Scripts must be specified directly, without the Is, script= or sc= prefixes. Example: \p{Hiragana}
  - Blocks must be specified with the In prefix. The block= and blk= prefixes are not supported. Example: \p{InMongolian}
  - Categories must be specified directly, without the Is, general_category= or gc= prefixes. Example: \p{L}
  - Binary properties must be specified directly, without the Is prefix. Example: \p{NoncharacterCodePoint}
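A few of the rules above can be tried out with short queries. This is only a sketch: the input strings are arbitrary samples chosen for illustration, and regexp_like is used simply because it returns a boolean that is easy to read.

```sql
-- (?i) enables Unicode-aware case-insensitive matching.
SELECT regexp_like('ABC', '(?i)abc');                  -- expected: true

-- In multi-line mode, \n (chr(10)) acts as the line terminator for ^ and $.
SELECT regexp_like('a' || chr(10) || 'b', '(?m)^b$');  -- expected: true

-- Scripts are named directly, without the Is, script= or sc= prefixes.
SELECT regexp_like('あ', '\p{Hiragana}');              -- expected: true
```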
regexp_count
Returns the number of occurrences of pattern in string.
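As a minimal illustration (the sample string is an assumption chosen for the example), counting the letter runs together with their surrounding whitespace might look like this:

```sql
-- Matches are 'a ', 'b ' and 'm', so the count is 3.
SELECT regexp_count('1a 2b 14m', '\s*[a-z]+\s*'); -- 3
```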
regexp_extract_all
Returns the substring(s) matched by the regular expression pattern in string.
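For instance, extracting every run of digits from a sample string (the input is illustrative only) could be written as:

```sql
-- Each match is returned as an element of a varchar array.
SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14]
```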
Finds all occurrences of the regular expression pattern in string and returns the capturing group number group.
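A short sketch of the capturing-group form, again using an arbitrary sample string, where group 2 captures the letters that follow each number:

```sql
-- Group 1 is (\d+), group 2 is ([a-z]+); only group 2 is returned per match.
SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm']
```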
regexp_extract
Returns the first substring matched by the regular expression pattern in string.
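As an illustration with a made-up input, only the first run of digits is returned:

```sql
-- Later matches ('2' and '14') are ignored.
SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1
```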
Finds the first occurrence of the regular expression pattern in string and returns the capturing group number group.
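A minimal example of the capturing-group form, assuming the same illustrative input string:

```sql
-- The first match is '1a'; group 2 of that match is 'a'.
SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a'
```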
regexp_like
Evaluates the regular expression pattern and determines if it is contained within string.
The pattern only needs to be contained within string, rather than needing to match all of string. In other words, this performs a contains operation rather than a match operation. You can match the entire string by anchoring the pattern using ^ and $.
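The difference between the contains behavior and an anchored match can be sketched with two queries over the same sample string (the input is an assumption for illustration):

```sql
-- Unanchored: '2b' occurs somewhere inside the string.
SELECT regexp_like('1a 2b 14m', '\d+b');   -- true

-- Anchored with ^ and $: the whole string must be digits followed by 'b'.
SELECT regexp_like('1a 2b 14m', '^\d+b$'); -- false
```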
regexp_position
Returns the index of the first occurrence (counting from 1) of pattern in string. Returns -1 if not found.
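As a small illustration with a sample sentence, locating the first standalone number:

```sql
-- 'I have ' occupies positions 1 to 7, so '23' starts at position 8.
SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b'); -- 8
```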
regexp_position
Returns the index of the first occurrence of pattern in string, starting from start (inclusive). Returns -1 if not found.
Returns the index of the nth occurrence of pattern in string, starting from start (inclusive). Returns -1 if not found.
regexp_replace
Removes every instance of the substring matched by the regular expression pattern from string.
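For example, with an arbitrary sample string, removing each number that is followed by 'a' or 'b' and a space:

```sql
-- '1a ' and '2b ' are removed; '14m' does not match and is kept.
SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'
```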
Replaces every instance of the substring matched by the regular expression pattern in string with replacement. Capturing groups can be referenced in replacement using $g for a numbered group or ${name} for a named group. A dollar sign ($) may be included in the replacement by escaping it with a backslash (\$).
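A minimal sketch of a numbered group reference, where $2 carries the matched letter into the replacement (the input and replacement text are illustrative):

```sql
-- '1a ' becomes '3ca ' and '2b ' becomes '3cb '; '14m' is left untouched.
SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m'
```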
Replaces every instance of the substring matched by the regular expression pattern in string using function. The lambda expression function is invoked for each match with the capturing groups passed as an array. Capturing group numbers start at one; there is no group for the entire match (if you need this, surround the entire expression with parentheses).
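As an illustration of the lambda form, the following sketch capitalizes each word of a sample string. The lambda parameter name x is arbitrary; x[1] holds the first letter of each word and x[2] the rest:

```sql
-- Invoked once for 'new' and once for 'york'.
SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2])); -- 'New York'
```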
regexp_split
Splits string using the regular expression pattern and returns an array. Trailing empty strings are preserved.
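A short example with an illustrative input shows the preserved trailing empty string. The final delimiter match ('m') sits at the end of the input, so the last array element is empty:

```sql
-- Delimiters matched: 'a ', 'b ' and 'm'.
SELECT regexp_split('1a 2b 14m', '\s*[a-z]+\s*'); -- [1, 2, 14, ]
```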