String operators
The||
operator performs concatenation.
The LIKE
statement can be used for pattern matching and is documented in
Pattern Comparison: LIKE
String functions
These functions assume that the input strings contain valid UTF-8 encoded
Unicode code points. There are no explicit checks for valid UTF-8 and
the functions may return incorrect results on invalid UTF-8.
Invalid UTF-8 data can be corrected with from_utf8().Additionally, the functions operate on Unicode code points and not user
visible characters (or grapheme clusters). Some languages combine
multiple code points into a single user-perceived character, the basic
unit of a writing system for a language, but the functions will treat each
code point as a separate unit.The lower() and upper() functions do not perform
locale-sensitive, context-sensitive, or one-to-many mappings required for
some languages. Specifically, this will return incorrect results for
Lithuanian, Turkish and Azeri.
chr
n
as a single character string.
codepoint
string
.
concat
string1
, string2
, ...
, stringN
.
This function provides the same functionality as the
SQL-standard concatenation operator (||
).
concat_ws
string1
, string2
, ...
, stringN
using string0
as a separator. If string0
is null, then the return
value is null. Any null values provided in the arguments after the
separator are skipped.
string0
as a
separator. If string0
is null, then the return value is null. Any
null values in the array are skipped.
format
hamming_distance
string1
and string2
,
i.e. the number of positions at which the corresponding characters are different.
Note that the two strings must have the same length.
length
string
in characters.
levenshtein_distance
string1
and string2
,
i.e. the minimum number of single-character edits (insertions,
deletions or substitutions) needed to change string1
into string2
.
lower
string
to lowercase.
lpad
string
to size
characters with padstring
.
If size
is less than the length of string
, the result is
truncated to size
characters. size
must not be negative
and padstring
must be non-empty.
ltrim
string
.
luhn_check
string
of digits is valid according to the
Luhn algorithm.
This checksum function, also known as modulo 10
or mod 10
, is
widely applied on credit card numbers and government identification numbers
to distinguish valid numbers from mistyped, incorrect numbers.
Valid identification number:
position
substring
in
string
. Positions start with 1
. If not found, 0
is returned.
This SQL-standard function has special syntax and uses the
IN
keyword for the arguments. See also strpos().replace
search
from string
.
search
with replace
in string
.
reverse
string
with the characters in reverse order.
rpad
string
to size
characters with padstring
.
If size
is less than the length of string
, the result is
truncated to size
characters. size
must not be negative
and padstring
must be non-empty.
rtrim
string
.
soundex
soundex
returns a character string containing the phonetic representation of char
.
: It is typically used to evaluate the similarity of two expressions phonetically, that is
how the string sounds when spoken:
split
string
on delimiter
and returns an array.
string
on delimiter
and returns an array of size at most
limit
. The last element in the array always contain everything
left in the string
. limit
must be a positive number.
split_part
string
on delimiter
and returns the field index
.
Field indexes start with 1
. If the index is larger than
the number of fields, then null is returned.
split_to_map
string
by entryDelimiter
and keyValueDelimiter
and returns a map.
entryDelimiter
splits string
into key-value pairs. keyValueDelimiter
splits
each pair into key and value.
split_to_multimap
string
by entryDelimiter
and keyValueDelimiter
and returns a map
containing an array of values for each unique key. entryDelimiter
splits string
into key-value pairs. keyValueDelimiter
splits each pair into key and value. The
values for each key will be in the same order as they appeared in string
.
strpos
substring
in
string
. Positions start with 1
. If not found, 0
is returned.
instance
of substring
in string
.
When instance
is a negative number the search will start from the end of string
.
Positions start with 1
. If not found, 0
is returned.
starts_with
substring
is a prefix of string
.
substr
substring
string
from the starting position start
.
Positions start with 1
. A negative starting position is interpreted
as being relative to the end of the string.
string
of length length
from the starting
position start
. Positions start with 1
. A negative starting
position is interpreted as being relative to the end of the string.
translate
source
string translated by replacing characters found in the
from
string with the corresponding characters in the to
string. If the from
string contains duplicates, only the first is used. If the source
character
does not exist in the from
string, the source
character will be copied
without translation. If the index of the matching character in the from
string is beyond the length of the to
string, the source
character will
be omitted from the resulting string.
Here are some examples illustrating the translate function:
trime
string
.
string
from source
:
upper
string
to uppercase.
word_stem
word
in the English language.
word
in the lang
language.
Unicode functions
normalize
string
with the specified normalization form.
form
must be one of the following keywords:
Form | Description |
---|---|
NFD | Canonical Decomposition |
NFC | Canonical Decomposition, followed by Canonical Composition |
NFKD | Compatibility Decomposition |
NFKC | Compatibility Decomposition, followed by Canonical Composition |
This SQL-standard function has special syntax and requires
specifying
form
as a keyword, not as a string.to_utf8
string
into a UTF-8 varbinary representation.
from_utf8
binary
. Invalid UTF-8 sequences
are replaced with the Unicode replacement character U+FFFD
.
binary
. Invalid UTF-8 sequences
are replaced with replace
. The replacement string replace
must either
be a single character or empty (in which case invalid characters are
removed).