Synopsis
from_item is one of
join_type is one of
grouping_element is one of
Description
Retrieve rows from zero or more tables.WITH FUNCTION clause
TheWITH FUNCTION clause allows you to define a list of inline SQL routines
that are available for use in the rest of the query.
The following example declares and uses two inline routines:
WITH clause
TheWITH clause defines named relations for use within a query.
It allows flattening nested queries or simplifying subqueries.
For example, the following queries are equivalent:
WITH clause can chain:
Currently, the SQL for the
WITH clause will be inlined anywhere the named
relation is used. This means that if the relation is used more than once and the query
is non-deterministic, the results may be different each time.WITH RECURSIVE clause
TheWITH RECURSIVE clause is a variant of the WITH clause. It defines
a list of queries to process, including recursive processing of suitable
queries.
This feature is experimental only. Proceed to use it only if you understand
potential query failures and the impact of the recursion processing on your
workload.
WITH-query must be shaped as a UNION of two relations. The
first relation is called the recursion base, and the second relation is called
the recursion step. Trino supports recursive WITH-queries with a single
recursive reference to a WITH-query from within the query. The name T of
the query T can be mentioned once in the FROM clause of the recursion
step relation.
The following listing shows a simple example, that displays a commonly used
form of a single query in the list:
VALUES (1) defines the
recursion base relation. SELECT n + 1 FROM t WHERE n < 4 defines the
recursion step relation. The recursion processing performs these steps:
- recursive base yields
1 - first recursion yields
1 + 1 = 2 - second recursion uses the result from the first and adds one:
2 + 1 = 3 - third recursion uses the result from the second and adds one again:
3 + 1 = 4 - fourth recursion aborts since
n = 4 - this results in
thaving values1,2,3and4 - the final statement performs the sum operation of these elements with the
final result value
10
RECURSIVE clause applies to all queries in the WITH list, but not
all of them must be recursive. If a WITH-query is not shaped according to
the rules mentioned above or it does not contain a recursive reference, it is
processed like a regular WITH-query. Column aliases are mandatory for all
the queries in the recursive WITH list.
The following limitations apply as a result of following the SQL standard and
due to implementation choices, in addition to WITH clause limitations:
- only single-element recursive cycles are supported. Like in regular
WITH-queries, references to previous queries in theWITHlist are allowed. References to following queries are forbidden. - usage of outer joins, set operations, limit clause, and others is not always allowed in the step relation
- recursion depth is fixed, defaults to
10, and doesn’t depend on the actual query results
SELECT clause
TheSELECT clause specifies the output of the query. Each select_expression
defines a column or columns to be included in the result.
ALL and DISTINCT quantifiers determine whether duplicate rows
are included in the result set. If the argument ALL is specified,
all rows are included. If the argument DISTINCT is specified, only unique
rows are included in the result set. In this case, each output column must
be of a type that allows comparison. If neither argument is specified,
the behavior defaults to ALL.
Select expressions
Eachselect_expression must be in one of the following forms:
expression [ [ AS ] column_alias ], a single output column
is defined.
In the case of row_expression.* [ AS ( column_alias [, ...] ) ],
the row_expression is an arbitrary expression of type ROW.
All fields of the row define output columns to be included in the result set.
In the case of relation.*, all columns of relation are included
in the result set. In this case column aliases are not allowed.
In the case of *, all columns of the relation defined by the query
are included in the result set.
In the result set, the order of columns is the same as the order of their
specification by the select expressions. If a select expression returns multiple
columns, they are ordered the same way they were ordered in the source
relation or row type expression.
If column aliases are specified, they override any preexisting column
or row field names:
GROUP BY clause
TheGROUP BY clause divides the output of a SELECT statement into
groups of rows containing matching values. A simple GROUP BY clause may
contain any expression composed of input columns or it may be an ordinal
number selecting an output column by position (starting at one).
The following queries are equivalent. They both group the output by
the nationkey input column with the first query using the ordinal
position of the output column and the second query using the input
column name:
GROUP BY clauses can group output by input column names not appearing in
the output of a select statement. For example, the following query generates
row counts for the customer table using the input column mktsegment:
GROUP BY clause is used in a SELECT statement all output
expressions must be either aggregate functions or columns present in
the GROUP BY clause.
Complex grouping operations
Trino also supports complex aggregations using theGROUPING SETS, CUBE
and ROLLUP syntax. This syntax allows users to perform analysis that requires
aggregation on multiple sets of columns in a single query. Complex grouping
operations do not support grouping on expressions composed of input columns.
Only column names are allowed.
Complex grouping operations are often equivalent to a UNION ALL of simple
GROUP BY expressions, as shown in the following examples. This equivalence
does not apply, however, when the source of data for the aggregation
is non-deterministic.
GROUPING SETS
Grouping sets allow users to specify multiple lists of columns to group on. The columns not part of a given sublist of grouping columns are set toNULL.
GROUPING SETS semantics are demonstrated by this example query:
UNION ALL of
multiple GROUP BY queries:
GROUPING SETS, CUBE
or ROLLUP) will only read from the underlying data source once, while the
query with the UNION ALL reads the underlying data three times. This is why
queries with a UNION ALL may produce inconsistent results when the data
source is not deterministic.
CUBE
TheCUBE operator generates all possible grouping sets (i.e. a power set)
for a given set of columns. For example, the query:
ROLLUP
TheROLLUP operator generates all possible subtotals for a given set of
columns. For example, the query:
Combining multiple grouping expressions
Multiple grouping expressions in the same query are interpreted as having cross-product semantics. For example, the following query:ALL and DISTINCT quantifiers determine whether duplicate grouping
sets each produce distinct output rows. This is particularly useful when
multiple complex grouping sets are combined in the same query. For example, the
following query:
DISTINCT quantifier for the GROUP BY:
ALL.
GROUPING operation
grouping(col1, ..., colN) -> bigint
The grouping operation returns a bit set converted to decimal, indicating which columns are present in a
grouping. It must be used in conjunction with GROUPING SETS, ROLLUP, CUBE or GROUP BY
and its arguments must match exactly the columns referenced in the corresponding GROUPING SETS,
ROLLUP, CUBE or GROUP BY clause.
To compute the resulting bit set for a particular row, bits are assigned to the argument columns with
the rightmost column being the least significant bit. For a given grouping, a bit is set to 0 if the
corresponding column is included in the grouping and to 1 otherwise. For example, consider the query
below:
origin_state column and excludes
the origin_zip and destination_state columns. The bit set constructed for that grouping
is 011 where the most significant bit represents origin_state.
HAVING clause
TheHAVING clause is used in conjunction with aggregate functions and
the GROUP BY clause to control which groups are selected. A HAVING
clause eliminates groups that do not satisfy the given conditions.
HAVING filters groups after groups and aggregates are computed.
The following example queries the customer table and selects groups
with an account balance greater than the specified value:
WINDOW clause
TheWINDOW clause is used to define named window specifications. The defined named
window specifications can be referred to in the SELECT and ORDER BY clauses
of the enclosing query:
WINDOW clause can contain one or multiple named window
specifications of the form
- The existing window name, which refers to a named window specification in the
WINDOWclause. The window specification associated with the referenced name is the basis of the current specification. - The partition specification, which separates the input rows into different
partitions. This is analogous to how the
GROUP BYclause separates rows into different groups for aggregate functions. - The ordering specification, which determines the order in which input rows will be processed by the window function.
- The window frame, which specifies a sliding window of rows to be processed
by the function for a given row. If the frame is not specified, it defaults
to
RANGE UNBOUNDED PRECEDING, which is the same asRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This frame contains all rows from the start of the partition up to the last peer of the current row. In the absence ofORDER BY, all rows are considered peers, soRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWis equivalent toBETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. The window frame syntax supports additional clauses for row pattern recognition. If the row pattern recognition clauses are specified, the window frame for a particular row consists of the rows matched by a pattern starting from that row. Additionally, if the frame specifies row pattern measures, they can be called over the window, similarly to window functions.
existing window name, or from
another window specification in the reference chain. In case when there is no
existing window name specified, or none of the referenced window
specifications contains the component, the default value is used.
Set operations
UNION INTERSECT and EXCEPT are all set operations. These clauses are used
to combine the results of more than one select statement into a single result set:
ALL or DISTINCT controls which rows are included in
the final result set. If the argument ALL is specified all rows are
included even if the rows are identical. If the argument DISTINCT
is specified only unique rows are included in the combined result set.
If neither is specified, the behavior defaults to DISTINCT.
Multiple set operations are processed left to right, unless the order is explicitly
specified via parentheses. Additionally, INTERSECT binds more tightly
than EXCEPT and UNION. That means A UNION B INTERSECT C EXCEPT D
is the same as A UNION (B INTERSECT C) EXCEPT D.
UNION clause
UNION combines all the rows that are in the result set from the
first query with those that are in the result set for the second query.
The following is an example of one of the simplest possible UNION clauses.
It selects the value 13 and combines this result set with a second query
that selects the value 42:
UNION and UNION ALL.
It selects the value 13 and combines this result set with a second query that
selects the values 42 and 13:
INTERSECT clause
INTERSECT returns only the rows that are in the result sets of both the first and
the second queries. The following is an example of one of the simplest
possible INTERSECT clauses. It selects the values 13 and 42 and combines
this result set with a second query that selects the value 13. Since 42
is only in the result set of the first query, it is not included in the final results.:
EXCEPT clause
EXCEPT returns the rows that are in the result set of the first query,
but not the second. The following is an example of one of the simplest
possible EXCEPT clauses. It selects the values 13 and 42 and combines
this result set with a second query that selects the value 13. Since 13
is also in the result set of the second query, it is not included in the final result.:
ORDER BY clause
TheORDER BY clause is used to sort a result set by one or more
output expressions:
ORDER BY clause is evaluated after any GROUP BY or HAVING clause,
and before any OFFSET, LIMIT or FETCH FIRST clause.
The default null ordering is NULLS LAST, regardless of the ordering direction.
Note that, following the SQL specification, an ORDER BY clause only
affects the order of rows for queries that immediately contain the clause.
Trino follows that specification, and drops redundant usage of the clause to
avoid negative performance impacts.
In the following example, the clause only applies to the select statement.
ORDER BY clause in
this case does not result in any difference, but negatively impacts performance
of running the overall insert statement, Trino skips the sort operation.
Another example where the ORDER BY clause is redundant, and does not affect
the outcome of the overall statement, is a nested query:
OFFSET clause
TheOFFSET clause is used to discard a number of leading rows
from the result set:
ORDER BY clause is present, the OFFSET clause is evaluated
over a sorted result set, and the set remains sorted after the
leading rows are discarded:
OFFSET clause equals or exceeds the size
of the result set, the final result is empty.
(limit-clause)=
LIMIT or FETCH FIRST clause
TheLIMIT or FETCH FIRST clause restricts the number of rows
in the result set.
LIMIT clause
restricts the output to only have five rows (because the query lacks an ORDER BY,
exactly which rows are returned is arbitrary):
LIMIT ALL is the same as omitting the LIMIT clause.
The FETCH FIRST clause supports either the FIRST or NEXT keywords
and the ROW or ROWS keywords. These keywords are equivalent and
the choice of keyword has no effect on query execution.
If the count is not specified in the FETCH FIRST clause, it defaults to 1:
OFFSET clause is present, the LIMIT or FETCH FIRST clause
is evaluated after the OFFSET clause:
FETCH FIRST clause, the argument ONLY or WITH TIES
controls which rows are included in the result set.
If the argument ONLY is specified, the result set is limited to the exact
number of leading rows determined by the count.
If the argument WITH TIES is specified, it is required that the ORDER BY
clause be present. The result set consists of the same set of leading rows
and all of the rows in the same peer group as the last of them (‘ties’)
as established by the ordering in the ORDER BY clause. The result set is sorted:
TABLESAMPLE
There are multiple sample methods:BERNOULLI
: Each row is selected to be in the table sample with a probability of
the sample percentage. When a table is sampled using the Bernoulli
method, all physical blocks of the table are scanned and certain
rows are skipped (based on a comparison between the sample percentage
and a random value calculated at runtime).
The probability of a row being included in the result is independent
from any other row. This does not reduce the time required to read
the sampled table from disk. It may have an impact on the total
query time if the sampled output is processed further.
SYSTEM
: This sampling method divides the table into logical segments of data
and samples the table at this granularity. This sampling method either
selects all the rows from a particular segment of data or skips it
(based on a comparison between the sample percentage and a random
value calculated at runtime).
The rows selected in a system sampling will be dependent on which
connector is used. For example, when used with Hive, it is dependent
on how the data is laid out on HDFS. This method does not guarantee
independent sampling probabilities.
Neither of the two methods allow deterministic bounds on the number of rows returned.
UNNEST
UNNEST can optionally have a WITH ORDINALITY clause, in which case an additional ordinality column
is added to the end:
UNNEST returns zero entries when the array/map is empty:
UNNEST returns zero entries when the array/map is null:
UNNEST is normally used with a JOIN, and can reference columns
from relations on the left side of the join:
UNNEST can also be used with multiple arguments, in which case they are expanded into multiple columns,
with as many rows as the highest cardinality argument (the other columns are padded with nulls):
LEFT JOIN is preferable in order to avoid losing the the row containing the array/map field in question
when referenced columns from relations on the left side of the join can be empty or have NULL values:
LEFT JOIN the only condition supported by the current implementation is ON TRUE.
Joins
Joins allow you to combine data from multiple relations.CROSS JOIN
A cross join returns the Cartesian product (all combinations) of two relations. Cross joins can either be specified using the explitCROSS JOIN syntax or by specifying multiple relations in the
FROM clause.
Both of the following queries are equivalent:
nation table contains 25 rows and the region table contains 5 rows,
so a cross join between the two tables produces 125 rows:
LATERAL
Subqueries appearing in theFROM clause can be preceded by the keyword LATERAL.
This allows them to reference columns provided by preceding FROM items.
A LATERAL join can appear at the top level in the FROM list, or anywhere
within a parenthesized join tree. In the latter case, it can also refer to any items
that are on the left-hand side of a JOIN for which it is on the right-hand side.
When a FROM item contains LATERAL cross-references, evaluation proceeds as follows:
for each row of the FROM item providing the cross-referenced columns,
the LATERAL item is evaluated using that row set’s values of the columns.
The resulting rows are joined as usual with the rows they were computed from.
This is repeated for set of rows from the column source tables.
LATERAL is primarily useful when the cross-referenced column is necessary for
computing the rows to be joined:
Qualifying column names
When two relations in a join have columns with the same name, the column references must be qualified using the relation alias (if the relation has an alias), or with the relation name:Column 'name' is ambiguous:
Subqueries
A subquery is an expression which is composed of a query. The subquery is correlated when it refers to columns outside of the subquery. Logically, the subquery will be evaluated for each row in the surrounding query. The referenced columns will thus be constant during any single evaluation of the subquery.Support for correlated subqueries is limited. Not every standard form is supported.
EXISTS
TheEXISTS predicate determines if a subquery returns any rows:
IN
TheIN predicate determines if any values produced by the subquery
are equal to the provided expression. The result of IN follows the
standard rules for nulls. The subquery must produce exactly one column:
Scalar subquery
A scalar subquery is a non-correlated subquery that returns zero or one row. It is an error for the subquery to produce more than one row. The returned value isNULL if the subquery produces no rows:
Currently only single column can be returned from the scalar subquery.