The approx_distinct function is implemented using the HyperLogLog data structure.
A HyperLogLog sketch can be cast explicitly to P4HyperLogLog.
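A minimal sketch of such a cast, assuming a hypothetical table user_visits with a user_id column (neither name comes from this page):

```sql
-- user_visits and user_id are illustrative names, not part of the reference.
-- approx_set builds a HyperLogLog sketch; the cast converts it to P4HyperLogLog.
SELECT cast(approx_set(user_id) AS P4HyperLogLog) AS p4_sketch
FROM user_visits;
```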
Sketches can be serialized to and deserialized from varbinary. This allows them to be stored for later use. Combined with the ability to merge multiple sketches, this allows one to calculate approx_distinct of the elements of a partition of a query, then for the entirety of a query with very little cost.
For example, calculating the HyperLogLog for daily unique users will allow weekly or monthly unique users to be calculated incrementally by combining the dailies. This is similar to computing weekly revenue by summing daily revenue. Uses of approx_distinct with GROUPING SETS can be converted to use HyperLogLog. Examples:
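The daily-to-weekly rollup described above might look like the following. This is a sketch under stated assumptions: the tables user_visits and visit_summaries and their columns are hypothetical, and the catalog in use is assumed to support CREATE TABLE with a varbinary column.

```sql
-- Hypothetical summary table: one serialized sketch per day.
CREATE TABLE visit_summaries (
  visit_date date,
  hll varbinary
);

-- Build one HyperLogLog per day and store it as varbinary.
INSERT INTO visit_summaries
SELECT visit_date, cast(approx_set(user_id) AS varbinary)
FROM user_visits
GROUP BY visit_date;

-- Deserialize the daily sketches, merge them, and estimate weekly uniques.
SELECT cardinality(merge(cast(hll AS HyperLogLog))) AS weekly_unique_users
FROM visit_summaries
WHERE visit_date >= current_date - interval '7' day;
```

Because only the stored sketches are read, the weekly query does not need to rescan the raw visit rows.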
approx_set(x) -> HyperLogLog
Returns the HyperLogLog sketch of the input data set of x. This data sketch underlies approx_distinct and can be stored and used later by calling cardinality().
cardinality(hll) -> bigint
Performs approx_distinct on the data summarized by the hll HyperLogLog data sketch.
empty_approx_set() -> HyperLogLog
Returns an empty HyperLogLog.
merge(HyperLogLog) -> HyperLogLog
Returns the HyperLogLog of the aggregate union of the individual hll HyperLogLog structures.
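As a rough illustration of how these functions combine, the query below builds two sketches from inline values, adds an empty sketch, merges all three, and estimates the distinct count. The inline values and the column aliases are made up for the example.

```sql
-- The two input sets share the value 3, so their union holds five distinct values.
-- cardinality() returns an approximation of that count.
SELECT cardinality(merge(sketch)) AS distinct_values
FROM (
  SELECT approx_set(x) AS sketch FROM (VALUES 1, 2, 3) AS t(x)
  UNION ALL
  SELECT approx_set(x) AS sketch FROM (VALUES 3, 4, 5) AS t(x)
  UNION ALL
  -- An empty sketch contributes nothing to the union.
  SELECT empty_approx_set() AS sketch
) AS sketches;
```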