The groupUniqArray function in ClickHouse is an aggregate function that collects the unique values of a column and returns them as an array. It is particularly useful when you need to gather the distinct values within each group of a GROUP BY query.
Syntax
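groupUniqArray(x)
groupUniqArray(max_size)(x)
The optional max_size parameter limits the resulting array to at most max_size elements.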
Example usage
SELECT
    category,
    groupUniqArray(product_name) AS unique_products
FROM
    products
GROUP BY
    category;
This query will return an array of unique product names for each category.
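To try the query without an existing products table, the sketch below feeds a few made-up rows through the values table function. The sample rows are hypothetical, and the arraySort wrapper is added only so that the output is deterministic.

```sql
-- Hypothetical sample rows stand in for the products table.
SELECT
    category,
    arraySort(groupUniqArray(product_name)) AS unique_products
FROM values(
    'category String, product_name String',
    ('fruit', 'apple'),
    ('fruit', 'pear'),
    ('fruit', 'apple'),   -- duplicate, collapsed by groupUniqArray
    ('dairy', 'milk')
)
GROUP BY category
ORDER BY category;
-- dairy   ['milk']
-- fruit   ['apple','pear']
```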
Common issues
- Large result sets may consume significant memory, especially with high-cardinality columns.
- The order of elements in the resulting array is not guaranteed.
Best practices
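- Use groupUniqArray when you need the actual distinct values, not just their count. If only the count matters, uniqExact (or the approximate uniq) avoids building the array.
- groupArray with DISTINCT also returns distinct values, but do not rely on it to preserve the original order of appearance; sort the array explicitly when order matters.
- For very large datasets, reduce the input before aggregating. groupUniqArrayIf restricts aggregation to rows that match a condition, and the SAMPLE clause (on tables that define a sampling key) can be used if absolute precision is not required. Note that groupUniqArrayIf itself is exact, not approximate.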
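As a rough illustration of these practices, the sketch below combines the -If combinator with an exact distinct count. The price column and the sample rows are assumptions made up for this example.

```sql
-- Hypothetical rows; price is an invented column used only for the condition.
SELECT
    category,
    groupUniqArrayIf(product_name, price > 1) AS pricier_products,  -- only rows matching the condition
    uniqExact(product_name) AS distinct_product_count               -- exact count, no array is built
FROM values(
    'category String, product_name String, price UInt32',
    ('fruit', 'apple', 2),
    ('fruit', 'pear', 1),
    ('fruit', 'apple', 2),
    ('dairy', 'milk', 3)
)
GROUP BY category
ORDER BY category;
-- dairy   ['milk']    1
-- fruit   ['apple']   2
```

When only distinct_product_count is needed, dropping the array column entirely keeps the aggregation state smaller.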
Frequently Asked Questions
Q: What's the difference between groupUniqArray and groupArray(DISTINCT ...)?
A: Both return unique elements. groupUniqArray is a dedicated function that eliminates duplicates as it aggregates, while groupArray(DISTINCT ...) applies the -Distinct combinator on top of groupArray. groupUniqArray is generally the more direct and efficient choice.
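A quick way to compare the two forms is to run them side by side on the same input. This is a minimal sketch with made-up values; both results are sorted so they can be compared directly.

```sql
SELECT
    arraySort(groupUniqArray(x))      AS via_group_uniq_array,
    arraySort(groupArray(DISTINCT x)) AS via_distinct_combinator
FROM (SELECT arrayJoin([3, 1, 2, 3, 1]) AS x);
-- both columns: [1,2,3]
```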
Q: Can groupUniqArray be used with multiple columns?
A: No, groupUniqArray takes a single argument. For multiple columns, either combine them first (for example, into a tuple) or use a separate groupUniqArray call for each column.
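Both options can be sketched as follows, assuming the same hypothetical category and product_name columns used earlier. Wrapping the columns in a tuple yields distinct combinations, while separate calls yield distinct values per column.

```sql
-- Hypothetical sample rows.
SELECT
    groupUniqArray((category, product_name)) AS unique_pairs,       -- distinct (category, product_name) combinations
    groupUniqArray(category)                 AS unique_categories,  -- distinct values of one column
    groupUniqArray(product_name)             AS unique_names
FROM values(
    'category String, product_name String',
    ('fruit', 'apple'),
    ('fruit', 'apple'),
    ('fruit', 'pear'),
    ('dairy', 'milk')
);
```

With these rows, unique_pairs holds three tuples, unique_categories two values, and unique_names three values, in no particular order.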
Q: Is there a limit to the size of the array returned by groupUniqArray?
A: There is no limit by default, but large arrays can consume significant memory. You can cap the array with the parametric form groupUniqArray(max_size)(x), and it is important to monitor memory usage, especially with high-cardinality columns.
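The parametric form can be sketched like this; the numbers table function and the cap of 3 are arbitrary choices for illustration.

```sql
-- number % 10 has ten distinct values, but the array is capped at three.
SELECT
    groupUniqArray(3)(number % 10)         AS at_most_three,
    length(groupUniqArray(3)(number % 10)) AS array_length   -- 3
FROM numbers(100);
```

Which three values end up in the array is not specified, so the cap suits cases where any representative subset is acceptable.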
Q: How does groupUniqArray handle NULL values?
A: groupUniqArray skips NULL values, as ClickHouse aggregate functions generally do, so they are not included in the resulting array.
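This behavior can be checked with a small query over a Nullable input. The values are made up, and the result is sorted only to make it easy to read.

```sql
SELECT arraySort(groupUniqArray(x)) AS non_null_values
FROM (SELECT arrayJoin([1, NULL, 2, NULL, 1]) AS x);
-- expected, given that NULLs are skipped: [1,2]
```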
Q: Is the output of groupUniqArray sorted?
A: No, the order of elements in the resulting array is not guaranteed. If you need sorted output, sort the array after aggregation, for example with arraySort.
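Sorting after aggregation might look like the following sketch, which uses made-up input values and the built-in arraySort and arrayReverseSort functions.

```sql
SELECT
    arraySort(groupUniqArray(x))        AS ascending,   -- [1,2,3]
    arrayReverseSort(groupUniqArray(x)) AS descending   -- [3,2,1]
FROM (SELECT arrayJoin([3, 1, 2, 3, 1]) AS x);
```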