ClickHouse groupUniqArray Function

The groupUniqArray function in ClickHouse is an aggregation function that collects unique elements from a column and returns them as an array. It's particularly useful when you need to gather distinct values within each group in a GROUP BY query.

Syntax

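groupUniqArray(x)
groupUniqArray(max_size)(x)

The second, parametric form limits the resulting array to at most max_size elements.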


Example usage

SELECT 
    category,
    groupUniqArray(product_name) AS unique_products
FROM 
    products
GROUP BY 
    category;

This query will return an array of unique product names for each category.
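
To try the query without creating a table, the following minimal sketch feeds a few inline rows through the values table function. The rows and the column structure are made up for illustration and are not part of any real products table.

```sql
-- Illustrative data only: four rows, one duplicate product in 'fruit'.
SELECT
    category,
    groupUniqArray(product_name) AS unique_products
FROM values(
    'category String, product_name String',
    ('fruit', 'apple'),
    ('fruit', 'banana'),
    ('fruit', 'apple'),   -- duplicate, collapsed by groupUniqArray
    ('tool',  'hammer')
)
GROUP BY category
ORDER BY category;
```

The 'fruit' group should contain apple and banana exactly once each; the order of the two inside the array may vary.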

Common issues

  • Large result sets may consume significant memory, especially with high-cardinality columns.
  • The order of elements in the resulting array is not guaranteed.

Best practices

  • Use groupUniqArray when you need to preserve the actual values, not just count them.
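  • groupArray(DISTINCT x) is an alternative way to collect distinct values. If the order of first appearance matters, test it on ordered input, but do not rely on element order from either function unless you sort the array explicitly.
  • For very large datasets, cap the array with groupUniqArray(max_size)(x), or use a counting function such as uniq (approximate) or uniqExact when only the number of distinct values is needed. Note that groupUniqArrayIf is not an approximate function: it is groupUniqArray with the -If combinator, which only aggregates rows matching a condition.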
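
As a rough sketch of the alternatives to building a full array, the query below counts distinct values with uniqExact and restricts the collected values with the -If combinator. The price column, the threshold of 10, and the inline rows are assumptions made up for this illustration.

```sql
-- Illustrative data only; 'price' and the threshold 10 are hypothetical.
SELECT
    category,
    uniqExact(product_name)                    AS distinct_products,  -- exact distinct count, no array returned
    groupUniqArrayIf(product_name, price > 10) AS pricey_products     -- only rows where price > 10 contribute
FROM values(
    'category String, product_name String, price Float64',
    ('fruit', 'apple',   1.5),
    ('fruit', 'durian', 25.0),
    ('fruit', 'apple',   1.5),
    ('tool',  'hammer', 12.0)
)
GROUP BY category
ORDER BY category;
```

When only a count is needed, uniqExact (or the approximate uniq) avoids returning the values themselves; the -If form is a filter, not an approximation.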

Frequently Asked Questions

Q: What's the difference between groupUniqArray and groupArray(DISTINCT ...)?
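A: Both return the unique elements of each group. groupUniqArray is a dedicated function that deduplicates values while it aggregates. groupArray(DISTINCT ...) applies the DISTINCT modifier to groupArray to get the same set of values. Their performance and memory use can differ by data type and ClickHouse version, so benchmark on your own data if it matters.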

Q: Can groupUniqArray be used with multiple columns?
A: Not directly. groupUniqArray takes a single argument. To collect unique combinations of several columns, combine them into one expression first (for example, a tuple); to collect unique values per column, use a separate groupUniqArray call for each column.
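
A minimal sketch of both approaches, assuming a hypothetical color column and made-up inline rows. It wraps the two columns in tuple(...) to get unique pairs, and uses one call per column to get unique values independently.

```sql
-- Illustrative data only; 'color' is a hypothetical second column.
SELECT
    category,
    groupUniqArray(tuple(product_name, color)) AS unique_pairs,   -- distinct (name, color) combinations
    groupUniqArray(product_name)               AS unique_names,   -- distinct names on their own
    groupUniqArray(color)                      AS unique_colors   -- distinct colors on their own
FROM values(
    'category String, product_name String, color String',
    ('fruit', 'apple', 'red'),
    ('fruit', 'apple', 'green'),
    ('fruit', 'apple', 'red'),
    ('fruit', 'pear',  'green')
)
GROUP BY category;
```

Here the pair array should hold three distinct combinations, while the per-column arrays hold two names and two colors.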

Q: Is there a limit to the size of the array returned by groupUniqArray?
A: By default there is no specific limit, but large arrays can consume significant memory, so monitor memory usage with high-cardinality columns. To cap the result, use the parametric form groupUniqArray(max_size)(x), which returns at most max_size elements.
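
The cap can be sketched with the built-in numbers table function. The expression number % 10 has ten distinct values, and the limit of 3 is an arbitrary choice for this illustration.

```sql
-- numbers(100) yields 0..99, so number % 10 has 10 distinct values.
SELECT
    length(groupUniqArray(number % 10))    AS uncapped_size,  -- 10
    length(groupUniqArray(3)(number % 10)) AS capped_size     -- 3
FROM numbers(100);
```

Which three values end up in the capped array is not specified, so treat the cap as a memory safeguard rather than a way to pick particular elements.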

Q: How does groupUniqArray handle NULL values?
A: Like most ClickHouse aggregate functions, groupUniqArray skips NULL values, so they do not appear in the resulting array.
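
A quick way to check this on your own ClickHouse version is the sketch below, which expands a small made-up array containing a NULL into rows and aggregates them.

```sql
-- arrayJoin turns the array into rows of type Nullable(String).
SELECT groupUniqArray(x) AS vals
FROM
(
    SELECT arrayJoin(['a', NULL, 'a', 'b']) AS x
);
```

The result is expected to contain only 'a' and 'b', in no particular order, with the NULL row skipped.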

Q: Is the output of groupUniqArray sorted?
A: No, the order of elements in the resulting array is not guaranteed. If you need sorted output, sort the array after aggregation, for example with arraySort or arrayReverseSort.
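
Sorting after aggregation can be sketched as follows, again using the built-in numbers table function as stand-in data.

```sql
-- numbers(20) yields 0..19, so number % 5 has the distinct values 0..4.
SELECT
    arraySort(groupUniqArray(number % 5))        AS ascending,   -- [0,1,2,3,4]
    arrayReverseSort(groupUniqArray(number % 5)) AS descending   -- [4,3,2,1,0]
FROM numbers(20);
```

Wrapping the aggregate in arraySort also makes query results reproducible, which helps when comparing outputs in tests.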
