The "DB::Exception: Too large array size" error in ClickHouse is raised when an array exceeds the maximum allowed number of elements. The TOO_LARGE_ARRAY_SIZE error code protects against excessive memory consumption that can occur when arrays grow to unreasonable sizes during query execution, data insertion, or function evaluation.
Impact
The operation that produces the oversized array is terminated immediately. This can affect queries using array-generating functions like groupArray(), arrayJoin() with large expansions, or range() with large arguments. Inserts containing oversized array values will also be rejected. In pipelines that depend on array operations, this error can block data processing entirely.
Common Causes
- Using
groupArray()without a size limit on high-cardinality groups, producing arrays with millions of elements - Calling
range(n)with a very large value ofn - Array concatenation or
arrayConcat()that accumulates elements beyond the limit - Data ingestion with array columns containing more elements than ClickHouse allows
arrayJoin()on large arrays creating an explosion of rows that triggers intermediate array size checks- Using
arrayMap()or similar higher-order functions that produce large intermediate arrays
Troubleshooting and Resolution Steps
Identify which operation produces the large array. The error message usually indicates the function or context. Review your query for array-generating operations.
Limit
groupArray()output by using the size-limited variant:-- Instead of: SELECT groupArray(value) FROM my_table GROUP BY key; -- Use: SELECT groupArray(1000)(value) FROM my_table GROUP BY key;Check
range()arguments:-- This will fail if n is very large: SELECT range(n) FROM my_table; -- Add a safety check: SELECT if(n <= 10000, range(n), []) FROM my_table;Use sampling or aggregation instead of collecting all values into arrays:
-- Instead of collecting all values: SELECT groupArray(user_id) FROM events GROUP BY event_type; -- Consider using approximate functions: SELECT groupArraySample(100)(user_id) FROM events GROUP BY event_type;Validate input data before insertion to ensure array columns do not contain excessive elements:
SELECT max(length(my_array_column)) FROM my_staging_table;Adjust the limit if large arrays are genuinely needed. The
max_block_sizeand related settings can indirectly affect array size limits. For some functions, there are specific settings:SET function_range_max_elements_in_block = 500000000;
Best Practices
- Always use the size-limited variants of array aggregation functions (
groupArray(N)) in production queries. - Validate array sizes in data before loading it into ClickHouse.
- Prefer aggregation over collection: use
count(),uniq(), orquantile()instead of collecting raw values into arrays when possible. - Set appropriate limits on
range()and similar functions, especially when arguments come from data rather than constants. - Monitor queries that use
arrayJoin()on large arrays, as they can produce an enormous number of output rows.
Frequently Asked Questions
Q: What is the maximum array size in ClickHouse?
A: The limit depends on the context. For range(), the default maximum is controlled by function_range_max_elements_in_block. For general arrays, practical limits depend on available memory and the specific operation. Arrays with millions of elements are generally problematic.
Q: Can I store large arrays in ClickHouse tables?
A: Technically yes, but it is not recommended. Large arrays consume significant memory during reads and processing. If you need to store large collections of values, consider using a separate table with one row per element and a foreign key.
Q: Does groupArray() without a limit always collect all values?
A: Yes. groupArray() without a size parameter collects every value in the group into a single array. For groups with millions of rows, this produces massive arrays. Always use groupArray(N) with a reasonable upper bound.
Q: How does this error differ from memory limit errors?
A: TOO_LARGE_ARRAY_SIZE is checked at the array element count level, before the array fully materializes in memory. A memory limit error would occur after the allocation attempt. The array size check catches the problem earlier and provides a more specific error message.