The "DB::Exception: Sizes of arrays don't match" error in ClickHouse is raised when you call a function that operates on multiple arrays simultaneously, but the input arrays have different lengths. Functions like arrayMap, arrayFilter, arrayZip, and higher-order array functions require all input arrays to have the same number of elements. The error code is SIZES_OF_ARRAYS_DONT_MATCH.
Impact
The query fails during execution. If the mismatch occurs only in certain rows, the query may fail partway through processing. No partial results are returned in the default error mode.
Common Causes
- Parallel arrays with inconsistent lengths -- storing related data in separate array columns (e.g.,
keys Array(String)andvalues Array(String)) where some rows have mismatched lengths. - Using arrayMap/arrayFilter on arrays from different sources -- combining arrays that were not designed to be processed together.
- Data ingestion errors -- ETL processes that populate array columns independently, leading to length mismatches.
- Nested column inconsistency -- Nested type columns are internally stored as parallel arrays, and corruption or incorrect inserts can cause length mismatches.
- Joining tables with arrays and combining them -- arrays from different tables may not have matching lengths.
Troubleshooting and Resolution Steps
Identify rows with mismatched array lengths:
SELECT id, length(keys) AS keys_len, length(values) AS values_len FROM your_table WHERE length(keys) != length(values) LIMIT 10;Fix the data by truncating or padding arrays to match lengths:
-- Truncate to the shorter length SELECT arrayResize(keys, least(length(keys), length(values))) AS keys_fixed, arrayResize(values, least(length(keys), length(values))) AS values_fixed FROM your_table; -- Or pad the shorter array with defaults SELECT arrayResize(keys, greatest(length(keys), length(values)), '') AS keys_padded, arrayResize(values, greatest(length(keys), length(values)), '') AS values_padded FROM your_table;Guard against the error with a WHERE filter:
SELECT arrayMap((k, v) -> (k, v), keys, values) FROM your_table WHERE length(keys) = length(values);For arrayZip, ensure all arrays have equal length:
-- This will fail if arrays have different lengths SELECT arrayZip(names, scores) FROM students; -- Safe version with filter SELECT arrayZip(names, scores) FROM students WHERE length(names) = length(scores);Fix ingestion to ensure consistent array lengths:
-- Validate during insert using a check INSERT INTO your_table SELECT keys, values FROM source_table WHERE length(keys) = length(values);For Nested columns, verify consistency:
SELECT name, length(nested.key) AS key_len, length(nested.value) AS val_len FROM your_table WHERE length(nested.key) != length(nested.value);
Best Practices
- Use the
Nesteddata type when storing related parallel arrays, as ClickHouse enforces length consistency for Nested columns during inserts (whenflatten_nested = 0). - Validate array lengths at ingestion time to prevent mismatches from entering the table.
- When working with multiple arrays in queries, always check lengths match before applying higher-order functions.
- Consider using
Tuplearrays (e.g.,Array(Tuple(String, String))) instead of parallel arrays to keep related data together and avoid length mismatch issues entirely. - Add length consistency checks to your data quality monitoring pipeline.
Frequently Asked Questions
Q: Which functions require arrays to have matching lengths?
A: Higher-order functions that process multiple arrays element-wise require matching lengths: arrayMap, arrayFilter, arrayFirst, arrayExists, arrayAll, arraySum (with lambda), arrayZip, and similar functions. Functions that operate on arrays independently (like arrayConcat) do not have this requirement.
Q: Can I use Map type instead of parallel arrays?
A: Yes. ClickHouse's Map(K, V) type stores key-value pairs together, eliminating the possibility of length mismatches. If your use case is key-value data, Map is generally a better choice than parallel arrays.
Q: Does this error occur during INSERT or only during SELECT?
A: It typically occurs during SELECT when array functions are evaluated. However, if you have materialized columns or constraints that use array functions, it could also surface during INSERT.
Q: How do I handle this error in a materialized view?
A: Add a WHERE clause in the materialized view's SELECT to filter out rows with mismatched array lengths. This prevents the error from blocking inserts into the source table.