ClickHouse dictionaries are usually described as a simple key -> attributes lookup, but arrays show up in three distinct places: a dictionary attribute can be an array, a lookup can run over an array of keys, and a dictionary key can be a composite tuple. Each is a different pattern with different functions and performance characteristics.
This guide covers all three, plus the practical patterns for batch enrichment, multi-value results, and hierarchical lookups that return arrays. For the basics of what a dictionary is, see what is a ClickHouse dictionary and what is an external dictionary.
Array Attributes: One Key, Many Values
A dictionary attribute can be an Array(T). The key stays scalar; the value is a list. This is the most common "arrays in dictionaries" case — for example, mapping a user ID to a list of role IDs, or a product to a list of tags.
CREATE TABLE arr_src
(
key UInt64,
array_int Array(Int64),
array_str Array(String)
) ENGINE = MergeTree ORDER BY key;
INSERT INTO arr_src VALUES (42, [10, 20, 30], ['admin', 'editor']);
CREATE DICTIONARY arr_dict
(
key UInt64,
array_int Array(Int64) DEFAULT [1, 2, 3],
array_str Array(String) DEFAULT ['none']
)
PRIMARY KEY key
SOURCE(CLICKHOUSE(TABLE 'arr_src'))
LIFETIME(120)
LAYOUT(HASHED());
Retrieval works exactly like a scalar attribute — the function just returns an array:
SELECT
dictGet('arr_dict', 'array_int', toUInt64(42)) AS res_int,
dictGetOrDefault('arr_dict', 'array_str', toUInt64(424242), ['none']) AS res_str;
-- res_int: [10,20,30] res_str: ['none']
dictGet returns the attribute's configured default (the DEFAULT clause, or the null value) when the key is missing; dictGetOrDefault lets you supply the fallback array inline at query time. Both work the same way for array and scalar attributes.
Array attributes are supported across the common layouts (flat, hashed, complex_key_hashed, and so on) and across source types. With a ClickHouse or PostgreSQL source the native array maps directly. With a MySQL source — which has no native array type — you typically store the value as a string and expose it via an ALIAS column that casts it, e.g. array_int Array(Int32) ALIAS cast(_array_int, 'Array(Int32)') in the supporting table.
Composite (Tuple) Keys
A dictionary key does not have to be a single UInt64. When you need to look up by a combination of columns — (country, device_type), (tenant_id, sku) — use a composite key with a complex_key_* layout.
CREATE DICTIONARY pricing_dict
(
country String,
sku UInt32,
price Decimal(10, 2),
tags Array(String)
)
PRIMARY KEY country, sku
SOURCE(CLICKHOUSE(TABLE 'pricing_src'))
LIFETIME(300)
LAYOUT(COMPLEX_KEY_HASHED());
At query time the key is passed as a tuple, in the same column order as the PRIMARY KEY:
SELECT dictGet('pricing_dict', 'price', tuple('US', toUInt32(1001)));
SELECT dictGet('pricing_dict', 'tags', ('DE', toUInt32(2002)));
Two layout choices matter here:
complex_key_hashed— the general-purpose composite-key layout.complex_key_hashed_array— a variant that stores each attribute in its own contiguous array instead of one hash map per row. It trades a little lookup overhead for substantially lower memory when a dictionary has many attributes (roughly three or more). Prefer it for wide reference tables, especially with String keys.
Note that a composite key and an array key are different things: the key is a fixed-width tuple of columns, not a variable-length Array. ClickHouse dictionaries do not key on an Array value directly — if you need that, hash or serialize the array into a scalar key column first.
Looking Up an Array of Keys (Batch Enrichment)
A very common need is enriching every element of an array column in one row — e.g. a tag_ids Array(UInt64) column where you want the corresponding tag names. Dictionary functions are scalar, so you apply them per element with arrayMap:
SELECT
tag_ids,
arrayMap(id -> dictGet('tag_dict', 'name', id), tag_ids) AS tag_names
FROM posts;
Because dictionary lookups are in-memory hash probes, this is cheap relative to a JOIN and stays within the row — no need to unnest. When you do want one output row per element (for grouping or filtering on the enriched value), combine the lookup with ARRAY JOIN:
SELECT
post_id,
tag_id,
dictGet('tag_dict', 'name', tag_id) AS tag_name
FROM posts
ARRAY JOIN tag_ids AS tag_id;
For deduplicating or counting enriched values, the array combinators help: arrayDistinct(arrayMap(...)) removes duplicate results before further processing. Keep an eye on intermediate array sizes — see array function memory usage for the cost model.
Functions That Return Arrays
Several dictionary functions return an Array regardless of the attribute type — useful for hierarchy traversal and regexp dictionaries.
| Function | Returns | Purpose |
|---|---|---|
dictGet / dictGetOrDefault |
attribute type (incl. Array(T)) |
Standard lookup; returns the array if the attribute is an array |
dictGetHierarchy(dict, key) |
Array(UInt64) |
The key plus all of its parents, walking a hierarchical attribute upward |
dictGetChildren(dict, key) |
Array(UInt64) |
First-level children of a key |
dictGetDescendants(dict, key, level) |
Array(UInt64) |
All descendants down to level (0 = unlimited) |
dictGetAll(dict, attr, key[, limit]) |
Array(T) |
All matching values; designed for regexp_tree dictionaries where one key can match multiple rules |
Hierarchical functions require the dictionary to declare a HIERARCHICAL attribute that points at each row's parent key:
SELECT
dictGetHierarchy('region_dict', toUInt64(5)) AS ancestors,
dictGetChildren('region_dict', toUInt64(1)) AS children,
dictGetDescendants('region_dict', toUInt64(1), 0) AS all_descendants;
dictGetChildren and dictGetDescendants are available since v21.4; dictGetAll since v23.5. dictGetHierarchy has existed since the early 1.1.x releases.
dictGetAll is specific to regexp_tree dictionaries: a single input string can satisfy several patterns, so instead of returning one winning value it returns the array of all matches (optionally capped by a limit argument). This is how user-agent parsing dictionaries return every matched OS/browser layer at once.
Best Practices
Use
Array(T)attributes for genuine one-to-many reference data. A user's roles, a product's tags, or a host's IP list are natural array attributes — much cleaner than encoding them as delimited strings and parsing on read.Reach for
complex_key_hashed_arrayon wide, String-keyed dictionaries. When a dictionary has several attributes and a non-UInt64key, this layout meaningfully reduces memory versuscomplex_key_hashed.Pass composite keys as a
tupleinPRIMARY KEYorder. A mismatch in order or type is the most common cause of wrong or missing results withcomplex_key_*dictionaries.Prefer
arrayMap(dictGet(...))over a JOIN for per-element enrichment. It keeps the work in-memory and per-row, and avoids exploding the result set. OnlyARRAY JOINwhen you actually need one row per element.Set a realistic
LIFETIMEand size your memory. Array attributes and wide hierarchies enlarge the in-memory footprint; confirm load status and memory insystem.dictionariesbefore relying on a dictionary in hot queries.
Common Issues
Wrong tuple arity or type for composite keys.
dictGet('d', 'a', ('US', 1001))will not match if the dictionary key is(String, UInt32)and you pass a plainInt. Cast explicitly:('US', toUInt32(1001)).Expecting
dictGetto return all matches. Forregexp_treedictionariesdictGetreturns only the deepest (most specific) match; usedictGetAllto get the full array.Cannot read array from texton load. When a non-ClickHouse source stores arrays as text, the source string must be valid ClickHouse array syntax (e.g.[1,2,3]) or be cast via anALIAScolumn. See Cannot read array from text.Hierarchy functions returning empty arrays.
dictGetHierarchy/dictGetChildrenneed aHIERARCHICALattribute defined; without it they have nothing to traverse.Refresh lag on array attributes. Updated arrays only appear after the dictionary reloads. For controlled refresh patterns see the dictionary enrichment update pattern and, for one-off rewrites, backfilling a column with a dictionary lookup.
How Pulse Helps
Array-valued and composite-key dictionaries are easy to get subtly wrong: a dictionary that quietly falls back to defaults, an arrayMap(dictGet(...)) that re-probes a cold dictionary on every row, or a memory-heavy complex_key_hashed that should have been complex_key_hashed_array. Pulse monitors dictionary load state, memory footprint, and refresh failures across your ClickHouse fleet, and surfaces the query patterns where dictionary lookups dominate runtime — so enrichment stays fast and correct as your reference data grows. It is a support and monitoring layer run by engineers who operate ClickHouse in production, not a replacement for your cluster.
Frequently Asked Questions
Q: Can a ClickHouse dictionary attribute be an array?
Yes. Declare the attribute as Array(T) (e.g. Array(Int64), Array(String)) with an optional DEFAULT array. dictGet and dictGetOrDefault return the array directly. This is supported across the common layouts and ClickHouse/PostgreSQL sources natively.
Q: Can I use an array as the dictionary key?
Not directly — dictionary keys are either a single scalar (UInt64) or a fixed composite tuple with a complex_key_* layout. A variable-length Array cannot be the key. If you must key on an array, serialize or hash it into a scalar column first.
Q: How do I enrich every element of an array column?
Wrap the lookup in arrayMap: arrayMap(id -> dictGet('d', 'attr', id), id_array). This returns a parallel array of results per row. Use ARRAY JOIN instead when you need one output row per element for grouping or filtering.
Q: What is the difference between dictGet and dictGetAll?
dictGet returns a single value — for regexp_tree dictionaries this is the deepest (most specific) matching node (which may itself be an array attribute). dictGetAll (v23.5+) returns an Array of all matching values and is meant for regexp_tree dictionaries, where one input can match several patterns.
Q: Which layout should I use for a String composite key with many attributes?
complex_key_hashed_array — it stores attributes in contiguous arrays rather than per-row maps, cutting memory for wide dictionaries. Use plain complex_key_hashed for narrow ones. Pass the key as a tuple in PRIMARY KEY column order.
Q: Why do hierarchy functions like dictGetDescendants return empty arrays?
The dictionary must declare a HIERARCHICAL attribute pointing at each row's parent key. Without it there is no parent/child relationship to traverse, so the functions return empty results.