Dictionary Lookups with Array Keys and Multi-Value Results

ClickHouse dictionaries are usually described as a simple key -> attributes lookup, but arrays show up in three distinct places: a dictionary attribute can be an array, a lookup can run over an array of keys, and a dictionary key can be a composite tuple. Each is a different pattern with different functions and performance characteristics.

This guide covers all three, plus the practical patterns for batch enrichment, multi-value results, and hierarchical lookups that return arrays. For the basics of what a dictionary is, see what is a ClickHouse dictionary and what is an external dictionary.

Array Attributes: One Key, Many Values

A dictionary attribute can be an Array(T). The key stays scalar; the value is a list. This is the most common "arrays in dictionaries" case — for example, mapping a user ID to a list of role IDs, or a product to a list of tags.

CREATE TABLE arr_src
(
    key UInt64,
    array_int Array(Int64),
    array_str Array(String)
) ENGINE = MergeTree ORDER BY key;

INSERT INTO arr_src VALUES (42, [10, 20, 30], ['admin', 'editor']);

CREATE DICTIONARY arr_dict
(
    key UInt64,
    array_int Array(Int64) DEFAULT [1, 2, 3],
    array_str Array(String) DEFAULT ['none']
)
PRIMARY KEY key
SOURCE(CLICKHOUSE(TABLE 'arr_src'))
LIFETIME(120)
LAYOUT(HASHED());

Retrieval works exactly like a scalar attribute — the function just returns an array:

SELECT
    dictGet('arr_dict', 'array_int', toUInt64(42))                          AS res_int,
    dictGetOrDefault('arr_dict', 'array_str', toUInt64(424242), ['none'])   AS res_str;
-- res_int: [10,20,30]   res_str: ['none']

dictGet returns the attribute's configured default (the DEFAULT clause, or the null value) when the key is missing; dictGetOrDefault lets you supply the fallback array inline at query time. Both work the same way for array and scalar attributes.

Array attributes are supported across the common layouts (flat, hashed, complex_key_hashed, and so on) and across source types. With a ClickHouse or PostgreSQL source the native array maps directly. With a MySQL source — which has no native array type — you typically store the value as a string and expose it via an ALIAS column that casts it, e.g. array_int Array(Int32) ALIAS cast(_array_int, 'Array(Int32)') in the supporting table.

Composite (Tuple) Keys

A dictionary key does not have to be a single UInt64. When you need to look up by a combination of columns — (country, device_type), (tenant_id, sku) — use a composite key with a complex_key_* layout.

CREATE DICTIONARY pricing_dict
(
    country  String,
    sku      UInt32,
    price    Decimal(10, 2),
    tags     Array(String)
)
PRIMARY KEY country, sku
SOURCE(CLICKHOUSE(TABLE 'pricing_src'))
LIFETIME(300)
LAYOUT(COMPLEX_KEY_HASHED());

At query time the key is passed as a tuple, in the same column order as the PRIMARY KEY:

SELECT dictGet('pricing_dict', 'price', tuple('US', toUInt32(1001)));
SELECT dictGet('pricing_dict', 'tags',  ('DE', toUInt32(2002)));

Two layout choices matter here:

  • complex_key_hashed — the general-purpose composite-key layout.
  • complex_key_hashed_array — a variant that stores each attribute in its own contiguous array instead of one hash map per row. It trades a little lookup overhead for substantially lower memory when a dictionary has many attributes (roughly three or more). Prefer it for wide reference tables, especially with String keys.

Note that a composite key and an array key are different things: the key is a fixed-width tuple of columns, not a variable-length Array. ClickHouse dictionaries do not key on an Array value directly — if you need that, hash or serialize the array into a scalar key column first.

Looking Up an Array of Keys (Batch Enrichment)

A very common need is enriching every element of an array column in one row — e.g. a tag_ids Array(UInt64) column where you want the corresponding tag names. Dictionary functions are scalar, so you apply them per element with arrayMap:

SELECT
    tag_ids,
    arrayMap(id -> dictGet('tag_dict', 'name', id), tag_ids) AS tag_names
FROM posts;

Because dictionary lookups are in-memory hash probes, this is cheap relative to a JOIN and stays within the row — no need to unnest. When you do want one output row per element (for grouping or filtering on the enriched value), combine the lookup with ARRAY JOIN:

SELECT
    post_id,
    tag_id,
    dictGet('tag_dict', 'name', tag_id) AS tag_name
FROM posts
ARRAY JOIN tag_ids AS tag_id;

For deduplicating or counting enriched values, the array combinators help: arrayDistinct(arrayMap(...)) removes duplicate results before further processing. Keep an eye on intermediate array sizes — see array function memory usage for the cost model.

Functions That Return Arrays

Several dictionary functions return an Array regardless of the attribute type — useful for hierarchy traversal and regexp dictionaries.

Function Returns Purpose
dictGet / dictGetOrDefault attribute type (incl. Array(T)) Standard lookup; returns the array if the attribute is an array
dictGetHierarchy(dict, key) Array(UInt64) The key plus all of its parents, walking a hierarchical attribute upward
dictGetChildren(dict, key) Array(UInt64) First-level children of a key
dictGetDescendants(dict, key, level) Array(UInt64) All descendants down to level (0 = unlimited)
dictGetAll(dict, attr, key[, limit]) Array(T) All matching values; designed for regexp_tree dictionaries where one key can match multiple rules

Hierarchical functions require the dictionary to declare a HIERARCHICAL attribute that points at each row's parent key:

SELECT
    dictGetHierarchy('region_dict', toUInt64(5))      AS ancestors,
    dictGetChildren('region_dict', toUInt64(1))       AS children,
    dictGetDescendants('region_dict', toUInt64(1), 0) AS all_descendants;

dictGetChildren and dictGetDescendants are available since v21.4; dictGetAll since v23.5. dictGetHierarchy has existed since the early 1.1.x releases.

dictGetAll is specific to regexp_tree dictionaries: a single input string can satisfy several patterns, so instead of returning one winning value it returns the array of all matches (optionally capped by a limit argument). This is how user-agent parsing dictionaries return every matched OS/browser layer at once.

Best Practices

  1. Use Array(T) attributes for genuine one-to-many reference data. A user's roles, a product's tags, or a host's IP list are natural array attributes — much cleaner than encoding them as delimited strings and parsing on read.

  2. Reach for complex_key_hashed_array on wide, String-keyed dictionaries. When a dictionary has several attributes and a non-UInt64 key, this layout meaningfully reduces memory versus complex_key_hashed.

  3. Pass composite keys as a tuple in PRIMARY KEY order. A mismatch in order or type is the most common cause of wrong or missing results with complex_key_* dictionaries.

  4. Prefer arrayMap(dictGet(...)) over a JOIN for per-element enrichment. It keeps the work in-memory and per-row, and avoids exploding the result set. Only ARRAY JOIN when you actually need one row per element.

  5. Set a realistic LIFETIME and size your memory. Array attributes and wide hierarchies enlarge the in-memory footprint; confirm load status and memory in system.dictionaries before relying on a dictionary in hot queries.

Common Issues

  • Wrong tuple arity or type for composite keys. dictGet('d', 'a', ('US', 1001)) will not match if the dictionary key is (String, UInt32) and you pass a plain Int. Cast explicitly: ('US', toUInt32(1001)).

  • Expecting dictGet to return all matches. For regexp_tree dictionaries dictGet returns only the deepest (most specific) match; use dictGetAll to get the full array.

  • Cannot read array from text on load. When a non-ClickHouse source stores arrays as text, the source string must be valid ClickHouse array syntax (e.g. [1,2,3]) or be cast via an ALIAS column. See Cannot read array from text.

  • Hierarchy functions returning empty arrays. dictGetHierarchy/dictGetChildren need a HIERARCHICAL attribute defined; without it they have nothing to traverse.

  • Refresh lag on array attributes. Updated arrays only appear after the dictionary reloads. For controlled refresh patterns see the dictionary enrichment update pattern and, for one-off rewrites, backfilling a column with a dictionary lookup.

How Pulse Helps

Array-valued and composite-key dictionaries are easy to get subtly wrong: a dictionary that quietly falls back to defaults, an arrayMap(dictGet(...)) that re-probes a cold dictionary on every row, or a memory-heavy complex_key_hashed that should have been complex_key_hashed_array. Pulse monitors dictionary load state, memory footprint, and refresh failures across your ClickHouse fleet, and surfaces the query patterns where dictionary lookups dominate runtime — so enrichment stays fast and correct as your reference data grows. It is a support and monitoring layer run by engineers who operate ClickHouse in production, not a replacement for your cluster.

Frequently Asked Questions

Q: Can a ClickHouse dictionary attribute be an array?

Yes. Declare the attribute as Array(T) (e.g. Array(Int64), Array(String)) with an optional DEFAULT array. dictGet and dictGetOrDefault return the array directly. This is supported across the common layouts and ClickHouse/PostgreSQL sources natively.

Q: Can I use an array as the dictionary key?

Not directly — dictionary keys are either a single scalar (UInt64) or a fixed composite tuple with a complex_key_* layout. A variable-length Array cannot be the key. If you must key on an array, serialize or hash it into a scalar column first.

Q: How do I enrich every element of an array column?

Wrap the lookup in arrayMap: arrayMap(id -> dictGet('d', 'attr', id), id_array). This returns a parallel array of results per row. Use ARRAY JOIN instead when you need one output row per element for grouping or filtering.

Q: What is the difference between dictGet and dictGetAll?

dictGet returns a single value — for regexp_tree dictionaries this is the deepest (most specific) matching node (which may itself be an array attribute). dictGetAll (v23.5+) returns an Array of all matching values and is meant for regexp_tree dictionaries, where one input can match several patterns.

Q: Which layout should I use for a String composite key with many attributes?

complex_key_hashed_array — it stores attributes in contiguous arrays rather than per-row maps, cutting memory for wide dictionaries. Use plain complex_key_hashed for narrow ones. Pass the key as a tuple in PRIMARY KEY column order.

Q: Why do hierarchy functions like dictGetDescendants return empty arrays?

The dictionary must declare a HIERARCHICAL attribute pointing at each row's parent key. Without it there is no parent/child relationship to traverse, so the functions return empty results.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.