ClickHouse Values Mapping: Dictionaries vs Joins vs Subqueries

Q: Is `transform` faster than `CASE`/`multiIf`?

For equality-based value mapping, yes. transform performs an array lookup, while multiIf evaluates a chain of conditions. The search form CASE WHEN condition compiles to multiIf ; the match form CASE expr WHEN val compiles to transform() . Use multiIf / CASE WHEN only when branches need ranges or compound conditions rather than simple equality.

Q: What does `dictGet` return when the key isn't found?

It returns the attribute's configured null_value (its per-attribute default). Use dictGetOrDefault to supply your own fallback, dictGetOrNull to get NULL , or dictHas to test for existence first.

Q: How does `transform` behave for values not in the mapping?

With a default argument, transform(x, from, to, default) returns the default . Without it, transform(x, from, to) returns x unchanged — which is why that form requires array_to to share the type of x .

Q: Does `transform` handle NULL values?

Not reliably. transform treats a NULL input as an unmatched value, so it falls through to the default (or passes through unchanged) instead of being mapped. Convert nulls first with ifNull or coalesce before passing them in.

Q: Can I map several columns at once from a dictionary?

Yes. Pass a tuple of attribute names to dictGet , e.g. dictGet('users_dict', ('name', 'country'), user_id) , and assign the result to a tuple of aliases. This performs a single keyed lookup instead of one per attribute.

"Values mapping" covers a family of related problems: translating a status code to a label, resolving a user ID to a name, or enriching an events table with attributes from a dimension table. ClickHouse offers several tools for this — transform, conditional functions, Enum casts, dictionaries (dictGet), LEFT JOIN, and subqueries — and picking the wrong one is a common source of slow queries.

This guide walks through each approach, when it fits, and how they compare on performance and maintenance. The short version: use inline functions for small, static maps; use dictionaries when reference data lives in another table and changes over time; reach for JOIN only when the lookup data is too large to hold in memory.

Inline Mapping: `transform`, `multiIf`, and `Enum`

When the set of mappings is small and known at query-authoring time, map values inline without touching another table.

`transform`

transform is the most direct tool for value-to-value mapping. It takes a value, an array of source values, an array of target values, and an optional default:

-- transform(x, array_from, array_to, default)
SELECT transform(status, [200, 404, 500], ['ok', 'not_found', 'error'], 'other') AS label
FROM requests;

Both arrays must be constants of equal length. array_from elements must be the same type as x, and array_to elements (and the default) share a common type that becomes the result type.

There are two forms with different unmatched behavior:

With a default — transform(x, from, to, default) returns default for any value not found in array_from.
Without a default — transform(x, from, to) returns x unchanged when there is no match. This form requires array_to to have the same type as x, since the original value may pass through.

-- No default: unmatched countries pass through unchanged
SELECT transform(country, ['US', 'GB'], ['United States', 'United Kingdom']) AS name
FROM events;

One caveat: transform is not designed around NULL handling — a NULL input is treated as unmatched, so it falls through to the default (or passes through unchanged in the no-default form) rather than being mapped. Filter or coalesce nulls before mapping (for example with ifNull or coalesce) when you need predictable results.

`multiIf` and `CASE`

multiIf evaluates (condition, result) pairs in order and returns the first match, with a required final default. The search form CASE WHEN condition THEN result compiles directly into multiIf, so the two are equivalent. (The match form CASE expr WHEN val THEN result compiles into transform() instead — see the ClickHouse operators reference.)

SELECT multiIf(
    status = 200, 'ok',
    status = 404, 'not_found',
    status = 500, 'error',
    'other'
) AS label
FROM requests;

For pure equality-based value mapping, transform is generally faster than multiIf because it does an array lookup rather than evaluating a chain of conditions. Use multiIf/CASE when branches involve ranges or compound conditions (status >= 500), not just equality. See the multiIf function reference and CASE expressions in ClickHouse for details.

`Enum` cast

If the mapping is between integer codes and a small, stable set of string labels, casting to an Enum is the fastest option of all — the mapping is encoded in the column type itself:

SELECT CAST(status_code AS Enum('ok' = 200, 'not_found' = 404, 'error' = 500)) AS label
FROM requests;

The trade-off is rigidity: every possible value must be listed in the Enum definition, and an unlisted value raises an error rather than falling back to a default. Use it only when the value set is closed and rarely changes.

Inline mapping comparison

Approach	Best for	Unmatched value	Relative speed
`transform(x, from, to, default)`	Equality value-to-value maps	Returns `default`	Fast
`transform(x, from, to)`	Same as above, pass-through	Returns `x` unchanged	Fast
`multiIf` / `CASE`	Range or compound conditions	Returns required `else`	Slower than `transform`
`Enum` cast	Closed, stable code↔label sets	Raises an error	Fastest

Mapping From Another Table

Once the mapping lives in a separate table — user attributes, product catalogs, geo lookups — inline functions no longer fit. The three options are dictionaries, joins, and subqueries.

Dictionaries (`dictGet`)

A dictionary is an in-memory key-value structure built for exactly this pattern: O(1) lookups with no table scan. Define it once over the source table (or any external source), then call dictGet per row:

CREATE DICTIONARY users_dict
(
    user_id UInt64,
    name    String,
    country String
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(TABLE 'users'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);

-- Enrich events with a single lookup
SELECT
    event_time,
    dictGet('users_dict', 'name', user_id) AS user_name
FROM events;

dictGet('dict_name', attr, key) returns the configured null_value (the per-attribute default) when the key is missing. Related functions:

dictGetOrDefault('dict', attr, key, default) — returns your default instead of the configured one when the key is absent.
dictGetOrNull('dict', attr, key) — returns NULL for a missing key.
dictHas('dict', key) — returns 1 if the key exists, else 0.

You can fetch several attributes in one call with a tuple of attribute names, which avoids repeating the lookup:

SELECT
    event_time,
    dictGet('users_dict', ('name', 'country'), user_id) AS (user_name, user_country)
FROM events;

Dictionaries refresh on their LIFETIME schedule, so reference data stays current without rewriting queries. For data that is generated by a script rather than stored in a table, see executable dictionaries.

`LEFT JOIN`

A LEFT JOIN against the dimension table produces the same enrichment with familiar SQL:

SELECT
    e.event_time,
    u.name AS user_name
FROM events AS e
LEFT JOIN users AS u ON e.user_id = u.user_id;

The catch is memory: ClickHouse's default hash join builds the right-hand table in RAM, so a large dimension table can be expensive or fail outright. For the same in-memory key-value workload, a dictionary is usually faster and lighter than a join, because the dictionary is purpose-built and already resident in memory. Reach for a join when the lookup data is too large to keep in a dictionary, or when you need join semantics (multiple matching rows, range conditions) that a dictionary can't express. See ClickHouse JOIN performance and JOIN types for tuning and algorithm choices.

Subquery with `IN` or a joined derived table

A subquery is occasionally the right tool — for filtering by a set of keys, or when the mapping is computed on the fly:

-- Filtering, not enrichment
SELECT *
FROM events
WHERE user_id IN (SELECT user_id FROM users WHERE country = 'US');

For enrichment (adding columns), a subquery still resolves to a join under the hood, so it carries the same memory characteristics as a LEFT JOIN without the advantages of a dictionary. Prefer IN-style subqueries for filtering and dictionaries for column enrichment.

Choosing an Approach

Method	Reference data	Lives where	Memory	Stays fresh	Use when
`transform` / `multiIf` / `Enum`	Small, static	In the query	None	Manual edit	A handful of hard-coded mappings
Dictionary (`dictGet`)	Small–medium, changing	Separate table / source	Fits in RAM	Auto via `LIFETIME`	Enriching rows with attributes from a lookup table
`LEFT JOIN`	Large	Separate table	Right side in RAM	Always (live read)	Lookup table too big for a dictionary, or join semantics needed
Subquery (`IN`)	Any	Separate table	Depends	Always	Filtering by a set of keys, not adding columns

Rule of thumb: static map → inline function; changing lookup that fits in memory → dictionary; lookup too large for memory → join.

Best Practices

Default to dictionaries for cross-table enrichment. They avoid the per-query memory cost of a join and keep reference data fresh automatically.
Use transform over CASE/multiIf for equality maps. It does an array lookup instead of a condition chain and is measurably faster for value-to-value translation.
Pick the smallest key and attribute types. Dictionary memory scales with column width; narrow UInt/LowCardinality types keep it lean.
Fetch multiple attributes in one dictGet call using a tuple of names rather than several separate calls over the same key.
Handle NULLs explicitly. transform treats NULL as unmatched rather than mapping it; wrap inputs in ifNull/coalesce if nulls are possible.
Match the Enum to a truly closed set. An unlisted value raises an error, so only use it when the value domain is stable.

Common Issues

Stale dictionary data. A dictionary only refreshes on its LIFETIME schedule (or SYSTEM RELOAD DICTIONARY). If lookups return old values, check the LIFETIME and the source's update query. See dictionary lazy load behavior.
JOIN runs out of memory. A LEFT JOIN against a large dimension table builds it in RAM. Convert the lookup into a dictionary, or switch the join algorithm — see JOIN algorithm settings.
transform returns the wrong type. In the no-default form, array_to must share the type of x because unmatched values pass through unchanged; mismatched types cause errors or unexpected casts.
dictGet returns a default you didn't expect. A missing key returns the attribute's configured null_value, not an error. Use dictHas or dictGetOrNull to distinguish "missing" from "mapped to the default".

How Pulse Helps

Choosing between a dictionary, a join, and an inline map is rarely obvious from the query alone — it depends on the size and update cadence of your reference data and how the optimizer executes the plan. Pulse monitors ClickHouse query patterns and surfaces enrichment joins that would run faster and use less memory as dictionaries, flags dictionaries that are stale or oversized for their layout, and helps size dictionary memory before it becomes a problem in production. It is built and operated by the team behind BigData Boutique, who run ClickHouse for data-intensive workloads.

Frequently Asked Questions

Q: Is transform faster than CASE/multiIf?

For equality-based value mapping, yes. transform performs an array lookup, while multiIf evaluates a chain of conditions. The search form CASE WHEN condition compiles to multiIf; the match form CASE expr WHEN val compiles to transform(). Use multiIf/CASE WHEN only when branches need ranges or compound conditions rather than simple equality.

Q: When should I use a dictionary instead of a LEFT JOIN?

Use a dictionary when the lookup table fits in memory and you are enriching rows with attributes. The dictionary is purpose-built for in-memory key-value lookups and avoids rebuilding the right side of a join per query. Use a LEFT JOIN when the lookup data is too large for a dictionary, or when you need join semantics like multiple matching rows.

Q: What does dictGet return when the key isn't found?

It returns the attribute's configured null_value (its per-attribute default). Use dictGetOrDefault to supply your own fallback, dictGetOrNull to get NULL, or dictHas to test for existence first.

Q: How does transform behave for values not in the mapping?

With a default argument, transform(x, from, to, default) returns the default. Without it, transform(x, from, to) returns x unchanged — which is why that form requires array_to to share the type of x.

Q: Does transform handle NULL values?

Not reliably. transform treats a NULL input as an unmatched value, so it falls through to the default (or passes through unchanged) instead of being mapped. Convert nulls first with ifNull or coalesce before passing them in.

Q: Can I map several columns at once from a dictionary?

Yes. Pass a tuple of attribute names to dictGet, e.g. dictGet('users_dict', ('name', 'country'), user_id), and assign the result to a tuple of aliases. This performs a single keyed lookup instead of one per attribute.

ClickHouse Values Mapping: Dictionaries vs Joins vs Subqueries

Inline Mapping: transform, multiIf, and Enum

transform

multiIf and CASE

Enum cast