"Values mapping" covers a family of related problems: translating a status code to a label, resolving a user ID to a name, or enriching an events table with attributes from a dimension table. ClickHouse offers several tools for this — transform, conditional functions, Enum casts, dictionaries (dictGet), LEFT JOIN, and subqueries — and picking the wrong one is a common source of slow queries.
This guide walks through each approach, when it fits, and how they compare on performance and maintenance. The short version: use inline functions for small, static maps; use dictionaries when reference data lives in another table and changes over time; reach for JOIN only when the lookup data is too large to hold in memory.
Inline Mapping: transform, multiIf, and Enum
When the set of mappings is small and known at query-authoring time, map values inline without touching another table.
transform
transform is the most direct tool for value-to-value mapping. It takes a value, an array of source values, an array of target values, and an optional default:
-- transform(x, array_from, array_to, default)
SELECT transform(status, [200, 404, 500], ['ok', 'not_found', 'error'], 'other') AS label
FROM requests;
Both arrays must be constants of equal length. array_from elements must be the same type as x, and array_to elements (and the default) share a common type that becomes the result type.
There are two forms with different unmatched behavior:
- With a default —
transform(x, from, to, default)returnsdefaultfor any value not found inarray_from. - Without a default —
transform(x, from, to)returnsxunchanged when there is no match. This form requiresarray_toto have the same type asx, since the original value may pass through.
-- No default: unmatched countries pass through unchanged
SELECT transform(country, ['US', 'GB'], ['United States', 'United Kingdom']) AS name
FROM events;
One caveat: transform is not designed around NULL handling — a NULL input is treated as unmatched, so it falls through to the default (or passes through unchanged in the no-default form) rather than being mapped. Filter or coalesce nulls before mapping (for example with ifNull or coalesce) when you need predictable results.
multiIf and CASE
multiIf evaluates (condition, result) pairs in order and returns the first match, with a required final default. The search form CASE WHEN condition THEN result compiles directly into multiIf, so the two are equivalent. (The match form CASE expr WHEN val THEN result compiles into transform() instead — see the ClickHouse operators reference.)
SELECT multiIf(
status = 200, 'ok',
status = 404, 'not_found',
status = 500, 'error',
'other'
) AS label
FROM requests;
For pure equality-based value mapping, transform is generally faster than multiIf because it does an array lookup rather than evaluating a chain of conditions. Use multiIf/CASE when branches involve ranges or compound conditions (status >= 500), not just equality. See the multiIf function reference and CASE expressions in ClickHouse for details.
Enum cast
If the mapping is between integer codes and a small, stable set of string labels, casting to an Enum is the fastest option of all — the mapping is encoded in the column type itself:
SELECT CAST(status_code AS Enum('ok' = 200, 'not_found' = 404, 'error' = 500)) AS label
FROM requests;
The trade-off is rigidity: every possible value must be listed in the Enum definition, and an unlisted value raises an error rather than falling back to a default. Use it only when the value set is closed and rarely changes.
Inline mapping comparison
| Approach | Best for | Unmatched value | Relative speed |
|---|---|---|---|
transform(x, from, to, default) |
Equality value-to-value maps | Returns default |
Fast |
transform(x, from, to) |
Same as above, pass-through | Returns x unchanged |
Fast |
multiIf / CASE |
Range or compound conditions | Returns required else |
Slower than transform |
Enum cast |
Closed, stable code↔label sets | Raises an error | Fastest |
Mapping From Another Table
Once the mapping lives in a separate table — user attributes, product catalogs, geo lookups — inline functions no longer fit. The three options are dictionaries, joins, and subqueries.
Dictionaries (dictGet)
A dictionary is an in-memory key-value structure built for exactly this pattern: O(1) lookups with no table scan. Define it once over the source table (or any external source), then call dictGet per row:
CREATE DICTIONARY users_dict
(
user_id UInt64,
name String,
country String
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(TABLE 'users'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);
-- Enrich events with a single lookup
SELECT
event_time,
dictGet('users_dict', 'name', user_id) AS user_name
FROM events;
dictGet('dict_name', attr, key) returns the configured null_value (the per-attribute default) when the key is missing. Related functions:
dictGetOrDefault('dict', attr, key, default)— returns yourdefaultinstead of the configured one when the key is absent.dictGetOrNull('dict', attr, key)— returnsNULLfor a missing key.dictHas('dict', key)— returns1if the key exists, else0.
You can fetch several attributes in one call with a tuple of attribute names, which avoids repeating the lookup:
SELECT
event_time,
dictGet('users_dict', ('name', 'country'), user_id) AS (user_name, user_country)
FROM events;
Dictionaries refresh on their LIFETIME schedule, so reference data stays current without rewriting queries. For data that is generated by a script rather than stored in a table, see executable dictionaries.
LEFT JOIN
A LEFT JOIN against the dimension table produces the same enrichment with familiar SQL:
SELECT
e.event_time,
u.name AS user_name
FROM events AS e
LEFT JOIN users AS u ON e.user_id = u.user_id;
The catch is memory: ClickHouse's default hash join builds the right-hand table in RAM, so a large dimension table can be expensive or fail outright. For the same in-memory key-value workload, a dictionary is usually faster and lighter than a join, because the dictionary is purpose-built and already resident in memory. Reach for a join when the lookup data is too large to keep in a dictionary, or when you need join semantics (multiple matching rows, range conditions) that a dictionary can't express. See ClickHouse JOIN performance and JOIN types for tuning and algorithm choices.
Subquery with IN or a joined derived table
A subquery is occasionally the right tool — for filtering by a set of keys, or when the mapping is computed on the fly:
-- Filtering, not enrichment
SELECT *
FROM events
WHERE user_id IN (SELECT user_id FROM users WHERE country = 'US');
For enrichment (adding columns), a subquery still resolves to a join under the hood, so it carries the same memory characteristics as a LEFT JOIN without the advantages of a dictionary. Prefer IN-style subqueries for filtering and dictionaries for column enrichment.
Choosing an Approach
| Method | Reference data | Lives where | Memory | Stays fresh | Use when |
|---|---|---|---|---|---|
transform / multiIf / Enum |
Small, static | In the query | None | Manual edit | A handful of hard-coded mappings |
Dictionary (dictGet) |
Small–medium, changing | Separate table / source | Fits in RAM | Auto via LIFETIME |
Enriching rows with attributes from a lookup table |
LEFT JOIN |
Large | Separate table | Right side in RAM | Always (live read) | Lookup table too big for a dictionary, or join semantics needed |
Subquery (IN) |
Any | Separate table | Depends | Always | Filtering by a set of keys, not adding columns |
Rule of thumb: static map → inline function; changing lookup that fits in memory → dictionary; lookup too large for memory → join.
Best Practices
- Default to dictionaries for cross-table enrichment. They avoid the per-query memory cost of a join and keep reference data fresh automatically.
- Use
transformoverCASE/multiIffor equality maps. It does an array lookup instead of a condition chain and is measurably faster for value-to-value translation. - Pick the smallest key and attribute types. Dictionary memory scales with column width; narrow
UInt/LowCardinalitytypes keep it lean. - Fetch multiple attributes in one
dictGetcall using a tuple of names rather than several separate calls over the same key. - Handle NULLs explicitly.
transformtreatsNULLas unmatched rather than mapping it; wrap inputs inifNull/coalesceif nulls are possible. - Match the
Enumto a truly closed set. An unlisted value raises an error, so only use it when the value domain is stable.
Common Issues
- Stale dictionary data. A dictionary only refreshes on its
LIFETIMEschedule (orSYSTEM RELOAD DICTIONARY). If lookups return old values, check theLIFETIMEand the source's update query. See dictionary lazy load behavior. - JOIN runs out of memory. A
LEFT JOINagainst a large dimension table builds it in RAM. Convert the lookup into a dictionary, or switch the join algorithm — see JOIN algorithm settings. transformreturns the wrong type. In the no-default form,array_tomust share the type ofxbecause unmatched values pass through unchanged; mismatched types cause errors or unexpected casts.dictGetreturns a default you didn't expect. A missing key returns the attribute's configurednull_value, not an error. UsedictHasordictGetOrNullto distinguish "missing" from "mapped to the default".
How Pulse Helps
Choosing between a dictionary, a join, and an inline map is rarely obvious from the query alone — it depends on the size and update cadence of your reference data and how the optimizer executes the plan. Pulse monitors ClickHouse query patterns and surfaces enrichment joins that would run faster and use less memory as dictionaries, flags dictionaries that are stale or oversized for their layout, and helps size dictionary memory before it becomes a problem in production. It is built and operated by the team behind BigData Boutique, who run ClickHouse for data-intensive workloads.
Frequently Asked Questions
Q: Is transform faster than CASE/multiIf?
For equality-based value mapping, yes. transform performs an array lookup, while multiIf evaluates a chain of conditions. The search form CASE WHEN condition compiles to multiIf; the match form CASE expr WHEN val compiles to transform(). Use multiIf/CASE WHEN only when branches need ranges or compound conditions rather than simple equality.
Q: When should I use a dictionary instead of a LEFT JOIN?
Use a dictionary when the lookup table fits in memory and you are enriching rows with attributes. The dictionary is purpose-built for in-memory key-value lookups and avoids rebuilding the right side of a join per query. Use a LEFT JOIN when the lookup data is too large for a dictionary, or when you need join semantics like multiple matching rows.
Q: What does dictGet return when the key isn't found?
It returns the attribute's configured null_value (its per-attribute default). Use dictGetOrDefault to supply your own fallback, dictGetOrNull to get NULL, or dictHas to test for existence first.
Q: How does transform behave for values not in the mapping?
With a default argument, transform(x, from, to, default) returns the default. Without it, transform(x, from, to) returns x unchanged — which is why that form requires array_to to share the type of x.
Q: Does transform handle NULL values?
Not reliably. transform treats a NULL input as an unmatched value, so it falls through to the default (or passes through unchanged) instead of being mapped. Convert nulls first with ifNull or coalesce before passing them in.
Q: Can I map several columns at once from a dictionary?
Yes. Pass a tuple of attribute names to dictGet, e.g. dictGet('users_dict', ('name', 'country'), user_id), and assign the result to a tuple of aliases. This performs a single keyed lookup instead of one per attribute.