ClickHouse offers two very different tools that both touch "low-variety" data: the LowCardinality column type and dictionaries. They are often confused because both involve mapping values to compact keys, but they solve different problems. LowCardinality is a storage encoding for a single column; dictionaries are a key-value lookup structure you query or join against.
This guide explains what each does, when to reach for one over the other, and the trade-offs in storage, mutability, and query performance. The short version: use LowCardinality to compress a repetitive column in place, and use a dictionary when you need a lookup that can change independently of the rows that reference it.
What LowCardinality Does
LowCardinality is a column modifier that applies dictionary encoding to a single column. Instead of storing the full value in every row, ClickHouse stores each distinct value once in a per-part dictionary and stores a small integer position in each row. This is transparent: the column still behaves like its underlying type for queries, inserts, and functions.
CREATE TABLE events
(
event_time DateTime,
user_id UInt64,
country LowCardinality(String),
event_type LowCardinality(String)
)
ENGINE = MergeTree()
ORDER BY (event_time, user_id);
LowCardinality supports String, FixedString, Date, DateTime, and numeric types except Decimal. It is most effective when the underlying values are repetitive strings — country codes, event types, status names, hostnames.
ClickHouse generally shows higher read/storage efficiency when a column has fewer than ~10,000 distinct values, and may perform worse than the plain type when distinct values exceed ~100,000. The encoding still works above that range (there is no hard limit, and results are often fine up to millions of distinct values), but the benefit shrinks and per-part dictionary overhead grows.
LowCardinality is also the recommended alternative to Enum for string-like categories: an Enum requires an ALTER TABLE every time the value set changes, while LowCardinality accepts new values automatically.
What Dictionaries Do
A ClickHouse dictionary is a separate in-memory (or layered) key-value structure that you load from an external source — another table, a file, a database, an HTTP endpoint — and query with dictGet or join against. The data lives outside your fact table and is refreshed on its own schedule.
CREATE DICTIONARY country_names
(
country_code String,
country_name String,
region String
)
PRIMARY KEY country_code
SOURCE(CLICKHOUSE(TABLE 'country_reference'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 600 MAX 900);
You then enrich rows at query time without storing the looked-up attributes in the fact table:
SELECT
event_type,
dictGet('country_names', 'country_name', country) AS country_name,
count()
FROM events
GROUP BY event_type, country_name;
The key property is independence: the dictionary can be updated (via LIFETIME refresh, SYSTEM RELOAD DICTIONARY, or its source changing) and every subsequent query sees the new mapping — including for rows inserted long ago. See external dictionaries for source and layout options.
Side-by-Side Comparison
| Aspect | LowCardinality | Dictionary |
|---|---|---|
| What it is | Column storage encoding | Separate key-value lookup structure |
| Scope | One column in one table | Shared across queries and tables |
| Stored value | The value itself (encoded) | A key in the row; attributes live in the dictionary |
| Mutability | Historical rows keep their original value | Updating the dictionary changes results for all rows retroactively |
| Enrichment / extra attributes | No — one column, one value | Yes — one key maps to many attributes |
| Memory | Per-part dictionary on disk; loaded as needed | Resident in RAM (for hashed/flat layouts) |
| Best distinct-value range | Up to ~10k ideal, degrades past ~100k | Can hold millions of keys (RAM permitting) |
| Query syntax | Transparent — use the column normally | Explicit — dictGet(...) or JOIN |
| Schema change to add values | None (unlike Enum) |
None (reload the source) |
How to Choose
Use this decision flow:
Do you just want to shrink a repetitive column and keep its value with each row? Use
LowCardinality. Country, status, log level, service name — values that belong to the row and should stay frozen as historical fact. No extra system to manage.Does the value map to other attributes you'd otherwise duplicate in every row? Use a dictionary. Store a compact key (e.g.
product_id) in the fact table and look upname,category,priceviadictGet. This avoids denormalizing wide attributes into billions of rows.Should a change to the reference data affect already-stored rows? Use a dictionary. If a product is recategorized and your reports must reflect the new category for past sales, the mapping must live outside the rows. With
LowCardinality, the old rows keep the old value forever.Is the reference data static and you don't need lookups? Use
LowCardinality. As the Altinity Knowledge Base puts it: if it's a simple repetition you won't need to manipulate or change in the future,LowCardinalityworks great and you avoid the extra complexity of dictionaries.
The two are not mutually exclusive. A common pattern is a LowCardinality(String) key column in the fact table that is also the primary key of a dictionary used for enrichment — compact storage in the rows plus a mutable, shared attribute lookup.
Worked Example: The Same Reference Data, Two Ways
Suppose every event has a payment_method. There are about a dozen distinct values, they rarely change, and you only ever need the value itself.
-- LowCardinality: value stays with the row
ALTER TABLE events
MODIFY COLUMN payment_method LowCardinality(String);
Now suppose instead each event references a merchant_id, and merchants have a name, tier, and country that occasionally change — and reports must use the current merchant attributes.
-- Dictionary: attributes live outside the rows and can change
CREATE DICTIONARY merchants_dict
(
merchant_id UInt64,
name String,
tier String,
country LowCardinality(String)
)
PRIMARY KEY merchant_id
SOURCE(CLICKHOUSE(TABLE 'merchants'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);
SELECT
dictGet('merchants_dict', 'tier', merchant_id) AS tier,
count()
FROM events
GROUP BY tier;
Store merchant_id in the fact table (a plain UInt64), not the merchant name. The dictionary is the source of truth for merchant attributes and can be reloaded when merchants change.
Performance Notes
- GROUP BY on a dictionary key: When a dictionary attribute is marked injective (distinct keys always map to distinct attribute values), ClickHouse can replace the
dictGetin aGROUP BYwith the cheaper key, then look up names only for the final groups. Aggregating by the raw key is usually faster than by the enriched string. - LowCardinality and GROUP BY: Grouping directly on a
LowCardinalitycolumn is fast because ClickHouse can aggregate over the integer positions rather than full strings. - Don't stack LowCardinality on high-cardinality data: Wrapping a near-unique column (like a UUID or a free-text field) in
LowCardinalityhurts both storage and speed. ClickHouse will reject obviously bad cases unlessallow_suspicious_low_cardinality_typesis set — see the suspicious-type error. - Dictionary memory:
HASHED/FLATlayouts hold the whole dictionary in RAM.SPARSE_HASHEDis also fully in-memory but uses less RAM thanHASHEDby trading some CPU performance. For very large key sets where you need to genuinely bound memory, useCACHEorSSD_CACHElayouts.
Best Practices
- Default to
LowCardinalityfor repetitive categorical columns. It is the lower-complexity option and requires no separate object to maintain. Reach for dictionaries only when you need lookups or mutability. - Keep keys, not wide attributes, in fact tables. When attributes are wide or change, store a compact key and enrich via a dictionary at query time.
- Watch the distinct-value count.
LowCardinalityshines below ~10k distinct values and degrades past ~100k. Measure storage before and after withsystem.parts/system.columns. - Prefer
LowCardinality(String)overEnumwhen the value set can grow, to avoidALTER TABLEchurn. - Set a sensible dictionary
LIFETIME. Too short reloads constantly; too long serves stale data. Tune to how often the source actually changes. See the enrichment update pattern. - Define the column type explicitly at
CREATE TABLE. The CREATE TABLE reference coversLowCardinalityand dictionary key columns.
Common Issues
- Expecting a dictionary update to change historical
LowCardinalityvalues. It won't —LowCardinalityvalues are stored in the rows. If you need retroactive changes, you needed a dictionary. - "Creating columns of type LowCardinality(...) is prohibited by default." You wrapped a type/cardinality ClickHouse considers a poor fit. Reconsider the column rather than blindly enabling
allow_suspicious_low_cardinality_types; details in this error page. - Stale dictionary data in reports. The dictionary
LIFETIMEis too long, or the source didn't refresh. Force a reload withSYSTEM RELOAD DICTIONARY merchants_dict. - Memory pressure from
HASHED/FLATdictionaries. A multi-million-key dictionary in RAM can be costly; switch to a cache or SSD layout.
How Pulse Helps
Deciding between LowCardinality and dictionaries is part of a larger schema-design question, and getting it wrong shows up later as bloated storage, slow GROUP BY, or reports that silently serve stale reference data. Pulse monitors ClickHouse schemas and runtime behavior — flagging columns that would benefit from (or are misusing) LowCardinality, surfacing dictionaries that are reloading too often or consuming excess memory, and tracking the storage and query impact of your data-modeling choices over time. It is run by ClickHouse and search-infrastructure experts who can review whether your reference data belongs in columns or dictionaries before it becomes a production problem.
Frequently Asked Questions
Q: Is LowCardinality the same as a dictionary?
No. LowCardinality applies dictionary encoding to a single column's on-disk storage — the values still live with the rows. A ClickHouse dictionary is a separate, shared key-value structure you query with dictGet or join against, and it can change independently of the rows that reference it.
Q: Can I use both together?
Yes, and it's common. Store a compact LowCardinality (or plain integer) key in the fact table and use a dictionary keyed on that column for enrichment. You get compact row storage plus a mutable, shared attribute lookup.
Q: When does updating reference data argue for a dictionary?
Whenever a change to the reference data should affect rows that already exist. Reclassifying a product and wanting past sales to roll up under the new category requires the mapping to live outside the rows — that's a dictionary. If old rows should keep their original value, LowCardinality is correct.
Q: How many distinct values is too many for LowCardinality?
ClickHouse is most efficient below roughly 10,000 distinct values per column and can perform worse than the plain type above ~100,000. It still functions above that (no hard limit), but the benefit shrinks. For very high-cardinality or near-unique columns, don't use LowCardinality.
Q: Should I use LowCardinality or Enum?
Prefer LowCardinality(String) when the value set may grow, because adding values needs no schema change. Enum is reasonable only for a tiny, truly fixed set where the explicit validation is desirable — and even then, changing it requires ALTER TABLE.
Q: Do dictionaries always live in memory?
For FLAT and HASHED layouts, yes — the whole dictionary is resident in RAM. For large key sets, SPARSE_HASHED reduces RAM compared to HASHED but remains fully in-memory. To genuinely bound memory use CACHE or SSD_CACHE layouts, which store only frequently accessed keys, at the cost of some lookup latency. See external dictionaries for the full list of layouts.