Dictionaries vs LowCardinality: Decision Guide and Trade-offs

ClickHouse offers two very different tools that both touch "low-variety" data: the LowCardinality column type and dictionaries. They are often confused because both involve mapping values to compact keys, but they solve different problems. LowCardinality is a storage encoding for a single column; dictionaries are a key-value lookup structure you query or join against.

This guide explains what each does, when to reach for one over the other, and the trade-offs in storage, mutability, and query performance. The short version: use LowCardinality to compress a repetitive column in place, and use a dictionary when you need a lookup that can change independently of the rows that reference it.

What LowCardinality Does

LowCardinality is a column modifier that applies dictionary encoding to a single column. Instead of storing the full value in every row, ClickHouse stores each distinct value once in a per-part dictionary and stores a small integer position in each row. This is transparent: the column still behaves like its underlying type for queries, inserts, and functions.

CREATE TABLE events
(
    event_time DateTime,
    user_id    UInt64,
    country    LowCardinality(String),
    event_type LowCardinality(String)
)
ENGINE = MergeTree()
ORDER BY (event_time, user_id);

LowCardinality supports String, FixedString, Date, DateTime, and numeric types except Decimal. It is most effective when the underlying values are repetitive strings — country codes, event types, status names, hostnames.

ClickHouse generally shows higher read/storage efficiency when a column has fewer than ~10,000 distinct values, and may perform worse than the plain type when distinct values exceed ~100,000. The encoding still works above that range (there is no hard limit, and results are often fine up to millions of distinct values), but the benefit shrinks and per-part dictionary overhead grows.

LowCardinality is also the recommended alternative to Enum for string-like categories: an Enum requires an ALTER TABLE every time the value set changes, while LowCardinality accepts new values automatically.

What Dictionaries Do

A ClickHouse dictionary is a separate in-memory (or layered) key-value structure that you load from an external source — another table, a file, a database, an HTTP endpoint — and query with dictGet or join against. The data lives outside your fact table and is refreshed on its own schedule.

CREATE DICTIONARY country_names
(
    country_code String,
    country_name String,
    region       String
)
PRIMARY KEY country_code
SOURCE(CLICKHOUSE(TABLE 'country_reference'))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 600 MAX 900);

You then enrich rows at query time without storing the looked-up attributes in the fact table:

SELECT
    event_type,
    dictGet('country_names', 'country_name', country) AS country_name,
    count()
FROM events
GROUP BY event_type, country_name;

The key property is independence: the dictionary can be updated (via LIFETIME refresh, SYSTEM RELOAD DICTIONARY, or its source changing) and every subsequent query sees the new mapping — including for rows inserted long ago. See external dictionaries for source and layout options.

Side-by-Side Comparison

Aspect LowCardinality Dictionary
What it is Column storage encoding Separate key-value lookup structure
Scope One column in one table Shared across queries and tables
Stored value The value itself (encoded) A key in the row; attributes live in the dictionary
Mutability Historical rows keep their original value Updating the dictionary changes results for all rows retroactively
Enrichment / extra attributes No — one column, one value Yes — one key maps to many attributes
Memory Per-part dictionary on disk; loaded as needed Resident in RAM (for hashed/flat layouts)
Best distinct-value range Up to ~10k ideal, degrades past ~100k Can hold millions of keys (RAM permitting)
Query syntax Transparent — use the column normally Explicit — dictGet(...) or JOIN
Schema change to add values None (unlike Enum) None (reload the source)

How to Choose

Use this decision flow:

  1. Do you just want to shrink a repetitive column and keep its value with each row? Use LowCardinality. Country, status, log level, service name — values that belong to the row and should stay frozen as historical fact. No extra system to manage.

  2. Does the value map to other attributes you'd otherwise duplicate in every row? Use a dictionary. Store a compact key (e.g. product_id) in the fact table and look up name, category, price via dictGet. This avoids denormalizing wide attributes into billions of rows.

  3. Should a change to the reference data affect already-stored rows? Use a dictionary. If a product is recategorized and your reports must reflect the new category for past sales, the mapping must live outside the rows. With LowCardinality, the old rows keep the old value forever.

  4. Is the reference data static and you don't need lookups? Use LowCardinality. As the Altinity Knowledge Base puts it: if it's a simple repetition you won't need to manipulate or change in the future, LowCardinality works great and you avoid the extra complexity of dictionaries.

The two are not mutually exclusive. A common pattern is a LowCardinality(String) key column in the fact table that is also the primary key of a dictionary used for enrichment — compact storage in the rows plus a mutable, shared attribute lookup.

Worked Example: The Same Reference Data, Two Ways

Suppose every event has a payment_method. There are about a dozen distinct values, they rarely change, and you only ever need the value itself.

-- LowCardinality: value stays with the row
ALTER TABLE events
    MODIFY COLUMN payment_method LowCardinality(String);

Now suppose instead each event references a merchant_id, and merchants have a name, tier, and country that occasionally change — and reports must use the current merchant attributes.

-- Dictionary: attributes live outside the rows and can change
CREATE DICTIONARY merchants_dict
(
    merchant_id UInt64,
    name        String,
    tier        String,
    country     LowCardinality(String)
)
PRIMARY KEY merchant_id
SOURCE(CLICKHOUSE(TABLE 'merchants'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);

SELECT
    dictGet('merchants_dict', 'tier', merchant_id) AS tier,
    count()
FROM events
GROUP BY tier;

Store merchant_id in the fact table (a plain UInt64), not the merchant name. The dictionary is the source of truth for merchant attributes and can be reloaded when merchants change.

Performance Notes

  • GROUP BY on a dictionary key: When a dictionary attribute is marked injective (distinct keys always map to distinct attribute values), ClickHouse can replace the dictGet in a GROUP BY with the cheaper key, then look up names only for the final groups. Aggregating by the raw key is usually faster than by the enriched string.
  • LowCardinality and GROUP BY: Grouping directly on a LowCardinality column is fast because ClickHouse can aggregate over the integer positions rather than full strings.
  • Don't stack LowCardinality on high-cardinality data: Wrapping a near-unique column (like a UUID or a free-text field) in LowCardinality hurts both storage and speed. ClickHouse will reject obviously bad cases unless allow_suspicious_low_cardinality_types is set — see the suspicious-type error.
  • Dictionary memory: HASHED/FLAT layouts hold the whole dictionary in RAM. SPARSE_HASHED is also fully in-memory but uses less RAM than HASHED by trading some CPU performance. For very large key sets where you need to genuinely bound memory, use CACHE or SSD_CACHE layouts.

Best Practices

  1. Default to LowCardinality for repetitive categorical columns. It is the lower-complexity option and requires no separate object to maintain. Reach for dictionaries only when you need lookups or mutability.
  2. Keep keys, not wide attributes, in fact tables. When attributes are wide or change, store a compact key and enrich via a dictionary at query time.
  3. Watch the distinct-value count. LowCardinality shines below ~10k distinct values and degrades past ~100k. Measure storage before and after with system.parts / system.columns.
  4. Prefer LowCardinality(String) over Enum when the value set can grow, to avoid ALTER TABLE churn.
  5. Set a sensible dictionary LIFETIME. Too short reloads constantly; too long serves stale data. Tune to how often the source actually changes. See the enrichment update pattern.
  6. Define the column type explicitly at CREATE TABLE. The CREATE TABLE reference covers LowCardinality and dictionary key columns.

Common Issues

  • Expecting a dictionary update to change historical LowCardinality values. It won't — LowCardinality values are stored in the rows. If you need retroactive changes, you needed a dictionary.
  • "Creating columns of type LowCardinality(...) is prohibited by default." You wrapped a type/cardinality ClickHouse considers a poor fit. Reconsider the column rather than blindly enabling allow_suspicious_low_cardinality_types; details in this error page.
  • Stale dictionary data in reports. The dictionary LIFETIME is too long, or the source didn't refresh. Force a reload with SYSTEM RELOAD DICTIONARY merchants_dict.
  • Memory pressure from HASHED/FLAT dictionaries. A multi-million-key dictionary in RAM can be costly; switch to a cache or SSD layout.

How Pulse Helps

Deciding between LowCardinality and dictionaries is part of a larger schema-design question, and getting it wrong shows up later as bloated storage, slow GROUP BY, or reports that silently serve stale reference data. Pulse monitors ClickHouse schemas and runtime behavior — flagging columns that would benefit from (or are misusing) LowCardinality, surfacing dictionaries that are reloading too often or consuming excess memory, and tracking the storage and query impact of your data-modeling choices over time. It is run by ClickHouse and search-infrastructure experts who can review whether your reference data belongs in columns or dictionaries before it becomes a production problem.

Frequently Asked Questions

Q: Is LowCardinality the same as a dictionary?

No. LowCardinality applies dictionary encoding to a single column's on-disk storage — the values still live with the rows. A ClickHouse dictionary is a separate, shared key-value structure you query with dictGet or join against, and it can change independently of the rows that reference it.

Q: Can I use both together?

Yes, and it's common. Store a compact LowCardinality (or plain integer) key in the fact table and use a dictionary keyed on that column for enrichment. You get compact row storage plus a mutable, shared attribute lookup.

Q: When does updating reference data argue for a dictionary?

Whenever a change to the reference data should affect rows that already exist. Reclassifying a product and wanting past sales to roll up under the new category requires the mapping to live outside the rows — that's a dictionary. If old rows should keep their original value, LowCardinality is correct.

Q: How many distinct values is too many for LowCardinality?

ClickHouse is most efficient below roughly 10,000 distinct values per column and can perform worse than the plain type above ~100,000. It still functions above that (no hard limit), but the benefit shrinks. For very high-cardinality or near-unique columns, don't use LowCardinality.

Q: Should I use LowCardinality or Enum?

Prefer LowCardinality(String) when the value set may grow, because adding values needs no schema change. Enum is reasonable only for a tiny, truly fixed set where the explicit validation is desirable — and even then, changing it requires ALTER TABLE.

Q: Do dictionaries always live in memory?

For FLAT and HASHED layouts, yes — the whole dictionary is resident in RAM. For large key sets, SPARSE_HASHED reduces RAM compared to HASHED but remains fully in-memory. To genuinely bound memory use CACHE or SSD_CACHE layouts, which store only frequently accessed keys, at the cost of some lookup latency. See external dictionaries for the full list of layouts.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.