LowCardinality changes the internal representation of a column to dictionary encoding: instead of storing each value inline, ClickHouse stores the distinct values once in a dictionary and replaces every row with a small integer reference. For columns with a limited number of distinct values, this cuts storage and accelerates filters and GROUP BY operations — often dramatically.
The catch is the word low. Applied to the right columns, LowCardinality is one of the highest-leverage schema changes you can make. Applied to a high-cardinality column, it adds overhead and can perform worse than a plain type. This guide covers how to identify good candidates, the tradeoffs involved, and how LowCardinality compares to Enum.
How LowCardinality Works
A LowCardinality(T) column is stored as two structures: a dictionary holding each distinct value of the underlying type T exactly once, and an index array of integer positions — one per row — pointing into the dictionary.
CREATE TABLE events
(
id UInt64,
event_type LowCardinality(String),
region LowCardinality(String),
ts DateTime
)
ENGINE = MergeTree
ORDER BY (event_type, ts);
Dictionaries are built per data part. ClickHouse can keep a single common dictionary per part when the number of distinct values is small (up to 8,192 distinct values by default), which unlocks extra optimizations — most queries operate on the integer positions and defer materializing the actual strings until late stages, and hashes for GROUP BY can be precomputed against the dictionary rather than recomputed per row.
LowCardinality is a wrapper, not a standalone type. It can wrap String, FixedString, Date, DateTime, and numeric types (Decimal is not supported). In practice it is most valuable on String columns. For the exact syntax accepted by CREATE TABLE, see the CREATE TABLE reference; for type-syntax errors, see Illegal syntax for data type.
When LowCardinality Helps
The efficiency of LowCardinality depends almost entirely on the number of distinct values. The official guidance:
- Below ~10,000 distinct values: ClickHouse usually shows higher efficiency for both reading and storing. This is the sweet spot.
- Above ~100,000 distinct values:
LowCardinalitycan perform worse than the plain underlying type, because the dictionary itself grows large and the position index widens.
The effect is most pronounced when the underlying strings are long and repetitive. Converting a long, path-like or descriptive string column with few distinct values can yield large gains — Altinity's benchmark on a 172M-row dataset showed queries running roughly 1.5–3.5x faster and storage shrinking around 2.5x compressed (and ~10x uncompressed) after converting a city-name column to LowCardinality(String). Short strings (a few characters) with high cardinality see little to no benefit.
Good candidates share a profile:
- A bounded set of repeating values: status codes, country/region names, event types, HTTP methods, log levels, service names, currency codes.
- Values that recur across millions of rows.
- Columns frequently used in
WHERE,GROUP BY, orORDER BY.
Poor candidates: high-cardinality identifiers (UUIDs, user IDs, request IDs, URLs with unbounded variety), free-text fields, and already-tiny columns like a single-byte flag.
Cardinality Analysis: Measuring Before You Decide
Don't guess. Measure the distinct-value count of each candidate column against your real data before converting.
SELECT
uniqExact(event_type) AS event_type_card,
uniqExact(region) AS region_card,
uniqExact(user_id) AS user_id_card,
count() AS rows
FROM events;
For a quick scan across many columns, uniq() (approximate) is cheaper than uniqExact() on large tables:
SELECT
uniq(url) AS url_card,
uniq(referrer) AS referrer_card,
uniq(country) AS country_card
FROM web_logs;
Cardinality can also vary by part or partition. Because dictionaries are built per part, what matters is the distinct count within a part, not just globally. A column that is low-cardinality overall but high-cardinality within each daily partition still benefits; the reverse is rarely an issue in practice. After conversion, you can compare actual on-disk size:
SELECT
name,
formatReadableSize(sum(data_compressed_bytes)) AS compressed,
formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed
FROM system.columns
WHERE table = 'events' AND database = currentDatabase()
GROUP BY name
ORDER BY sum(data_compressed_bytes) DESC;
Memory and CPU Tradeoffs
LowCardinality is not free. Understand the costs before applying it broadly:
- Insert-time CPU: Every insert must build or extend the per-part dictionary and encode positions. On high-cardinality columns this work is wasted because the dictionary nearly equals the row count.
- Memory during queries: Dictionaries are loaded into memory. Many
LowCardinalitycolumns, each with large dictionaries, increase memory pressure — particularly during distributedGROUP BYand merges. - Marginal benefit on tiny types: Wrapping a
UInt8or a shortFixedStringinLowCardinalityoften adds dictionary overhead without meaningful savings.
Two settings govern dictionary behavior:
low_cardinality_max_dictionary_size(default8192) — the maximum dictionary size for which ClickHouse keeps a single shared dictionary per part. Beyond this, additional per-block dictionaries are used, which reduces the optimization benefit.low_cardinality_use_single_dictionary_for_part(default0) — controls whether a single dictionary is enforced for the whole part.
To prevent accidental misuse on inappropriately small or risky types, ClickHouse has allow_suspicious_low_cardinality_types (default 0), which blocks LowCardinality over fixed-width numeric types of 8 bytes or less and FixedString of 8 bytes or less. See the dedicated page on allow_suspicious_low_cardinality_types and the related suspicious type for LowCardinality error for how to handle that guardrail.
LowCardinality vs. Enum
Both Enum and LowCardinality are efficient ways to store a small set of repeating string values, and both store an integer per row internally. They differ in flexibility and operational cost.
| Aspect | LowCardinality(String) | Enum8 / Enum16 |
|---|---|---|
| Value set | Open — any new value is added automatically | Fixed — defined in the schema |
| Adding a value | No action needed | Requires ALTER TABLE ... MODIFY COLUMN |
| Unknown value on insert | Accepted | Throws an exception |
| Max distinct values | Practically large (efficient below ~10k) | 256 (Enum8) / 65,536 (Enum16) |
| Storage | Dictionary + positions, per part | 1 or 2 bytes per row, mapping in schema |
| Best for | Evolving / unknown value sets | Truly static, known-in-advance sets |
The official recommendation is to prefer LowCardinality(String) over Enum for string columns in most cases: it offers the same or higher efficiency with far more flexibility. Every change to an Enum value set requires an ALTER TABLE, and inserting a value outside the enumeration raises an error — which can cause ingestion failures when upstream data introduces a new category. Reserve Enum for sets that are genuinely fixed and where you want an unknown value to be rejected.
Design Patterns
Converting existing String columns
Use ALTER TABLE ... MODIFY COLUMN. ClickHouse rewrites affected parts via a mutation:
ALTER TABLE events
MODIFY COLUMN event_type LowCardinality(String);
This is a metadata change plus a background mutation that re-encodes the column. On large tables, schedule it during lower-traffic periods and monitor system.mutations.
Combining with the sort key
LowCardinality columns make excellent leading or secondary sort-key columns when their cardinality is low, because they cluster identical values and improve compression of neighboring columns:
CREATE TABLE logs
(
service LowCardinality(String),
level LowCardinality(String),
ts DateTime,
message String
)
ENGINE = MergeTree
ORDER BY (service, level, ts);
Nullable LowCardinality
LowCardinality(Nullable(String)) is valid when you genuinely need NULLs. Note that adding nullability has a cost; if a sentinel like '' works for your data, the non-nullable form is leaner.
CREATE TABLE t
(
id UInt64,
status LowCardinality(Nullable(String))
)
ENGINE = MergeTree ORDER BY id;
Common Issues
Applied to a high-cardinality column. Wrapping a UUID, URL, or user-ID column in
LowCardinalityinflates the dictionary, slows inserts, and can make queries slower than plainString. Always run a cardinality check first.Suspicious typeerror on numeric columns.LowCardinalityover small fixed-width numeric types is blocked by default. This is intentional — see suspicious type for LowCardinality. Wrap aString/FixedStringinstead, or reconsider whether the column needs it.Wrapping already-tiny types. A
LowCardinality(UInt8)rarely pays off; the underlying type is already 1 byte.Expecting benefit on short, varied strings. The gains come from deduplicating long, repeated values. Short high-cardinality strings see negligible improvement.
Too many LowCardinality columns. Each carries dictionary overhead in memory. Convert the columns that demonstrably help, not every string column by reflex.
Best Practices
Measure cardinality first. Use
uniqExact()(oruniq()for a fast estimate) and target columns below ~10,000 distinct values. Be wary above ~100,000.Favor long, repetitive strings. The longer the string and the more it repeats, the larger the win.
Prefer
LowCardinality(String)overEnumunless the value set is truly fixed and you want unknown values rejected.Use low-cardinality columns in the sort key to cluster values and boost overall compression.
Validate after conversion. Compare
data_compressed_bytesinsystem.columnsbefore and after, and benchmark representative queries — don't assume the win.Leave the guardrails on. Keep
allow_suspicious_low_cardinality_types = 0in production unless you have a specific, measured reason to change it.
How Pulse Helps
Picking the right columns for LowCardinality requires knowing the real per-column cardinality, storage footprint, and query patterns across your cluster — not just a schema review. Pulse monitors ClickHouse deployments and surfaces this kind of detail: which columns are oversized for their data, how compression is performing, and which query patterns are scanning more than they should. That makes it straightforward to spot the string columns that would benefit from dictionary encoding, and to confirm the impact after you change the schema. Pulse is built and operated by engineers who run ClickHouse in production, so the recommendations reflect real operational tradeoffs rather than generic rules of thumb.
Frequently Asked Questions
Q: What cardinality is "low" enough for LowCardinality?
As a rule of thumb, columns with fewer than ~10,000 distinct values are strong candidates and usually read and store more efficiently. Above ~100,000 distinct values, LowCardinality can perform worse than the plain type. Measure with uniqExact() against your real data before deciding.
Q: Should I use LowCardinality or Enum?
Prefer LowCardinality(String) in most cases — it gives the same or better efficiency with far more flexibility. Enum requires an ALTER TABLE to add values and throws an exception when an undeclared value is inserted. Use Enum only when the value set is genuinely fixed and rejecting unknown values is the behavior you want.
Q: Can I convert an existing String column to LowCardinality?
Yes: ALTER TABLE t MODIFY COLUMN col LowCardinality(String). This triggers a background mutation that re-encodes the column, so run it during lower-traffic periods on large tables and watch system.mutations.
Q: Why am I getting a "Suspicious type for LowCardinality" error?
By default, LowCardinality over small fixed-width numeric types is blocked because it rarely helps and often hurts. This is controlled by allow_suspicious_low_cardinality_types. See allow_suspicious_low_cardinality_types and the error page.
Q: Does LowCardinality help with GROUP BY and filtering specifically?
Yes. Queries operate on integer dictionary positions and defer materializing strings, and for small dictionaries (up to low_cardinality_max_dictionary_size, default 8192) ClickHouse can precompute hashes — accelerating GROUP BY and equality filters. This is one of the main reasons to convert columns used heavily in those operations.
Q: Can LowCardinality wrap Nullable or non-string types?
Yes — LowCardinality(Nullable(String)) is valid, as are Date, DateTime, FixedString, and most numeric types (not Decimal). It is most valuable on String. Nullability adds some overhead, so prefer a non-nullable form with a sentinel value when NULLs aren't strictly required.