LowCardinality Type: Design, Performance, and Best Practices

LowCardinality changes the internal representation of a column to dictionary encoding: instead of storing each value inline, ClickHouse stores the distinct values once in a dictionary and replaces every row with a small integer reference. For columns with a limited number of distinct values, this cuts storage and accelerates filters and GROUP BY operations — often dramatically.

The catch is the word low. Applied to the right columns, LowCardinality is one of the highest-leverage schema changes you can make. Applied to a high-cardinality column, it adds overhead and can perform worse than a plain type. This guide covers how to identify good candidates, the tradeoffs involved, and how LowCardinality compares to Enum.

How LowCardinality Works

A LowCardinality(T) column is stored as two structures: a dictionary holding each distinct value of the underlying type T exactly once, and an index array of integer positions — one per row — pointing into the dictionary.

CREATE TABLE events
(
    id UInt64,
    event_type LowCardinality(String),
    region     LowCardinality(String),
    ts         DateTime
)
ENGINE = MergeTree
ORDER BY (event_type, ts);

Dictionaries are built per data part. ClickHouse can keep a single common dictionary per part when the number of distinct values is small (up to 8,192 distinct values by default), which unlocks extra optimizations — most queries operate on the integer positions and defer materializing the actual strings until late stages, and hashes for GROUP BY can be precomputed against the dictionary rather than recomputed per row.

LowCardinality is a wrapper, not a standalone type. It can wrap String, FixedString, Date, DateTime, and numeric types (Decimal is not supported). In practice it is most valuable on String columns. For the exact syntax accepted by CREATE TABLE, see the CREATE TABLE reference; for type-syntax errors, see Illegal syntax for data type.

When LowCardinality Helps

The efficiency of LowCardinality depends almost entirely on the number of distinct values. The official guidance:

  • Below ~10,000 distinct values: ClickHouse usually shows higher efficiency for both reading and storing. This is the sweet spot.
  • Above ~100,000 distinct values: LowCardinality can perform worse than the plain underlying type, because the dictionary itself grows large and the position index widens.

The effect is most pronounced when the underlying strings are long and repetitive. Converting a long, path-like or descriptive string column with few distinct values can yield large gains — Altinity's benchmark on a 172M-row dataset showed queries running roughly 1.5–3.5x faster and storage shrinking around 2.5x compressed (and ~10x uncompressed) after converting a city-name column to LowCardinality(String). Short strings (a few characters) with high cardinality see little to no benefit.

Good candidates share a profile:

  • A bounded set of repeating values: status codes, country/region names, event types, HTTP methods, log levels, service names, currency codes.
  • Values that recur across millions of rows.
  • Columns frequently used in WHERE, GROUP BY, or ORDER BY.

Poor candidates: high-cardinality identifiers (UUIDs, user IDs, request IDs, URLs with unbounded variety), free-text fields, and already-tiny columns like a single-byte flag.

Cardinality Analysis: Measuring Before You Decide

Don't guess. Measure the distinct-value count of each candidate column against your real data before converting.

SELECT
    uniqExact(event_type) AS event_type_card,
    uniqExact(region)     AS region_card,
    uniqExact(user_id)    AS user_id_card,
    count()               AS rows
FROM events;

For a quick scan across many columns, uniq() (approximate) is cheaper than uniqExact() on large tables:

SELECT
    uniq(url)       AS url_card,
    uniq(referrer)  AS referrer_card,
    uniq(country)   AS country_card
FROM web_logs;

Cardinality can also vary by part or partition. Because dictionaries are built per part, what matters is the distinct count within a part, not just globally. A column that is low-cardinality overall but high-cardinality within each daily partition still benefits; the reverse is rarely an issue in practice. After conversion, you can compare actual on-disk size:

SELECT
    name,
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed
FROM system.columns
WHERE table = 'events' AND database = currentDatabase()
GROUP BY name
ORDER BY sum(data_compressed_bytes) DESC;

Memory and CPU Tradeoffs

LowCardinality is not free. Understand the costs before applying it broadly:

  • Insert-time CPU: Every insert must build or extend the per-part dictionary and encode positions. On high-cardinality columns this work is wasted because the dictionary nearly equals the row count.
  • Memory during queries: Dictionaries are loaded into memory. Many LowCardinality columns, each with large dictionaries, increase memory pressure — particularly during distributed GROUP BY and merges.
  • Marginal benefit on tiny types: Wrapping a UInt8 or a short FixedString in LowCardinality often adds dictionary overhead without meaningful savings.

Two settings govern dictionary behavior:

  • low_cardinality_max_dictionary_size (default 8192) — the maximum dictionary size for which ClickHouse keeps a single shared dictionary per part. Beyond this, additional per-block dictionaries are used, which reduces the optimization benefit.
  • low_cardinality_use_single_dictionary_for_part (default 0) — controls whether a single dictionary is enforced for the whole part.

To prevent accidental misuse on inappropriately small or risky types, ClickHouse has allow_suspicious_low_cardinality_types (default 0), which blocks LowCardinality over fixed-width numeric types of 8 bytes or less and FixedString of 8 bytes or less. See the dedicated page on allow_suspicious_low_cardinality_types and the related suspicious type for LowCardinality error for how to handle that guardrail.

LowCardinality vs. Enum

Both Enum and LowCardinality are efficient ways to store a small set of repeating string values, and both store an integer per row internally. They differ in flexibility and operational cost.

Aspect LowCardinality(String) Enum8 / Enum16
Value set Open — any new value is added automatically Fixed — defined in the schema
Adding a value No action needed Requires ALTER TABLE ... MODIFY COLUMN
Unknown value on insert Accepted Throws an exception
Max distinct values Practically large (efficient below ~10k) 256 (Enum8) / 65,536 (Enum16)
Storage Dictionary + positions, per part 1 or 2 bytes per row, mapping in schema
Best for Evolving / unknown value sets Truly static, known-in-advance sets

The official recommendation is to prefer LowCardinality(String) over Enum for string columns in most cases: it offers the same or higher efficiency with far more flexibility. Every change to an Enum value set requires an ALTER TABLE, and inserting a value outside the enumeration raises an error — which can cause ingestion failures when upstream data introduces a new category. Reserve Enum for sets that are genuinely fixed and where you want an unknown value to be rejected.

Design Patterns

Converting existing String columns

Use ALTER TABLE ... MODIFY COLUMN. ClickHouse rewrites affected parts via a mutation:

ALTER TABLE events
    MODIFY COLUMN event_type LowCardinality(String);

This is a metadata change plus a background mutation that re-encodes the column. On large tables, schedule it during lower-traffic periods and monitor system.mutations.

Combining with the sort key

LowCardinality columns make excellent leading or secondary sort-key columns when their cardinality is low, because they cluster identical values and improve compression of neighboring columns:

CREATE TABLE logs
(
    service  LowCardinality(String),
    level    LowCardinality(String),
    ts       DateTime,
    message  String
)
ENGINE = MergeTree
ORDER BY (service, level, ts);

Nullable LowCardinality

LowCardinality(Nullable(String)) is valid when you genuinely need NULLs. Note that adding nullability has a cost; if a sentinel like '' works for your data, the non-nullable form is leaner.

CREATE TABLE t
(
    id     UInt64,
    status LowCardinality(Nullable(String))
)
ENGINE = MergeTree ORDER BY id;

Common Issues

  1. Applied to a high-cardinality column. Wrapping a UUID, URL, or user-ID column in LowCardinality inflates the dictionary, slows inserts, and can make queries slower than plain String. Always run a cardinality check first.

  2. Suspicious type error on numeric columns. LowCardinality over small fixed-width numeric types is blocked by default. This is intentional — see suspicious type for LowCardinality. Wrap a String/FixedString instead, or reconsider whether the column needs it.

  3. Wrapping already-tiny types. A LowCardinality(UInt8) rarely pays off; the underlying type is already 1 byte.

  4. Expecting benefit on short, varied strings. The gains come from deduplicating long, repeated values. Short high-cardinality strings see negligible improvement.

  5. Too many LowCardinality columns. Each carries dictionary overhead in memory. Convert the columns that demonstrably help, not every string column by reflex.

Best Practices

  1. Measure cardinality first. Use uniqExact() (or uniq() for a fast estimate) and target columns below ~10,000 distinct values. Be wary above ~100,000.

  2. Favor long, repetitive strings. The longer the string and the more it repeats, the larger the win.

  3. Prefer LowCardinality(String) over Enum unless the value set is truly fixed and you want unknown values rejected.

  4. Use low-cardinality columns in the sort key to cluster values and boost overall compression.

  5. Validate after conversion. Compare data_compressed_bytes in system.columns before and after, and benchmark representative queries — don't assume the win.

  6. Leave the guardrails on. Keep allow_suspicious_low_cardinality_types = 0 in production unless you have a specific, measured reason to change it.

How Pulse Helps

Picking the right columns for LowCardinality requires knowing the real per-column cardinality, storage footprint, and query patterns across your cluster — not just a schema review. Pulse monitors ClickHouse deployments and surfaces this kind of detail: which columns are oversized for their data, how compression is performing, and which query patterns are scanning more than they should. That makes it straightforward to spot the string columns that would benefit from dictionary encoding, and to confirm the impact after you change the schema. Pulse is built and operated by engineers who run ClickHouse in production, so the recommendations reflect real operational tradeoffs rather than generic rules of thumb.

Frequently Asked Questions

Q: What cardinality is "low" enough for LowCardinality?

As a rule of thumb, columns with fewer than ~10,000 distinct values are strong candidates and usually read and store more efficiently. Above ~100,000 distinct values, LowCardinality can perform worse than the plain type. Measure with uniqExact() against your real data before deciding.

Q: Should I use LowCardinality or Enum?

Prefer LowCardinality(String) in most cases — it gives the same or better efficiency with far more flexibility. Enum requires an ALTER TABLE to add values and throws an exception when an undeclared value is inserted. Use Enum only when the value set is genuinely fixed and rejecting unknown values is the behavior you want.

Q: Can I convert an existing String column to LowCardinality?

Yes: ALTER TABLE t MODIFY COLUMN col LowCardinality(String). This triggers a background mutation that re-encodes the column, so run it during lower-traffic periods on large tables and watch system.mutations.

Q: Why am I getting a "Suspicious type for LowCardinality" error?

By default, LowCardinality over small fixed-width numeric types is blocked because it rarely helps and often hurts. This is controlled by allow_suspicious_low_cardinality_types. See allow_suspicious_low_cardinality_types and the error page.

Q: Does LowCardinality help with GROUP BY and filtering specifically?

Yes. Queries operate on integer dictionary positions and defer materializing strings, and for small dictionaries (up to low_cardinality_max_dictionary_size, default 8192) ClickHouse can precompute hashes — accelerating GROUP BY and equality filters. This is one of the main reasons to convert columns used heavily in those operations.

Q: Can LowCardinality wrap Nullable or non-string types?

Yes — LowCardinality(Nullable(String)) is valid, as are Date, DateTime, FixedString, and most numeric types (not Decimal). It is most valuable on String. Nullability adds some overhead, so prefer a non-nullable form with a sentinel value when NULLs aren't strictly required.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.