SnowflakeID and Distributed ID Schemes for ClickHouse Primary Keys

Q: Why is my decoded timestamp decades off?

Epoch mismatch. ClickHouse's generateSnowflakeID and ...SnowflakeID... functions default to the UNIX epoch (1970-01-01). IDs from systems using the Twitter/X epoch must be decoded with the matching epoch argument, otherwise the timestamp is shifted by a fixed offset.

Q: Can I use a SnowflakeID with ReplacingMergeTree for upserts?

Only if the ID is deterministic for a given logical row. generateSnowflakeID() produces a new value every call, so it will not collapse duplicate inserts. For idempotent upserts, derive the key from business columns (e.g. a hash) instead of generating a fresh ID.

Q: Which functions are compatible with `generateSnowflakeID`?

Use snowflakeIDToDateTime , snowflakeIDToDateTime64 , dateTimeToSnowflakeID , and dateTime64ToSnowflakeID (note the "ID" in the name). The older snowflakeToDateTime / dateTimeToSnowflake family uses the same bit structure but defaults to the Twitter/X epoch instead of the UNIX epoch, and accepts/returns Int64 rather than UInt64 — making it incompatible with generateSnowflakeID .

A primary key in ClickHouse does two jobs at once: it provides uniqueness for your rows and it defines the physical sort order that drives every range scan. A good identifier scheme satisfies both. SnowflakeID — a 64-bit integer that packs a millisecond timestamp, a machine ID, and a per-millisecond counter — is one of the best fits for this because it is compact, globally unique, and roughly time-ordered.

This guide explains why time-ordered distributed IDs make better ClickHouse primary keys than random UUIDs, how to generate and decode SnowflakeIDs with ClickHouse's built-in functions, and when other schemes are the right call.

Why ID Shape Matters for a ClickHouse Primary Key

ClickHouse's MergeTree engine stores data sorted by the ORDER BY key and builds a sparse primary index over that sort order. The index works best when newly inserted rows land in a narrow, advancing range of key values. That property has two big consequences:

Compression: adjacent values that are similar compress far better. A monotonically increasing integer column compresses to a fraction of the size of a column of random 128-bit UUIDs.
Insert and merge cost: when each batch of inserted rows covers a small, recent key range, parts stay well-ordered and merges do less work. Random keys spread every insert across the entire key space.

A purely random identifier (such as a v4 UUID) destroys both properties: it is wide, incompressible, and scatters writes across the whole table. A time-ordered ID like SnowflakeID preserves them while still being unique across many independent writers.

What a SnowflakeID Is

A SnowflakeID is a single UInt64 that encodes three fields (this is the layout ClickHouse's generateSnowflakeID uses):

Field	Bits	Purpose
Sign / reserved	1 (top bit, always 0)	Keeps the value non-negative
Timestamp	41	Milliseconds since the epoch (UNIX epoch, 1970-01-01, in ClickHouse)
Machine ID	10	Identifies the generating node (0–1023)
Counter	12	Distinguishes IDs created within the same millisecond on the same machine

Because the timestamp occupies the high bits, sorting SnowflakeIDs numerically is approximately the same as sorting by creation time. That is exactly the ordering you want for time-series and event data, and it lets a single integer column serve as both the unique key and a coarse time filter.

Note on epochs: ClickHouse's built-in functions default to the UNIX epoch (1970-01-01). The original Twitter/X implementation and others (e.g. Mastodon) use different epochs, so IDs generated elsewhere may need an explicit epoch argument when you decode them. Mixing epochs silently shifts the timestamp.

Generating SnowflakeIDs in ClickHouse

generateSnowflakeID was introduced in ClickHouse v24.6 and returns a UInt64.

-- Basic generation
SELECT generateSnowflakeID();

-- With a machine_id (lowest 10 bits are used) to avoid
-- collisions across nodes in a cluster
SELECT generateSnowflakeID(1, 42);

Syntax: generateSnowflakeID([expr, [machine_id]])

expr — an arbitrary expression whose only purpose is to defeat common subexpression elimination when you call the function more than once in a single query. It does not affect the generated value.
machine_id — optional Int64; only the lowest 10 bits are used. Supply distinct values per node so concurrent generators cannot collide.

The counter field increments monotonically across all invocations within the same millisecond, including across concurrently running threads and queries on the same server, so a single server will not produce duplicate IDs.

Using it as a default in a table

You can populate a SnowflakeID column automatically and use it directly in the sort key:

CREATE TABLE events
(
    id        UInt64 DEFAULT generateSnowflakeID(),
    user_id   UInt64,
    event_type String,
    payload   String
)
ENGINE = MergeTree
ORDER BY id;

Any insert that omits id gets a fresh, time-ordered SnowflakeID. See the CREATE TABLE reference for column default options (DEFAULT vs MATERIALIZED).

In practice you rarely sort by id alone. A common pattern keeps a low-cardinality leading column for partition-friendly filtering and uses the SnowflakeID to break ties and provide time ordering:

CREATE TABLE events
(
    id         UInt64 DEFAULT generateSnowflakeID(),
    tenant_id  UInt32,
    event_type LowCardinality(String),
    payload    String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(snowflakeIDToDateTime(id))
ORDER BY (tenant_id, id);

Here id still contributes time ordering inside each (tenant_id) group, and the partition expression derives the month directly from the ID — no separate timestamp column required. For background on choosing leading key columns, see the ClickHouse primary index guide.

Decoding a SnowflakeID Back to a Timestamp

ClickHouse ships conversion functions (also v24.6+) that are compatible with generateSnowflakeID and default to the UNIX epoch:

-- Extract the creation time of a row from its ID
SELECT
    id,
    snowflakeIDToDateTime(id)   AS created_at,         -- DateTime, second precision
    snowflakeIDToDateTime64(id) AS created_at_ms       -- DateTime64(3), millisecond precision
FROM events
LIMIT 5;

-- Go the other direction: the first SnowflakeID at a given time.
-- Useful for range filtering without a separate timestamp column.
SELECT dateTimeToSnowflakeID(now() - INTERVAL 1 DAY) AS cutoff;

Because the timestamp lives in the high bits, you can filter a time window with a plain integer range that the primary index handles efficiently:

SELECT count()
FROM events
WHERE id >= dateTimeToSnowflakeID(toDateTime('2026-01-01 00:00:00'))
  AND id <  dateTimeToSnowflakeID(toDateTime('2026-02-01 00:00:00'));

Function	Input	Returns	Notes
`generateSnowflakeID([expr[, machine_id]])`	—	`UInt64`	Generates a new ID; base function v24.6+, `machine_id` parameter v24.7+
`snowflakeIDToDateTime(value[, epoch[, tz]])`	`UInt64`	`DateTime`	Second precision
`snowflakeIDToDateTime64(value[, epoch[, tz]])`	`UInt64`	`DateTime64(3)`	Millisecond precision
`dateTimeToSnowflakeID(value[, epoch])`	`DateTime`	`UInt64`	First ID at that time
`dateTime64ToSnowflakeID(value[, epoch])`	`DateTime64`	`UInt64`	First ID at that time

Do not confuse these with the older snowflakeToDateTime / dateTimeToSnowflake family (no "ID" in the name). Those functions use the same bit structure but default to the Twitter/X epoch (1288834974657 ms since UNIX epoch) rather than the UNIX epoch, and they accept/return Int64 rather than UInt64, making them not compatible with generateSnowflakeID. Prefer the ...SnowflakeID... functions for anything generated by generateSnowflakeID.

SnowflakeID vs UUID vs Sequential ID

Property	SnowflakeID (`UInt64`)	UUIDv4 (random)	UUIDv7 (time-ordered)	Sequential / counter
Width	64 bits	128 bits	128 bits	32–64 bits
Time-ordered	Yes (high bits)	No	Yes	Yes
Globally unique across writers	Yes (via machine ID)	Yes	Yes	No — needs coordination
Compresses well as a sort key	Yes	No	Partially	Yes
Embeds a usable timestamp	Yes (decode it back)	No	Yes (v7)	No
Needs central coordination	No	No	No	Yes

Practical guidance:

SnowflakeID is the sweet spot when you want a compact, time-ordered, collision-free key generated independently by many writers, and you value being able to recover the creation time from the ID. It is the most ClickHouse-friendly of these for a primary/sort key.
UUIDv4 is fine when the value comes from an external system you do not control, but avoid making a random UUID the leading column of your sort key — it compresses poorly and scatters inserts. If you must store one, keep it as an attribute and lead the sort key with a time or low-cardinality column.
UUIDv7 gives you UUID compatibility with time ordering; it is wider than SnowflakeID but a reasonable choice if downstream systems require the UUID type.
Sequential IDs compress best of all but require a single coordinator or a sequence service; in a distributed ClickHouse setup that coordination is usually the thing you are trying to avoid, which is precisely the problem SnowflakeID solves.

A Note on Nano ID and Other Schemes

Schemes like Nano ID, KSUID, and ULID are popular in application code, but they are string-based (Base62/Base32) rather than integers. Storing them as String in a ClickHouse sort key is much less efficient than a UInt64: they are wider, compress worse, and (for Nano ID, which is fully random) are not time-ordered. ULID and KSUID are time-ordered, so as a lexicographically sortable String they at least preserve insert locality — but a numeric SnowflakeID is still smaller and faster as a key. If your IDs arrive as ULID/KSUID strings from upstream, treat them as attributes and derive an integer sort key (for example from the embedded timestamp) where you can.

Best Practices

Lead the sort key with low-cardinality or time columns, not the raw ID. A SnowflakeID is excellent for time ordering, but ORDER BY (tenant_id, id) usually prunes better than ORDER BY id alone for multi-tenant filtering. Match the primary index to your real query patterns.
Assign distinct machine_id values per node. The 10-bit machine field only prevents cross-node collisions if each generator uses a different value. Relying on the default (0) everywhere reintroduces collision risk across a cluster.
Keep epochs consistent. If you ingest SnowflakeIDs generated outside ClickHouse (e.g. by an application using the Twitter/X epoch), pass the matching epoch argument to the conversion functions, or normalize on ingest.
Skip the separate timestamp column when you can. snowflakeIDToDateTime(id) recovers creation time, so you can derive partitions and time filters from the ID itself and save a column.
Don't deduplicate on a SnowflakeID with ReplacingMergeTree expecting idempotency. Each generateSnowflakeID() call is unique, so re-inserting the "same" logical row with a freshly generated ID will not collapse. If you need idempotent upserts, derive the key deterministically (e.g. from a hash of business columns) rather than generating a new ID. See ReplacingMergeTree.

Common Issues

IDs appear to be ~40 years in the future. This is almost always an epoch mismatch: decoding a UNIX-epoch ID as if it used the Twitter/X epoch (or vice versa) shifts the timestamp by a large constant. Confirm which epoch generated the ID and pass it explicitly.
generateSnowflakeID returns the same value when called twice in one query. Common subexpression elimination collapsed the calls. Pass a distinguishing expr argument (for example generateSnowflakeID(rand()) or a row-varying column).
Poor compression / slow merges after switching to UUIDv4. A random 128-bit key as the leading sort column is the cause. Re-lead the sort key with a time-ordered column (a SnowflakeID is ideal) and keep the random UUID as a non-key attribute.
Function not found on older servers. generateSnowflakeID and the ...SnowflakeID... conversion functions require ClickHouse v24.6 or later. The machine_id parameter for generateSnowflakeID requires v24.7 or later. On older versions you must generate IDs in the application or compose them manually with bit operations.

How Pulse Helps

Choosing the right primary key and ID scheme is a one-way door: changing the ORDER BY of a populated table means rebuilding it. Pulse provides ongoing monitoring and expert review of ClickHouse deployments, including schema and primary-key design. Pulse flags tables whose sort keys are hurting compression or merge performance — such as random UUIDs leading the key — and reviews ID and partitioning strategies before they become expensive to undo. If you are deciding between SnowflakeID, UUIDv7, and sequential IDs for a new table, or diagnosing write amplification on an existing one, Pulse gives you a second set of expert eyes.

Frequently Asked Questions

Q: Is a SnowflakeID guaranteed to be unique across my whole cluster?

Within a single server, the monotonic counter prevents duplicates. Across servers, uniqueness depends on you assigning a distinct machine_id (the 10-bit field) to each generator. With unique machine IDs and the same-millisecond counter, collisions are effectively impossible at normal insert rates.

Q: Can I recover the exact insert time from a SnowflakeID?

You can recover the millisecond timestamp embedded at generation time with snowflakeIDToDateTime64(id). That is the moment the ID was generated, which for a DEFAULT generateSnowflakeID() column equals insert time. It is not affected by later merges.

Q: Should I use SnowflakeID or UUIDv7?

Both are time-ordered. Prefer SnowflakeID when you want a compact 64-bit integer key and don't need UUID-type compatibility — it is smaller and compresses better as a sort key. Prefer UUIDv7 when downstream systems specifically require the UUID type.

Q: Why is my decoded timestamp decades off?

Epoch mismatch. ClickHouse's generateSnowflakeID and ...SnowflakeID... functions default to the UNIX epoch (1970-01-01). IDs from systems using the Twitter/X epoch must be decoded with the matching epoch argument, otherwise the timestamp is shifted by a fixed offset.

Q: Can I use a SnowflakeID with ReplacingMergeTree for upserts?

Only if the ID is deterministic for a given logical row. generateSnowflakeID() produces a new value every call, so it will not collapse duplicate inserts. For idempotent upserts, derive the key from business columns (e.g. a hash) instead of generating a fresh ID.

Q: Which functions are compatible with generateSnowflakeID?

Use snowflakeIDToDateTime, snowflakeIDToDateTime64, dateTimeToSnowflakeID, and dateTime64ToSnowflakeID (note the "ID" in the name). The older snowflakeToDateTime / dateTimeToSnowflake family uses the same bit structure but defaults to the Twitter/X epoch instead of the UNIX epoch, and accepts/returns Int64 rather than UInt64 — making it incompatible with generateSnowflakeID.