ClickHouse provides built-in insert deduplication that makes inserts idempotent: if you retry the exact same insert after a network failure or timeout, ClickHouse silently skips the duplicate instead of writing the data twice. This is the foundation for safe at-least-once ingestion pipelines, but it depends on settings, window limits, and behaviors that surprise people who don't know the rules.
This guide covers how block-level deduplication works, the settings that control it, how to provide explicit deduplication tokens, the materialized-view gotcha, and where engine-level deduplication (like ReplacingMergeTree) fits in versus retry deduplication. For a focused reference on the token setting itself, see insert_deduplication_token.
How Block Deduplication Works
When you run an INSERT, ClickHouse splits the data into blocks and forms one or more data parts. For each block it computes a hash (a block_id) and records it in a deduplication log. On a subsequent insert, if a block's hash already exists in the log, that block is treated as a duplicate and skipped — the insert still returns a success status, so the client cannot tell the difference between "written" and "deduplicated".
There are two important consequences:
- Retries are safe. If a client doesn't receive an acknowledgement (network drop, proxy timeout) and resends the identical insert, the data is written at most once.
- The blocks must be byte-for-byte equivalent. Deduplication only matches when the retried insert produces the same blocks: same rows, same order, same block size. Reshuffling rows or changing the batch boundary defeats it.
The hash is computed from the formed part, after default/materialized columns and expressions are expanded, and deduplication is tracked per partition (defined by your PARTITION BY clause).
Replicated vs. Non-Replicated Defaults
Deduplication behavior differs sharply between replicated and plain MergeTree tables. This is the single most common source of confusion.
| Aspect | ReplicatedMergeTree | Plain MergeTree |
|---|---|---|
| Deduplication by default | Enabled | Disabled (backward compatibility) |
| Window setting | replicated_deduplication_window (default 1000 blocks) |
non_replicated_deduplication_window (default 0 = off) |
| Time window | replicated_deduplication_window_seconds (default 604800 = 7 days) |
(no time-based window for non-replicated tables) |
| Where hashes are stored | ClickHouse Keeper / ZooKeeper (/blocks znode per partition) |
Local disk (deduplication_logs/ directory) |
For ReplicatedMergeTree and ClickHouse Cloud's SharedMergeTree, deduplication is on out of the box. For a plain MergeTree, you must opt in:
CREATE TABLE events (A Int64)
ENGINE = MergeTree
ORDER BY A
SETTINGS non_replicated_deduplication_window = 1000;
The window settings (replicated_deduplication_window / non_replicated_deduplication_window) are table-level MergeTree settings, not session settings. They define how many recent block hashes are retained. Raising the window improves the retry guarantee but adds comparison overhead and, for replicated tables, more state in Keeper.
Deduplication Windows and Their Limits
A block is only deduplicated while its hash is still in the window. The window is bounded by both a count (replicated_deduplication_window, default 1000) and a time (replicated_deduplication_window_seconds, default 7 days), whichever is reached first.
If a retry arrives after 1000 newer blocks have been inserted, or after the time window elapses, the original hash may have been evicted and the retry will be written as a new part — a duplicate. Size the window to comfortably exceed your worst-case retry latency and insert rate:
ALTER TABLE events MODIFY SETTING replicated_deduplication_window = 1000;
ALTER TABLE events MODIFY SETTING replicated_deduplication_window_seconds = 86400;
Note that deduplication hashes survive ALTER TABLE ... DELETE. Only DROP PARTITION or TRUNCATE clear the dedup log for the affected partition.
Explicit Deduplication Tokens
By default the dedup key is the data hash. Since ClickHouse 22.2 you can override it with an explicit insert_deduplication_token (a session/query setting). When provided, the token takes priority over the data hash:
-- Same token => second insert is deduplicated even though data differs
INSERT INTO events SETTINGS insert_deduplication_token = 'batch-42' VALUES (1);
INSERT INTO events SETTINGS insert_deduplication_token = 'batch-42' VALUES (2); -- skipped
Tokens are the right tool when:
- Your upstream system already has a stable batch ID (Kafka offset range, transaction ID, file name + chunk number). Reuse it so retries dedup reliably regardless of row ordering.
- You want to insert identical data on purpose — give each legitimate batch a distinct token so the data hash doesn't accidentally collide.
The catch is symmetrical to its power: if your retry logic generates a new token each attempt, deduplication never triggers. The token must be deterministic for a given logical batch. See insert_deduplication_token for token-generation guidance.
Idempotent INSERT ... SELECT
INSERT ... SELECT is deduplicated the same way, but it has a subtle requirement: the SELECT must return the same rows in the same order on every retry. If the source table changes between attempts, or the result ordering is nondeterministic, the formed blocks differ and deduplication fails.
Stabilize the ordering explicitly:
INSERT INTO target
SELECT * FROM source
WHERE event_date = '2026-06-01'
ORDER BY ALL;
For the same reason, prefer an explicit insert_deduplication_token derived from the logical job (for example the date partition being backfilled) rather than relying on data-hash matching for INSERT ... SELECT.
The Materialized View Gotcha
A materialized view insert and the source-table insert are two separate inserts — ClickHouse has no transaction binding them. By default, deduplication is computed independently for the source table and is not propagated to dependent materialized views. Enable propagation with:
SET deduplicate_blocks_in_dependent_materialized_views = 1;
But this introduces a real hazard. The MV deduplicates on the hash of the MV's output block, not the source rows. If two genuinely different source inserts produce the same aggregated output (for example two separate 100-row batches that both yield count() = 100), the second MV write is falsely skipped while the source row is written — silently corrupting your aggregate.
The fix is to inject uniqueness into the MV output so distinct batches produce distinct blocks:
CREATE MATERIALIZED VIEW events_mv
ENGINE = ReplicatedSummingMergeTree(...)
PARTITION BY toYYYYMM(D) ORDER BY D AS
SELECT
D,
count() AS cnt,
sum(cityHash64(*)) AS dedup_marker -- makes each source batch's block unique
FROM events
GROUP BY D;
Alternatively carry an explicit insert_id column through the view. Either way, the goal is that two different logical batches never hash to the same MV block.
Retry Deduplication vs. Engine-Level Deduplication
Insert deduplication and engine-level deduplication solve different problems and are frequently confused.
| Insert (retry) deduplication | ReplacingMergeTree | |
|---|---|---|
| What it dedups | Identical re-sent insert blocks | Rows with the same sorting key |
| When | At insert time, within the dedup window | Eventually, during background merges |
| Guarantee | Block written at most once | Duplicates collapsed eventually, not immediately |
| Use for | Safe insert retries / idempotency | Upserts, last-write-wins on a business key |
| Reads | Always correct | Need FINAL or query-time dedup until merged |
Insert deduplication protects you from re-sending the same batch. It does not deduplicate two logically equal rows that arrive in different batches — that's what ReplacingMergeTree (or argMax/GROUP BY at query time) is for. A common mistake is expecting ReplacingMergeTree to deduplicate immediately; it only collapses rows after merges run, which is why the ReplacingMergeTree duplicate gotcha exists. Use both together: insert dedup for transport-level idempotency, ReplacingMergeTree for business-key deduplication.
Async Inserts
Asynchronous inserts buffer many client inserts server-side before flushing them into a single part. Historically async inserts disabled deduplication by default, so if you relied on retry idempotency you had to enable it explicitly:
INSERT INTO events
SETTINGS async_insert = 1, async_insert_deduplicate = 1
VALUES (...);
With async_insert_deduplicate = 1, ClickHouse computes a dedup hash (or honors your insert_deduplication_token) per source insert within the flushed batch. If your application can send the same payload twice and you depend on the server to drop it, set this explicitly — do not assume async inserts dedup on their own.
Best Practices
- Make retried inserts identical. Same rows, same order, same batch boundaries. Buffer the batch on the client and resend the exact bytes on retry rather than rebuilding it.
- Use explicit tokens for anything non-trivial. For
INSERT ... SELECT, multi-step pipelines, or Kafka/file ingestion, derive a deterministicinsert_deduplication_tokenfrom the logical batch instead of relying on data hashing. - Size the window to your retry budget. Ensure
replicated_deduplication_windowand..._secondsexceed the maximum time and insert volume between an attempt and its retry. - Handle the unknown-status case. If every retry fails, you cannot know whether any attempt landed. Verify with a follow-up query keyed on your token/batch ID before assuming failure.
- Be deliberate with materialized views. Only set
deduplicate_blocks_in_dependent_materialized_views = 1if you've made MV output blocks unique per batch; otherwise you risk silent under-counting. - Don't conflate transport dedup with business dedup. Pair insert deduplication with ReplacingMergeTree or query-time
argMax/GROUP BYfor key-based deduplication.
Common Issues
- Retries still create duplicates. The blocks weren't identical (reordered rows, different batch size) or the dedup window was exceeded. Use a stable token and a larger window.
- Inserting the same data on purpose gets silently dropped. The data hash matched a recent block. Give each intentional batch a distinct
insert_deduplication_token. - Source table is correct but the MV is short.
deduplicate_blocks_in_dependent_materialized_viewsis off, or MV output blocks collide. Add a uniqueness marker to the MV's select. - Plain MergeTree never deduplicates.
non_replicated_deduplication_windowdefaults to 0; set it explicitly per table. - Async inserts duplicate on retry. Deduplication isn't automatic for async inserts; enable
async_insert_deduplicate = 1.
How Pulse Helps
Insert idempotency problems are usually invisible until your row counts drift or a dashboard under-reports. Pulse monitors ClickHouse ingestion paths and surfaces the conditions that quietly break deduplication: dedup windows sized too small for actual retry latency, replicated tables losing block hashes from Keeper, async-insert paths running without deduplication, and materialized views with mismatched dedup settings. When duplicates or gaps appear, Pulse correlates them with insert patterns, part creation, and Keeper state so you can pinpoint whether the issue is a retry, a window eviction, or a missing token — and our team helps you design idempotent pipelines that hold up under failure rather than relying on best-effort retries.
Frequently Asked Questions
Q: Is insert deduplication enabled by default?
Yes for ReplicatedMergeTree (and ClickHouse Cloud's SharedMergeTree). No for plain MergeTree, where you must set non_replicated_deduplication_window to a non-zero value.
Q: How long are duplicate inserts remembered?
Until the block hash leaves the window, which is bounded by both a count (replicated_deduplication_window, default 1000) and a time (replicated_deduplication_window_seconds, default 7 days), whichever comes first.
Q: Does deduplication guarantee exactly-once inserts?
It gives at-most-once for a given block within the window, which combined with at-least-once retries approximates exactly-once. It is not a transaction: if all retries fail you still cannot confirm whether data landed, so verify with a token-keyed query.
Q: Why are my deliberately repeated inserts being dropped?
Their data hash matches a recent block. Assign each intentional batch a distinct insert_deduplication_token so ClickHouse treats them as different inserts.
Q: Does ReplacingMergeTree replace insert deduplication?
No. ReplacingMergeTree collapses rows with the same sorting key eventually during merges (business-key dedup). Insert deduplication prevents re-sent batches from being written twice (transport idempotency). Use both.
Q: Do async inserts deduplicate retries?
Not by default. Set async_insert_deduplicate = 1 so ClickHouse computes a dedup hash or honors your token per source insert in the flushed batch.