Idempotent Inserts into Materialized Views

Q: Does ReplicatedMergeTree deduplication automatically protect my materialized view?

No. By default deduplicate_blocks_in_dependent_materialized_views = 0 , so the source table deduplicates but the MV target does not participate in that decision. A retry can be skipped at the source while the MV target is left out of sync. Enable the setting (and ideally supply a token) to make the MV deduplicate in lockstep.

Q: What is the default value of deduplicate_blocks_in_dependent_materialized_views?

It is 0 (disabled) by default. You must enable it explicitly — per insert, per session, or in a user profile — to propagate deduplication from the source insert into dependent MV target tables.

A ClickHouse INSERT into a table that has an attached incremental materialized view is not one atomic operation — it writes to the source table and runs the MV's SELECT to write to the MV's target table. ClickHouse has no multi-table transaction to bind these together, so a retried or partially-failed insert can leave the source and the MV target out of sync. Making these inserts idempotent (safe to retry, exactly-once in effect) requires understanding how deduplication propagates — or fails to propagate — through the MV.

This guide covers MV-specific deduplication semantics, the settings that control them, and patterns for exactly-once inserts. For the general (non-MV) picture, see Insert Idempotency and Deduplication Patterns and the insert_deduplication_token setting.

How Block Deduplication Works (Recap)

For *ReplicatedMergeTree engines, ClickHouse deduplicates inserts by block. Each inserted block gets a block_id — by default a hash of the block's contents — recorded in ClickHouse Keeper / ZooKeeper. If a retried insert produces a block with a block_id already in the deduplication log, the block is skipped and the insert still returns success. This is what makes a plain insert into a single replicated table idempotent on retry.

Two settings bound the dedup log:

replicated_deduplication_window — number of recent block hashes retained per partition.
replicated_deduplication_window_seconds — time window for retaining them.

For non-replicated MergeTree, the equivalent is non_replicated_deduplication_window (off by default — it must be set to a non-zero value to get dedup on a plain MergeTree). You can also override the hash with an explicit insert_deduplication_token, which forces ClickHouse to deduplicate on your token instead of block contents.

Why Materialized Views Break the Simple Picture

An incremental MV is a trigger: when a block lands in the source table, ClickHouse runs the MV's SELECT over that block and inserts the result into the MV's target table. Two distinct problems arise.

Problem 1: The MV Target Is a Separate Table With Separate Dedup

By default, the setting deduplicate_blocks_in_dependent_materialized_views is 0 (disabled). With it off, deduplication is not propagated from the source insert into the MV's target table. The source table deduplicates a retried block, but the MV target does not participate in that decision — its own blocks are evaluated (or not) independently.

This creates the classic out-of-sync scenario:

An insert writes a block to the source table successfully.
The MV insert into the target table fails partway (timeout, too many parts, partition limit, node restart).
The client retries the whole insert.
The source table deduplicates the retried block (its block_id is already logged) and skips it.
Because the source skipped it, the MV trigger never fires for that block.
Result: the source has the data, the MV target permanently does not.

Problem 2: Identical MV Output From Different Source Data

The opposite failure happens when deduplicate_blocks_in_dependent_materialized_views = 1 but the MV produces identical output blocks from different source inserts. For example, an aggregating MV whose SELECT emits count() AS c may produce the byte-identical block {c: 1000} for two genuinely different source inserts. With MV dedup keyed on the MV output block's hash, the second (legitimate) block gets falsely deduplicated and dropped.

So neither default is universally safe: with dedup off, partial failures lose MV rows; with naive dedup on, coincidentally identical MV outputs get dropped.

The Setting: deduplicate_blocks_in_dependent_materialized_views

-- Per-insert
INSERT INTO source_table
SETTINGS deduplicate_blocks_in_dependent_materialized_views = 1
VALUES (...);

-- Or in a user profile / session
SET deduplicate_blocks_in_dependent_materialized_views = 1;

When enabled, ClickHouse computes the MV target's block_id from a string that combines the source block's block_id with additional table identifiers, rather than from the MV output bytes alone. This is the key fix for Problem 2: because the dedup key is derived from the source insertion identity, two different source inserts that happen to produce identical aggregated output get different MV block_ids and are both kept. It also makes the MV target participate in dedup consistently, narrowing Problem 1.

This setting must be enabled for the MV target to deduplicate in lockstep with the source. It is a no-op unless insert_deduplicate = 1 (the default) is also in effect.

Making It Idempotent With insert_deduplication_token

The most robust approach for an at-least-once pipeline is to control the dedup key explicitly with insert_deduplication_token. Instead of relying on content hashing, you supply a stable token tied to the logical batch (a Kafka offset range, a file name, a UUID generated once per batch and reused on every retry):

INSERT INTO source_table
SETTINGS
    insert_deduplication_token = 'batch-2026-06-02-000042',
    deduplicate_blocks_in_dependent_materialized_views = 1
SELECT * FROM input('...')
ORDER BY ALL;  -- deterministic ordering so retries reproduce identical blocks

When a token is supplied and deduplicate_blocks_in_dependent_materialized_views = 1, ClickHouse propagates a token derived from yours down to the MV target tables, so the source and MV deduplicate on the same logical identity. A retry with the same token is skipped consistently in both places — exactly-once in effect, as long as the retry happens within the dedup window.

A few requirements for token-based idempotency to actually hold:

Reuse the same token on every retry of the same logical batch. A fresh token per attempt defeats the purpose.
Make the insert deterministic. INSERT ... SELECT must produce the same blocks in the same order across retries, or the blocks won't match. Use ORDER BY ALL and avoid non-deterministic functions like now() or rand() in the inserted rows.
Stay inside the dedup window. If a retry arrives after replicated_deduplication_window / replicated_deduplication_window_seconds has rolled past, the old hash is gone and the block is treated as new.

Chained Materialized Views

When MVs are chained (source -> MV1 -> intermediate table -> MV2 -> ...), the dedup token must remain distinct at each hop, otherwise an inner MV can collide with the token used a level above. An older setting, update_insert_deduplication_token_in_dependent_materialized_views, existed to update the token with a table identifier at each level. In current ClickHouse this is no longer something you should configure: the deduplication token for blocks inserted by materialized views is derived from the source data and table identity automatically at each level, and the old setting is deprecated. Recent releases also extended consistent end-to-end deduplication to asynchronous inserts feeding dependent MVs. The practical takeaway: on modern ClickHouse (24.x/25.x and later), enable deduplicate_blocks_in_dependent_materialized_views and supply a stable insert_deduplication_token, and let ClickHouse manage per-level tokens for you.

Comparison of Approaches

Approach	Protects MV target on retry?	Risk of false dedup	When to use
Default (`deduplicate_blocks_in_dependent_materialized_views = 0`)	No — MV can go out of sync after partial failure	None (dedup off for MV)	Single table, no MV, or you handle dedup at query time
`deduplicate_blocks_in_dependent_materialized_views = 1`, content-hash	Mostly	Low — key derives from source block, not MV output	General MV pipelines without explicit batch IDs
`+ insert_deduplication_token` (stable per batch)	Yes — exactly-once within the window	None — you control identity	At-least-once ingestion with retries (Kafka, queues, ETL)
Query-time dedup (ReplacingMergeTree + `FINAL`)	N/A — tolerates duplicates instead	None	High-volume streaming where eventual dedup is acceptable

Inserting Directly Into an MV Target Table

If your pipeline writes directly into the MV's target table (bypassing the source table, e.g. to backfill), the MV trigger does not fire — you are just doing a normal insert into that target table. Idempotency there follows the ordinary rules: replicated block dedup or an insert_deduplication_token on that insert. The MV-propagation settings above only apply to inserts into the source table that the MV is attached to. For backfill semantics and POPULATE behavior, see the materialized view guide and the INSERT INTO reference.

Best Practices

Treat the insert as the unit of idempotency. Generate one stable insert_deduplication_token per logical batch and reuse it on every retry of that batch.
Enable deduplicate_blocks_in_dependent_materialized_views = 1 whenever you depend on retries being safe across an MV. The default of 0 leaves the MV target unprotected.
Make inserts deterministic. Use ORDER BY ALL for INSERT ... SELECT and keep non-deterministic functions out of inserted values, so retried blocks are byte-identical.
Size the dedup window to your retry horizon. Set replicated_deduplication_window_seconds comfortably larger than your worst-case retry/backoff delay.
Don't lean solely on block dedup for correctness. For data that must converge regardless of insert-path race conditions, combine it with a ReplacingMergeTree target and dedup at query time.
Verify, don't assume. After a forced retry, compare source-row counts against the MV target to confirm they stayed in sync.

Common Issues

"MV is missing rows the source has." A partial MV insert failed, then the retry was deduplicated at the source so the MV trigger never re-fired. Fix forward by enabling deduplicate_blocks_in_dependent_materialized_views and using a stable token; remediate existing gaps with a targeted backfill into the MV target.
"My second insert vanished from the MV." An aggregating MV produced output identical to a previous block and was deduplicated with the MV-dedup setting on but no distinguishing token. Supply an insert_deduplication_token so the dedup identity comes from the source insert, not the aggregate output.
"Retries still create duplicates." The token changes between attempts, the insert is non-deterministic, or the retry fell outside the dedup window. Pin the token, sort deterministically, and widen the window.
"Dedup does nothing on my plain MergeTree." Non-replicated tables have non_replicated_deduplication_window = 0 by default. Set it to a non-zero value or move to a replicated engine.

How Pulse Helps

Keeping a source table and its materialized-view targets consistent under real-world retries is one of the more error-prone parts of running ClickHouse, because the failure modes (silent MV gaps, false deduplication of legitimate batches) are invisible until someone reconciles row counts. Pulse monitors ClickHouse deployments for exactly these conditions — divergence between source and dependent MV targets, deduplication-window pressure, and insert error patterns that signal partial failures — and surfaces the configuration (token usage, deduplicate_blocks_in_dependent_materialized_views, window sizing) that determines whether your pipeline is genuinely idempotent. The team behind Pulse runs production ClickHouse clusters and can help you design exactly-once ingestion paths that survive retries without losing or duplicating MV data.

Frequently Asked Questions

Q: Does ReplicatedMergeTree deduplication automatically protect my materialized view?

No. By default deduplicate_blocks_in_dependent_materialized_views = 0, so the source table deduplicates but the MV target does not participate in that decision. A retry can be skipped at the source while the MV target is left out of sync. Enable the setting (and ideally supply a token) to make the MV deduplicate in lockstep.

Q: What is the default value of deduplicate_blocks_in_dependent_materialized_views?

It is 0 (disabled) by default. You must enable it explicitly — per insert, per session, or in a user profile — to propagate deduplication from the source insert into dependent MV target tables.

Q: Why did my second, legitimately-different insert get dropped from an aggregating MV?

Its MV output block was byte-identical to an earlier one (e.g. the same count()), and it was deduplicated on that hash. Provide an insert_deduplication_token tied to the source batch so the dedup identity comes from the source insertion rather than the aggregated result.

Q: Do I need to reuse the same insert_deduplication_token on retries?

Yes. The token is the logical identity of the batch. Generate it once per batch and send the identical token on every retry; a new token per attempt makes each retry look like fresh data.

Q: Does this apply to inserts made directly into the MV target table?

No. Inserting directly into a target table does not trigger the MV and follows ordinary single-table dedup rules. The MV-propagation settings only affect inserts into the source table the MV is attached to.

Q: What about asynchronous inserts feeding a materialized view?

Recent ClickHouse releases extended consistent end-to-end deduplication to async inserts and their dependent MVs, so the same token-based idempotency approach applies. Confirm the behavior on your exact version, since async-insert dedup semantics have evolved across releases.