The "DB::Exception: Deduplication is not possible" error in ClickHouse indicates that the server cannot perform its built-in insert deduplication for the current operation. The DEDUPLICATION_IS_NOT_POSSIBLE error code is raised when the conditions required for ClickHouse's block-level deduplication mechanism are not met in the current context.
Impact
When this error occurs, the INSERT operation will fail rather than risk inserting duplicate data. This is a protective measure, as ClickHouse's deduplication mechanism is designed to ensure exactly-once insert semantics for ReplicatedMergeTree tables. The failure means your data was not written, and the insert must be reconfigured or retried under conditions that support deduplication.
Common Causes
- Using async inserts (
async_insert = 1) with deduplication enabled, where the batching behavior makes block-level deduplication unreliable - Attempting deduplication on a non-replicated MergeTree table that does not have deduplication infrastructure
- The
insert_deduplicatesetting is enabled but the table engine or insert path does not support it - Using a Buffer table in front of a ReplicatedMergeTree table, where the buffering breaks the deduplication chain
- The insert block is too large or has been split in a way that changes its hash, making deduplication impossible
- Materialized views in the insert pipeline that transform data in ways incompatible with deduplication
Troubleshooting and Resolution Steps
Check the full error message and context in the logs:
grep -i 'DEDUPLICATION_IS_NOT_POSSIBLE' /var/log/clickhouse-server/clickhouse-server.log | tail -20If using async inserts, either disable deduplication or disable async inserts:
-- Option A: Disable async inserts for this session SET async_insert = 0; -- Option B: Disable deduplication for async inserts SET insert_deduplicate = 0; -- Option C: Use async_insert_deduplicate (available in newer versions) SET async_insert = 1; SET async_insert_deduplicate = 1;Verify the table engine supports deduplication:
SELECT database, name, engine FROM system.tables WHERE database = 'my_db' AND name = 'my_table';Block-level deduplication is enabled by default on
ReplicatedMergeTreefamily engines. PlainMergeTreetables support it only whennon_replicated_deduplication_windowis set to a non-zero value (the default is 0, i.e. disabled).Check the current deduplication settings:
SELECT name, value FROM system.settings WHERE name IN ('insert_deduplicate', 'async_insert', 'async_insert_deduplicate');If using materialized views, check whether they interfere with deduplication:
SELECT database, name, as_select FROM system.tables WHERE engine = 'MaterializedView' AND as_select LIKE '%my_table%';As a workaround, disable deduplication if your application handles idempotency at a higher level:
INSERT INTO my_table SETTINGS insert_deduplicate = 0 VALUES (...);
Best Practices
- If you rely on ClickHouse deduplication, use
ReplicatedMergeTreetables and synchronous inserts for the most reliable behavior. - When using async inserts, understand that deduplication behavior differs. In recent ClickHouse versions,
async_insert_deduplicateprovides deduplication support for async inserts. - Avoid placing Buffer tables in front of replicated tables if deduplication is critical to your pipeline.
- Design your application to handle potential duplicates at the query level (using
FINALorargMax) as a defense-in-depth strategy, rather than relying solely on insert deduplication. - Keep insert batches consistent in size and content when relying on deduplication, as the block hash depends on the exact data in the batch.
Frequently Asked Questions
Q: How does ClickHouse deduplication work?
A: ClickHouse computes a hash of each inserted data block and stores it in ZooKeeper (for ReplicatedMergeTree). If an insert with the same block hash is received again, it is silently ignored. This provides exactly-once semantics for retried inserts with identical data.
Q: Can I use deduplication with non-replicated MergeTree tables?
A: Limited deduplication support exists for non-replicated MergeTree tables via the non_replicated_deduplication_window setting. However, it is less robust than the ZooKeeper-based deduplication used by ReplicatedMergeTree.
Q: What happens if I disable deduplication?
A: With insert_deduplicate = 0, ClickHouse will accept all inserts without checking for duplicates. If your application retries a failed insert, you may end up with duplicate rows. You would need to handle deduplication at the query level or in your application logic.