ClickHouse DB::Exception: Deduplication is not possible

Q: What happens if I disable deduplication?

With insert_deduplicate = 0 , ClickHouse will accept all inserts without checking for duplicates. If your application retries a failed insert, you may end up with duplicate rows. You would need to handle deduplication at the query level or in your application logic.

The "DB::Exception: Deduplication is not possible" error in ClickHouse indicates that the server cannot perform its built-in insert deduplication for the current operation. The DEDUPLICATION_IS_NOT_POSSIBLE error code is raised when the conditions required for ClickHouse's block-level deduplication mechanism are not met in the current context.

Impact

When this error occurs, the INSERT operation will fail rather than risk inserting duplicate data. This is a protective measure, as ClickHouse's deduplication mechanism is designed to ensure exactly-once insert semantics for ReplicatedMergeTree tables. The failure means your data was not written, and the insert must be reconfigured or retried under conditions that support deduplication.

Common Causes

Using async inserts (async_insert = 1) with deduplication enabled, where the batching behavior makes block-level deduplication unreliable
Attempting deduplication on a non-replicated MergeTree table that does not have deduplication infrastructure
The insert_deduplicate setting is enabled but the table engine or insert path does not support it
Using a Buffer table in front of a ReplicatedMergeTree table, where the buffering breaks the deduplication chain
The insert block is too large or has been split in a way that changes its hash, making deduplication impossible
Materialized views in the insert pipeline that transform data in ways incompatible with deduplication

Troubleshooting and Resolution Steps

Check the full error message and context in the logs:

grep -i 'DEDUPLICATION_IS_NOT_POSSIBLE' /var/log/clickhouse-server/clickhouse-server.log | tail -20

If using async inserts, either disable deduplication or disable async inserts:

-- Option A: Disable async inserts for this session
SET async_insert = 0;

-- Option B: Disable deduplication for async inserts
SET insert_deduplicate = 0;

-- Option C: Use async_insert_deduplicate (available in newer versions)
SET async_insert = 1;
SET async_insert_deduplicate = 1;

Verify the table engine supports deduplication:
```
SELECT database, name, engine
FROM system.tables
WHERE database = 'my_db' AND name = 'my_table';
```
Block-level deduplication is enabled by default on ReplicatedMergeTree family engines. Plain MergeTree tables support it only when non_replicated_deduplication_window is set to a non-zero value (the default is 0, i.e. disabled).

Check the current deduplication settings:

SELECT name, value
FROM system.settings
WHERE name IN ('insert_deduplicate', 'async_insert', 'async_insert_deduplicate');

If using materialized views, check whether they interfere with deduplication:

SELECT database, name, as_select
FROM system.tables
WHERE engine = 'MaterializedView' AND as_select LIKE '%my_table%';

As a workaround, disable deduplication if your application handles idempotency at a higher level:
```
INSERT INTO my_table SETTINGS insert_deduplicate = 0
VALUES (...);
```

Best Practices

If you rely on ClickHouse deduplication, use ReplicatedMergeTree tables and synchronous inserts for the most reliable behavior.
When using async inserts, understand that deduplication behavior differs. In recent ClickHouse versions, async_insert_deduplicate provides deduplication support for async inserts.
Avoid placing Buffer tables in front of replicated tables if deduplication is critical to your pipeline.
Design your application to handle potential duplicates at the query level (using FINAL or argMax) as a defense-in-depth strategy, rather than relying solely on insert deduplication.
Keep insert batches consistent in size and content when relying on deduplication, as the block hash depends on the exact data in the batch.

Frequently Asked Questions

Q: How does ClickHouse deduplication work?
A: ClickHouse computes a hash of each inserted data block and stores it in ZooKeeper (for ReplicatedMergeTree). If an insert with the same block hash is received again, it is silently ignored. This provides exactly-once semantics for retried inserts with identical data.

Q: Can I use deduplication with non-replicated MergeTree tables?
A: Limited deduplication support exists for non-replicated MergeTree tables via the non_replicated_deduplication_window setting. However, it is less robust than the ZooKeeper-based deduplication used by ReplicatedMergeTree.

Q: What happens if I disable deduplication?
A: With insert_deduplicate = 0, ClickHouse will accept all inserts without checking for duplicates. If your application retries a failed insert, you may end up with duplicate rows. You would need to handle deduplication at the query level or in your application logic.