OPTIMIZE TABLE and OPTIMIZE TABLE ... FINAL look almost identical but do very different amounts of work. Plain OPTIMIZE triggers a single, opportunistic merge pass; OPTIMIZE FINAL forces every part in the target down to one part per partition, rewriting data that is already merged. Choosing the wrong one either leaves you with parts you expected to be gone, or saturates CPU and disk for hours on a large table.
This guide compares the two, explains the deduplication and collapsing semantics that depend on FINAL, lists which engines support each, and gives concrete rules for when to reach for which.
The Core Difference
Both forms issue an unscheduled merge — they ask ClickHouse to merge now rather than wait for the background scheduler. The difference is how far the merge goes.
OPTIMIZE TABLE tperforms a single merge cycle, the same kind the background scheduler would run. It picks parts that are beneficial to merge and combines them. It does not guarantee one part per partition. If the merge selector decides nothing is worth merging, it does nothing — silently, by default.OPTIMIZE TABLE t FINALforces optimization "even when all the data is already in one part." It repeatedly merges until each partition is reduced to a single part (subject to part-size limits), rewriting already-merged data in the process.
-- One opportunistic merge pass; may leave many parts
OPTIMIZE TABLE events;
-- Force every partition down to one part, rewriting everything
OPTIMIZE TABLE events FINAL;
A useful mental model: with 40 parts spread across 3 partitions, plain OPTIMIZE might merge a handful of small parts into one, leaving ~38 parts. OPTIMIZE ... FINAL keeps merging until you have one part per partition — here, 3 parts total — at the cost of reading and rewriting all the data.
Comparison Table
| Aspect | OPTIMIZE TABLE |
OPTIMIZE TABLE ... FINAL |
|---|---|---|
| Work performed | A single beneficial merge pass | Repeated merges down to one part per partition |
| Parts left per partition | Indeterminate — whatever the selector decides | One (if part size permits) |
| Rewrites already-merged data? | No | Yes |
| Forces deduplication / collapsing? | Only incidentally, if those parts happen to merge | Yes — guarantees a final merge that applies engine logic |
| No-op when nothing to merge? | Yes (silent unless optimize_throw_if_noop) |
No — always does work |
| CPU / disk cost | Low, bounded by one merge | High; scales with total table/partition size |
| Typical use | Nudge merges along; reduce a few small parts | One-time consolidation or forced dedup |
What FINAL Actually Guarantees
The reason FINAL matters goes beyond part count. ClickHouse's special MergeTree engines only apply their data-transformation logic during merges:
- ReplacingMergeTree keeps only the latest row per sorting key — but only once the duplicate rows land in the same merged part.
- CollapsingMergeTree / VersionedCollapsingMergeTree cancel out matching +1/-1 sign rows during merge.
- SummingMergeTree and AggregatingMergeTree collapse rows into summed/aggregated rows during merge.
Until a merge brings the relevant rows together, duplicates and uncollapsed rows remain on disk. Plain OPTIMIZE may or may not merge those specific parts. OPTIMIZE ... FINAL forces a complete merge per partition, which is the only way (short of a SELECT ... FINAL read-time modifier) to guarantee deduplication or collapsing is materialized in storage.
-- ReplacingMergeTree: force latest-row-wins to be applied on disk
OPTIMIZE TABLE current_state FINAL;
Note the per-partition boundary: a merge reads rows in storage order (the ORDER BY of the table), and for ReplacingMergeTree the last row per sorting key survives — specifically the most recently inserted row, or the row with the highest value of the optional ver column. Because FINAL merges all parts within a partition, deduplication is applied across all rows sharing a sorting key within that partition — duplicates that span partitions are not deduplicated against each other.
DEDUPLICATE is a separate thing
Do not confuse engine-driven dedup with the explicit DEDUPLICATE clause:
-- Remove rows that are completely identical across all columns
OPTIMIZE TABLE logs FINAL DEDUPLICATE;
-- Deduplicate by a subset of columns
OPTIMIZE TABLE logs FINAL DEDUPLICATE BY event_id, user_id;
DEDUPLICATE compares full rows (or the columns named in BY) regardless of table engine, and keeps the first unique row in storage order. It is independent of ReplacingMergeTree semantics and is an expensive, manual cleanup tool — not something to run routinely.
Supported Engines
OPTIMIZE (with or without FINAL) works on the MergeTree family (including the materialized-view tables backing them) and the Buffer engine. Other engines are not supported.
FINAL's deduplication/collapsing guarantees are only meaningful for engines that have merge-time logic — ReplacingMergeTree, the Collapsing/VersionedCollapsing engines, SummingMergeTree, and AggregatingMergeTree. On a plain MergeTree table, FINAL simply consolidates parts. See /kb/clickhouse-mergetree for the engine family overview.
Performance and Cost
Plain OPTIMIZE is cheap and bounded: it is one merge, comparable to what the background scheduler does anyway.
OPTIMIZE ... FINAL is the opposite. It rewrites entire partitions, including data that is already in a single part, so its cost scales with total data size, not with how many parts are "out of shape." The Altinity Knowledge Base advises against running OPTIMIZE TABLE ... FINAL on tables with more than ~10 million rows, noting it can create heavy CPU and disk load and take hours on large tables (on the order of a few hours for ~1 TB).
To keep the blast radius small, scope to a single partition whenever possible:
-- Far cheaper than optimizing the whole table
OPTIMIZE TABLE events PARTITION '2026-06' FINAL;
OPTIMIZE ... FINAL runs as a regular merge on the server. If the client connection drops or times out, the merge keeps running server-side. For timeout handling and how to monitor an in-flight operation, see /kb/clickhouse-optimize-final-timeout.
Useful Settings
A few settings change OPTIMIZE behavior:
optimize_throw_if_noop(default0): plainOPTIMIZEis silent when no merge happens. Set to1to get an exception instead of guessing whether anything occurred. Covered in /kb/clickhouse-optimize-throw-if-noop-setting.optimize_skip_merged_partitions: withFINAL, skip partitions that are already a single part and have no expired TTL, avoiding pointless rewrites.alter_sync/mutations_sync: control whether the statement waits for replicas to finish on Replicated tables.distributed_ddl_task_timeout: bounds how long the client waits when runningON CLUSTER.
-- Don't silently do nothing; tell me if no merge ran
OPTIMIZE TABLE events SETTINGS optimize_throw_if_noop = 1;
When to Use Each
Reach for plain OPTIMIZE when:
- You want to nudge the scheduler to merge a few small parts right now.
- You are testing merge behavior or part-count reduction interactively.
- You explicitly do not want to rewrite already-merged data.
Reach for OPTIMIZE ... FINAL when:
- You need ReplacingMergeTree/Collapsing/Summing logic materialized on disk before a downstream read that can't use
SELECT ... FINAL. - You are doing a one-time, scheduled maintenance consolidation on a bounded partition.
- You are removing exact-duplicate rows with
DEDUPLICATE.
Avoid both when the real problem is that background merges aren't keeping up (rising part counts, "too many parts" errors). OPTIMIZE FINAL is not a fix for that — it competes for the same merge resources and often makes the backlog worse. Tune the merge subsystem instead; see /kb/clickhouse-aggressive-merges-tuning.
Common Issues
- Expecting plain
OPTIMIZEto leave one part. It won't. OnlyFINALguarantees one part per partition. - Running
OPTIMIZE FINALon a whole large table. Scope to a partition, or you risk hours of CPU/disk load. Don't run it on tables over ~10M rows without a clear reason. - Using
OPTIMIZE FINALto guarantee global deduplication. It only deduplicates within a partition. Rows duplicated across partitions survive. - Treating
OPTIMIZE FINALas a routine cron job. It rewrites data every run. For query-time correctness, preferSELECT ... FINALor aFINAL-aware query pattern instead of permanently rewriting storage. - Confusing the statement with the
FINALmodifier.OPTIMIZE TABLE ... FINALrewrites storage once;SELECT ... FINALapplies merge logic per query without touching disk.
How Pulse Helps
Deciding whether a partition needs OPTIMIZE FINAL — or whether your background merges are simply falling behind — requires visibility into system.merges, part counts per partition, and merge throughput over time. Pulse monitors ClickHouse merge activity and part growth, flags partitions accumulating unmerged duplicates in ReplacingMergeTree/Collapsing tables, and surfaces when an OPTIMIZE FINAL is saturating I/O or competing with the scheduler. That makes it straightforward to tell the difference between "this needs a one-time consolidation" and "merge tuning is the actual fix," instead of reaching for OPTIMIZE FINAL as a blunt instrument.
Frequently Asked Questions
Q: Does plain OPTIMIZE deduplicate ReplacingMergeTree rows?
A: Only incidentally. Deduplication happens when the duplicate rows end up in the same merged part. Plain OPTIMIZE runs one merge pass and may not touch the relevant parts. Use OPTIMIZE ... FINAL to force it within each partition.
Q: Will OPTIMIZE FINAL always reduce a partition to one part?
A: Usually, but not always. Unlike automatic background merges, OPTIMIZE FINAL ignores max_bytes_to_merge_at_max_space_in_pool — only available free disk space constrains it. However, if there is insufficient free disk space to merge all parts into one, the partition may remain as more than one part. On very large partitions this can also result in an extremely large single part, memory pressure, or a very long merge.
Q: Why did my plain OPTIMIZE do nothing and return no error?
A: By default OPTIMIZE is a silent no-op when the selector finds nothing worth merging. Run it with SETTINGS optimize_throw_if_noop = 1 to get an exception instead.
Q: Is OPTIMIZE FINAL the same as SELECT ... FINAL?
A: No. OPTIMIZE TABLE ... FINAL is a DDL statement that permanently rewrites and consolidates data on disk. SELECT ... FINAL is a read-time modifier that applies merge logic for that one query without changing storage. Use the SELECT modifier for correctness on every read; use the OPTIMIZE statement for one-off consolidation.
Q: Can OPTIMIZE FINAL deduplicate rows that span different partitions? A: No. Merges operate within a single partition, so the engine only deduplicates or collapses rows that share a partition. Rows duplicated across partitions are not compared against each other.
Q: Should I run OPTIMIZE FINAL to fix "too many parts" errors? A: No. It competes for merge resources and often worsens the backlog. Tune the merge subsystem instead — see /kb/clickhouse-aggressive-merges-tuning.