ALTER TABLE ... MODIFY COLUMN looks like a quick metadata change, but when the new type cannot be cast from the existing data in every part, ClickHouse turns it into a mutation that rewrites parts in the background. If that mutation fails on even one part, it gets stuck retrying forever — and because the table metadata already advertises the new type while the data on disk still holds the old representation, the column becomes effectively inaccessible: reads against it throw errors. This guide explains why this happens, how to confirm it from system tables, and how to recover safely.
For the full surface of ALTER TABLE semantics, see ClickHouse ALTER TABLE. This page focuses narrowly on the stuck-mutation / inaccessible-column failure mode.
When MODIFY COLUMN Becomes a Mutation
ClickHouse splits a column type change into two cases:
| Change | Semantics |
|---|---|
Cast is lossless and representable in place (Int32 → Int64, widening FixedString) |
Metadata-only. Fast, synchronous, applied immediately. |
Cast requires rewriting stored data (String → LowCardinality(String), String → Enum8, String → Int, narrowing types, changing date/time encodings) |
A mutation is scheduled. It runs asynchronously, rewriting whole data parts, and is tracked in system.mutations. |
The dangerous case is the second one. The metadata flips to the new type immediately, but the actual conversion happens part by part in the background. If the conversion raises an exception on any part, that part can never complete, and queries that read the column see a type that does not match what is physically stored.
The canonical trigger is converting a String column to an Enum whose definition omits a value that exists in the data:
CREATE TABLE modify_column (column_n String)
ENGINE = MergeTree() ORDER BY tuple();
INSERT INTO modify_column VALUES ('key_a'), ('key_b'), ('key_c');
-- 'key_c' is missing from the Enum definition
ALTER TABLE modify_column
MODIFY COLUMN column_n Enum8('key_a' = 1, 'key_b' = 2);
The ALTER statement returns, but the mutation that converts the data fails with something like:
Unknown element 'key_c' for enum: while executing 'FUNCTION CAST(...)'
Now any SELECT column_n FROM modify_column can fail, because the table claims the column is an Enum8('key_a'=1,'key_b'=2) while parts still contain 'key_c'.
Step 1: Confirm the Mutation Is Stuck
Check system.mutations for the table. A stuck mutation has is_done = 0 and a populated latest_fail_reason:
SELECT
mutation_id,
command,
parts_to_do,
is_done,
latest_failed_part,
latest_fail_time,
latest_fail_reason
FROM system.mutations
WHERE table = 'modify_column' AND is_done = 0
FORMAT Vertical;
Key columns to read:
is_done = 0— the mutation has not finished.parts_to_do— how many parts still need rewriting. If this stays constant over time, the mutation is not making progress.latest_fail_reason— the exception that keeps killing the part rewrite. This is the single most useful field; it tells you exactly why the conversion fails.latest_failed_part/latest_fail_time— which part failed and when, useful for confirming the mutation is actively retrying rather than waiting on resources.
If latest_fail_reason is empty but parts_to_do is non-zero and not shrinking, the mutation is likely blocked rather than failing — for example by background pool saturation or an earlier mutation in the queue. That is a different problem; see the impact of mutations and merge stalls on stuck replicated tables.
On a replicated table, check every replica — each one tracks mutation progress locally:
SELECT hostName(), mutation_id, is_done, parts_to_do, latest_fail_reason
FROM clusterAllReplicas('{cluster}', system.mutations)
WHERE table = 'modify_column' AND is_done = 0
FORMAT Vertical;
Step 2: Kill the Stuck Mutation
A failing MODIFY COLUMN mutation will not recover on its own — it retries the same impossible conversion indefinitely. Cancel it with KILL MUTATION, using the mutation_id you found above:
KILL MUTATION
WHERE database = 'default'
AND table = 'modify_column'
AND mutation_id = 'mutation_4.txt';
Notes:
- Parts that were already rewritten stay rewritten; remaining parts are skipped. Because this particular mutation failed on the first part it touched, usually nothing was converted, so the table is consistent once the metadata is reverted (Step 3).
- On replicated tables, killing the mutation entry clears it from the replication queue so it stops blocking subsequent merges and mutations. A long-stuck mutation at the head of the queue can stall all background work on the table, so this also unblocks normal merges.
- A killed mutation can briefly show
is_killed = 1withis_done = 0if another long-running mutation is still ahead of it. Re-checksystem.mutationsuntil the entry disappears. (Theis_killedcolumn is only available in ClickHouse Cloud; it does not exist in self-hosted deployments.)
Step 3: Revert the Metadata, Then Apply the Correct Change
After killing the mutation, the table metadata may still reference the new (unreachable) type. Put the column back to its original type so the data becomes readable again:
ALTER TABLE modify_column
MODIFY COLUMN column_n String;
Then apply the conversion correctly. For the Enum case, include every value present in the data:
ALTER TABLE modify_column
MODIFY COLUMN column_n Enum8('key_a' = 1, 'key_b' = 2, 'key_c' = 3);
To enumerate the distinct values that must appear in the Enum definition before you run the ALTER:
SELECT column_n, count() AS n
FROM modify_column
GROUP BY column_n
ORDER BY n DESC;
Why the Column Becomes Inaccessible
The inaccessibility is a direct consequence of how the change is staged:
MODIFY COLUMNupdates table metadata immediately to the new type.- The physical conversion runs as a background mutation, part by part.
- When the conversion fails, parts retain the old physical representation.
- A
SELECTof that column asks ClickHouse to interpret old bytes as the new type — which fails for the rows that cannot be represented (e.g. the unknown Enum element).
This is why reverting the metadata in Step 3 instantly restores read access: it realigns the declared type with what is physically stored. You are not losing data; the bytes were never rewritten.
Beyond Enums: Other Conversions That Get Stuck
The Enum example is the most common, but any conversion that can throw at cast time behaves the same way:
String→ numeric (Int*,Float*,Decimal) when some rows are not parseable numbers.- Narrowing integer types (
Int64→Int32) when values overflow the target range. String→Date/DateTimewhen some values are not valid dates.Nullable(T)→Twhen NULLs exist in the data.
In every case the recovery pattern is identical: read latest_fail_reason, KILL MUTATION, revert the type, clean or pre-validate the data, then re-apply. For overflow and parse failures, the safer migration is usually the add-new-column / backfill / drop-old pattern described in ALTER TABLE, where the backfill expression handles the bad rows explicitly (toInt32OrNull, toDateOrDefault, etc.).
Best Practices
Validate the target type against the data first. Before converting
StringtoEnum, run aGROUP BYto get the complete value set. Before narrowing an integer, checkmin()/max(). A two-second query prevents an hours-long stuck mutation.Use a synchronous mode on small tables to surface failures immediately. Setting
mutations_sync = 1(wait on the current replica) or2(wait on all replicas) makes theALTERblock until the mutation finishes or fails, so a bad conversion errors out at the prompt instead of silently sticking in the background:SET mutations_sync = 1; ALTER TABLE modify_column MODIFY COLUMN column_n Enum8('key_a'=1,'key_b'=2,'key_c'=3);On large production tables, prefer running it async and watching
system.mutations, since synchronous waits can hold a session open for a long time.Prefer
*OrNull/*OrDefaultcasts via a new column for risky conversions. When the source data is dirty, add a new typed column, backfill it with a forgiving expression, verify, then drop the old column. This never leaves a column inaccessible.Watch the replication queue on replicated tables. A stuck MODIFY COLUMN mutation sits in the queue and can block merges and later mutations across replicas. See the replication queue guide.
Monitor mutation accumulation. Many stuck or pending mutations eventually trip
TOO_MANY_MUTATIONS. See too many mutations.
Common Issues
ALTER of column is forbidden— you are trying to MODIFY a column that is part of the primary, sort, partition, or sampling key. This is a different error; see ALTER of column is forbidden.Mutation stuck but
latest_fail_reasonis empty — the mutation is blocked, not failing. Common causes are background merge pool saturation, disk pressure, or an earlier mutation ahead of it in the queue. Investigate merge load via background merges using high CPU/IO and memory pressure via memory usage diagnosis.KILL MUTATIONreturns but the entry persists — another long-running mutation is blocking finalization. The killed entry showsis_killed = 1,is_done = 0until the blocker clears. Re-querysystem.mutations. (Theis_killedcolumn is only available in ClickHouse Cloud; self-hosted deployments do not expose this field.)Column still errors after killing — you have not reverted the metadata yet. Run the
MODIFY COLUMNback to the original type (Step 3); read access returns immediately because no data was rewritten.
How Pulse Helps
A failed MODIFY COLUMN mutation is dangerous precisely because it is silent: the ALTER statement returns successfully, and the breakage only surfaces when someone queries the column or when the replication queue backs up behind the stuck entry. Pulse continuously watches system.mutations across your ClickHouse cluster and flags mutations that stop making progress, mutations carrying a non-empty latest_fail_reason, and entries that are blocking the replication queue — before they turn into an inaccessible column or a cluster-wide merge stall. It correlates the stuck mutation with the originating DDL and the affected table so you know exactly which ALTER to revert. Connect your ClickHouse cluster to Pulse to catch stuck schema changes the moment they fail rather than when a dashboard breaks.
Frequently Asked Questions
Q: Why did my MODIFY COLUMN succeed instantly but the column is now broken?
The ALTER only updated metadata; the data conversion runs as a background mutation. If that mutation failed (check system.mutations for is_done = 0 and latest_fail_reason), the declared type no longer matches the bytes on disk, so reads of the column throw.
Q: Will killing the mutation lose data?
No. KILL MUTATION stops the conversion; parts already rewritten stay rewritten and the rest keep their original representation. For a conversion that fails on the first part (the common Enum case), nothing was rewritten, so reverting the type in metadata fully restores the table.
Q: How do I find the exact reason a MODIFY COLUMN mutation is stuck?
Read the latest_fail_reason column in system.mutations for that table. It contains the exception thrown during the part rewrite — for example, Unknown element '...' for enum or a numeric parse/overflow error.
Q: How do I avoid this when converting String to Enum?
Run SELECT DISTINCT column FROM table (or a GROUP BY) first and include every value in the Enum definition. A missing value causes the conversion to fail on the parts that contain it.
Q: Is a stuck MODIFY COLUMN mutation affecting other operations?
On replicated tables, yes. A failing mutation sits in the replication queue and can block subsequent merges and mutations on the same table. Killing it clears the queue. See the replication queue guide.
Q: Can I make the ALTER fail loudly instead of silently sticking?
On small tables, SET mutations_sync = 1 before the ALTER so the statement waits for the mutation and raises the conversion error directly. On large tables this is impractical; monitor system.mutations instead.