MySQL ALTER TABLE is one of the riskiest operations in production database management. Depending on the operation type and MySQL version, it can lock the table for seconds or hours, block all reads and writes, and cause replication lag spikes. Choosing the wrong approach on a large table at peak traffic is one of the most common causes of production incidents.
Impact
A blocking ALTER TABLE acquires a metadata lock (MDL) that prevents all reads and writes against the table for the duration of the rebuild. On a 50 GB table this can take 30–90 minutes. During that window, application threads accumulate behind the MDL — each waiting thread holds a connection, exhausting max_connections. The result is a full site outage even if the underlying DDL would eventually succeed.
Common Causes of Dangerous ALTERs
- Running
ALTER TABLEwithout specifyingALGORITHM=INPLACEorALGORITHM=INSTANT, defaulting toALGORITHM=COPYon older MySQL versions - A long-running query or open transaction holding an MDL when the
ALTERstarts — theALTERitself blocks, then blocks every subsequent query behind it - Using
pt-online-schema-changeorgh-ostwithout understanding replica lag impact - Modifying
AUTO_INCREMENTstart value, column charset, or column type in ways that require a full table rebuild - Adding or dropping a foreign key without ALGORITHM=INPLACE support
- Running migrations during peak traffic instead of off-peak
Understanding MySQL Online DDL
Check which operations support
ALGORITHM=INSTANT(MySQL 8.0 / InnoDB only):-- Instant DDL completes in milliseconds regardless of table size -- It only updates the data dictionary, not the actual row data ALTER TABLE orders ADD COLUMN shipped_at DATETIME NULL, ALGORITHM=INSTANT;Operations supported by INSTANT (as of MySQL 8.0.29+):
- Adding a nullable column at the end of the row format's column list
- Adding / dropping a generated virtual column
- Dropping a column (with limitations before 8.0.29)
- Changing column default value
- Changing
ENUM/SETcolumn values (adding new members at the end) - Renaming a table
Use
ALGORITHM=INPLACEfor operations that rebuild indexes but not the full table:-- INPLACE uses the InnoDB online DDL path: -- Phase 1: build the new index structure in the background -- Phase 2: apply a brief MDL to swap the old and new structures ALTER TABLE orders ADD INDEX idx_status (status), ALGORITHM=INPLACE, LOCK=NONE;LOCK=NONEfails the statement if the operation cannot proceed without blocking reads — a safety check to avoid accidental locks.Know what forces
ALGORITHM=COPY(full table rebuild, unavoidable lock):- Changing a column's data type incompatibly (e.g.,
VARCHAR(100)→VARCHAR(50),INT→BIGINTin MySQL < 8.0) - Adding or removing
FULLTEXTorSPATIALindexes (requires COPY in some versions) - Modifying the row format or character set requiring data conversion
- Adding a
PRIMARY KEYto a table that currently has none (rebuilds the clustered index)
- Changing a column's data type incompatibly (e.g.,
Check the operation's MDL behavior before running in production:
-- Dry run: wrap in a transaction, check, rollback SET SESSION lock_wait_timeout = 1; -- fail after 1 second instead of waiting ALTER TABLE orders ADD INDEX idx_test (status), ALGORITHM=INPLACE, LOCK=NONE; -- If it errors immediately, the operation is incompatible with LOCK=NONE
Blocking ALTER and Metadata Lock Queue
Identify what is holding an MDL against your table before running ALTER:
-- MySQL 8.0+ SELECT OBJECT_SCHEMA, OBJECT_NAME, LOCK_TYPE, LOCK_DURATION, LOCK_STATUS, THREAD_ID, PROCESSLIST_ID FROM performance_schema.metadata_locks JOIN performance_schema.threads USING (THREAD_ID) WHERE OBJECT_NAME = 'orders';Kill long-running transactions before running a blocking ALTER:
-- Find open transactions older than 30 seconds SELECT trx_id, trx_mysql_thread_id AS pid, TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_s, LEFT(trx_query, 100) FROM information_schema.INNODB_TRX WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) > 30 ORDER BY age_s DESC; -- Kill them KILL 12345;Do this immediately before the ALTER so new long transactions don't start in between.
Set
lock_wait_timeoutfor the ALTER session to abort rather than block indefinitely if an MDL cannot be acquired:SET SESSION lock_wait_timeout = 30; -- seconds ALTER TABLE orders ADD INDEX idx_status (status);If an MDL is not available within 30 seconds, the ALTER errors instead of blocking all subsequent queries behind it.
Online Schema Change Tools
For tables too large for any locking window, use an online schema change (OSC) tool that works by copying data in the background.
pt-online-schema-change(Percona Toolkit):pt-online-schema-change \ --host=localhost \ --user=root \ --password=secret \ --alter="ADD INDEX idx_status (status)" \ --execute \ D=mydb,t=ordersHow it works: creates a shadow table with the new structure, copies rows in chunks with
INSERT INTO ... SELECT, installs triggers to capture concurrent writes, then performs an atomicRENAME TABLEto swap.Key options:
--chunk-size=1000 # rows per chunk (reduce if replica lag spikes) --max-load=Threads_running=25 # pause if load exceeds threshold --critical-load=Threads_running=50 # abort if load exceeds threshold --check-slave-lag=replica_host # pause if replica lag > max-lag --max-lag=1 # max replica lag in secondsgh-ost(GitHub's online schema change):gh-ost \ --host=127.0.0.1 \ --user=root \ --password=secret \ --database=mydb \ --table=orders \ --alter="ADD INDEX idx_status (status)" \ --executeUnlike pt-osc, gh-ost does not use triggers — it tails the binary log directly to capture concurrent changes. This avoids trigger overhead on write-heavy tables and works correctly with tables that have no unique key (which pt-osc cannot handle).
Key options:
--chunk-size=1000 --max-load=Threads_running=25 --throttle-control-replicas=replica1,replica2 --max-lag-millis=1500 # pause if replica lag > 1.5 s --switch-to-rbr # switch session to ROW binlog format if neededMonitor progress during an online schema change:
# pt-osc prints chunk progress; also watch replica lag watch -n5 "mysql -e \"SHOW REPLICA STATUS\G\" | grep Seconds_Behind" # gh-ost writes a progress file tail -f /tmp/gh-ost.mydb.orders.progress
Post-Migration Verification
Verify the new index is being used:
EXPLAIN SELECT * FROM orders WHERE status = 'pending';Check table size and fragmentation after a large rebuild:
SELECT table_name, ROUND(data_length / 1024 / 1024, 1) AS data_mb, ROUND(index_length / 1024 / 1024, 1) AS index_mb, ROUND(data_free / 1024 / 1024, 1) AS fragmented_mb FROM information_schema.tables WHERE table_schema = 'mydb' AND table_name = 'orders';
Additional Information
- MySQL 8.0 supports
ALGORITHM=INSTANTfor column addition anywhere in the row (not just the end) as of 8.0.29, dramatically expanding the set of zero-impact migrations. - Both pt-osc and gh-ost require a
PRIMARY KEYorUNIQUE NOT NULLkey on the table to chunk rows safely. Tables without a unique key cannot be safely migrated with these tools. - Foreign keys are not supported by pt-osc and have limited support in gh-ost. Drop foreign keys before migrating and recreate them after.
- Liquibase, Flyway, and other migration frameworks integrate with gh-ost and pt-osc via their respective plugins, allowing OSC-style migrations from within standard migration pipelines.
- For Aurora MySQL, AWS provides its own online DDL path that is aware of cluster topology. Aurora Fast DDL (
ALGORITHM=INSTANT) covers a wider set of operations than community MySQL.
Frequently Asked Questions
Q: How long will my ALTER take?
A: For ALGORITHM=INSTANT, milliseconds regardless of table size. For ALGORITHM=INPLACE (online index build), roughly proportional to table size and write rate — a 50 GB table might take 10–30 minutes. For ALGORITHM=COPY, similar to INPLACE but with an exclusive lock at the end for the final swap. For gh-ost/pt-osc, the copy phase takes the same time as INPLACE but is throttled to avoid overload, so wall-clock time is longer (hours is common for very large tables).
Q: Can I run multiple ALTERs at once?
A: You can combine multiple operations in a single ALTER TABLE statement — MySQL 8.0 batches compatible operations. Running two separate ALTER statements simultaneously is usually wrong: the second one waits for the first to finish (MDL), doubling total time.
Q: pt-osc is causing replication lag. What do I adjust?
A: Lower --chunk-size and enable --check-slave-lag with --max-lag=1. pt-osc will pause chunking when replica lag exceeds the threshold, letting the replica catch up before continuing.
Q: What happens to data written during the gh-ost copy? A: gh-ost reads the binary log in real time and applies changes to the ghost table as they arrive. When the copy is complete and the ghost table is caught up to the primary, gh-ost performs an atomic lock-rename-unlock that takes < 1 second. Writes during that window are queued and applied immediately after.
Q: My migration framework runs ALTER TABLE in a transaction. Is that safe?
A: DDL in MySQL causes an implicit COMMIT before and after the statement — it cannot be truly transactional. If your migration framework wraps ALTER TABLE in BEGIN/COMMIT, the BEGIN commits any prior DML but the ALTER itself is not rolled back on error. Test your framework's rollback behavior carefully.