A schema migration is a versioned, repeatable change to a database's structure - tables, columns, indexes, constraints, and the data that has to move when those change. The hard part is rarely the SQL itself. It is doing the change without losing data, without holding a lock that stalls production traffic, and in a way you can roll forward or back deterministically. Most migration failures trace back to four problems: irreversible data loss from type changes, long-held locks, non-idempotent scripts that cannot be safely re-run, and a missing rollback path. This guide covers each failure mode and the pattern that fixes it.
The Main Schema Migration Challenges
The table below maps each common failure to its cause and the mitigation covered in this article.
| Challenge | What causes it | Mitigation |
|---|---|---|
| Data loss / truncation | Narrowing a type, e.g. DECIMAL(10,2) to DECIMAL(5,2) |
Validate ranges first; widen, never narrow, in place |
| Long-held locks / downtime | ALTER TABLE taking ACCESS EXCLUSIVE; volatile-default column adds |
Use metadata-only operations; CREATE INDEX CONCURRENTLY |
| Non-idempotent scripts | Re-running a partially applied migration fails | IF NOT EXISTS / IF EXISTS; one logical change per migration |
| Wrong object order | Creating a child table before its parent | Order dependencies; create referenced tables first |
| No rollback path | Destructive change with no reverse | Expand-contract; compensation migrations; backups |
| Unanticipated performance impact | A migration that triggers a full table rewrite | Test on production-sized data; monitor lock and query metrics |
Data Loss From Narrowing Types
Changing a column to a smaller type can silently or fatally lose data. Narrowing a numeric precision is the classic case:
-- Reduces precision from (10,2) to (5,2): values >= 1000.00 no longer fit
ALTER TABLE products
ALTER COLUMN price TYPE DECIMAL(5, 2);
In PostgreSQL this raises numeric field overflow for any row that exceeds the new precision, aborting the migration. Other engines and other type changes (VARCHAR(255) to VARCHAR(50), BIGINT to INT) can truncate instead of erroring. Before any narrowing change, check the live data range:
-- Confirm nothing would overflow the target precision
SELECT count(*) FROM products WHERE price >= 1000.00;
Widening (INT to BIGINT, VARCHAR(50) to VARCHAR(255)) is the safe direction. When you genuinely must narrow, do it as a multi-step expand-contract change: add the new column, backfill validated values, switch reads, then drop the old column.
Locking and Unanticipated Performance Impact
By default, ALTER TABLE acquires an ACCESS EXCLUSIVE lock - the strongest lock level, which blocks reads and writes for the duration (PostgreSQL ALTER TABLE docs). On a small table this is microseconds. On a large table, if the operation rewrites the table, that lock is held for the entire rewrite and every query queues behind it.
What triggers a rewrite matters more than the lock itself. Since PostgreSQL 11, adding a column with a constant (non-volatile) default is metadata-only - the value is stored in the catalog and applied to existing rows on read, so the statement is fast even on huge tables:
-- Metadata-only since PostgreSQL 11: no table rewrite
ALTER TABLE employees ADD COLUMN status TEXT DEFAULT 'active';
A volatile default still forces a full rewrite, because every existing row needs a distinct evaluated value:
-- Volatile default rewrites the entire table and its indexes
ALTER TABLE employees ADD COLUMN created_at timestamptz DEFAULT clock_timestamp();
Changing a column type normally rewrites the table too, unless the old type is binary-coercible to the new one. Index creation has the same trap: a plain CREATE INDEX locks the table against writes for the whole build. Use CONCURRENTLY to keep writes flowing, at the cost of two table scans and a longer build:
-- Does not block writes; cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);
CREATE INDEX CONCURRENTLY cannot run inside a transaction block, and if it fails mid-build it leaves an invalid index you must drop and recreate. Many migration tools wrap each migration in a transaction by default, so concurrent index builds usually need a tool-specific flag to run non-transactionally.
Non-Idempotent and Non-Retryable Migrations
A migration that has been partially applied - the process died, the connection dropped, the deploy was retried - must be safe to run again. A bare CREATE TABLE or ADD COLUMN fails on the second run because the object already exists, leaving the migration stuck half-applied with no clean path forward. Guard every create and drop:
-- Re-runnable: succeeds whether or not the index already exists
CREATE INDEX IF NOT EXISTS idx_orders_status ON orders (status);
DROP TABLE IF EXISTS legacy_sessions;
Keep each migration to one logical change. A migration that creates two tables and backfills data has three places to fail and no way to know which step already ran. Smaller migrations are easier to make idempotent and easier to reason about on retry. Note that IF NOT EXISTS on CREATE INDEX still requires you to name the index.
Foreign Key Ordering
A referenced table must exist before the table that references it. This script fails because products points at categories, which has not been created yet:
-- Fails: categories does not exist when the FOREIGN KEY is declared
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(50),
category_id INT,
FOREIGN KEY (category_id) REFERENCES categories(category_id)
);
Order the statements so the referenced table comes first:
CREATE TABLE categories (
category_id INT PRIMARY KEY,
category_name VARCHAR(50)
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(50),
category_id INT,
FOREIGN KEY (category_id) REFERENCES categories(category_id)
);
The same dependency rule governs teardown in reverse: drop the referencing table, or the constraint, before the referenced table.
Choosing a Migration Tool
Tooling does not remove the failure modes above, but it gives you versioning, ordering, and an audit of what ran. The tools differ most in how they handle rollback and what language they express changes in.
| Tool | Change format | Rollback support | Ecosystem |
|---|---|---|---|
| Flyway | Versioned SQL (V), repeatable (R), undo (U) |
Undo is a paid/Teams feature; Community has none | JVM, CLI, broad DB support |
| Liquibase | XML / YAML / JSON / SQL changelogs | Auto-rollback for many modeled change types; SQL changelogs need custom rollback | JVM, CLI |
| Alembic | Python scripts with upgrade/downgrade |
Explicit downgrade() you write |
Python / SQLAlchemy |
| Django migrations | Python migration files | Reversible operations where defined | Django ORM |
| Rails Active Record | Ruby migrations (change/up/down) |
Reversible for known operations | Ruby on Rails |
| Sqitch | SQL with deploy/revert/verify scripts | Explicit revert scripts; dependency-ordered, no version numbers | DB-agnostic, Perl |
| Atlas | Declarative HCL or SQL | Computes plans from desired state | Go, declarative |
Two facts catch teams off guard. First, Flyway's Community (free) edition has no undo command - undo migrations with the U prefix are a paid feature, so on Community you reverse a change by writing a new forward "compensation" migration (Flyway undo docs). Second, Liquibase auto-generates rollback for many change types in modeled XML/YAML/JSON changelogs, but formatted-SQL changelogs require you to write the rollback block yourself, and destructive changes like dropTable need custom rollback regardless of format (Liquibase rollback docs).
Zero-Downtime Migrations: Expand-Contract
For changes that cannot tolerate downtime, the expand-contract (parallel change) pattern keeps old and new schema valid at the same time, so application code that expects either version keeps working through the deploy. Renaming a column - which a naive ALTER TABLE ... RENAME breaks instantly for any running old code - becomes a sequence:
- Expand. Add the new column. Backfill it in batches. Dual-write: application writes go to both old and new columns.
- Migrate reads. Deploy code that reads from the new column. Both columns still exist and stay in sync.
- Contract. Once no code references the old column, drop it in a later migration.
Each step is independently deployable and reversible, and no step holds a long lock or requires a coordinated big-bang cutover. The same shape covers type changes, table splits, and not-null additions. Add a nullable column, backfill, validate, then add the constraint as a separate step with NOT VALID followed by VALIDATE CONSTRAINT to avoid a long table scan under lock.
Backward-compatible migrations are the foundation: every intermediate state has to be readable and writable by both the currently-deployed code and the next version. That constraint is what makes rolling deploys and instant rollbacks possible.
The performance side of this is hard to catch by reading SQL alone - a migration that looks trivial can trigger a full table rewrite or sit on an ACCESS EXCLUSIVE lock while queries pile up behind it. Pulse is an AI-native database optimization and maintenance platform that monitors query and schema performance, runs automated root-cause analysis, and flags risky migrations - for example a column add that forces a rewrite, or a CREATE INDEX holding a lock - before they turn into an incident, and surfaces the performance regression if one slips through.
Frequently Asked Questions
Q: What is a database schema migration?
A: A schema migration is a versioned change to a database's structure - adding or altering tables, columns, indexes, and constraints, plus any data movement those changes require. Migrations are tracked and applied in order so every environment converges on the same schema, and so changes are auditable and repeatable.
Q: How do I run a schema migration with zero downtime?
A: Use the expand-contract (parallel change) pattern: add new schema alongside the old, backfill and dual-write, switch reads to the new schema, then drop the old in a later step. Keep every intermediate state backward-compatible so old and new application code both work, and avoid operations that hold an ACCESS EXCLUSIVE lock on large tables.
Q: Why does adding a column with a default lock my PostgreSQL table?
A: Since PostgreSQL 11, adding a column with a constant (non-volatile) default is metadata-only and does not rewrite the table. A volatile default like clock_timestamp(), a generated column, or an identity column forces a full table rewrite under an ACCESS EXCLUSIVE lock, which blocks all access for the duration on large tables.
Q: What is the difference between Flyway and Liquibase for rollback?
A: Flyway's undo (rollback) is a paid feature - the free Community edition has none, so you reverse changes with new forward migrations. Liquibase auto-generates rollback for many change types in modeled XML/YAML/JSON changelogs, but SQL-format changelogs and destructive operations like dropTable require you to write custom rollback statements.
Q: How do I make a migration idempotent?
A: Guard every object creation and removal with CREATE TABLE IF NOT EXISTS, CREATE INDEX IF NOT EXISTS, and DROP ... IF EXISTS, and keep each migration to one logical change. That way a re-run after a partial failure succeeds instead of erroring on an object that already exists.
Q: How can I add an index in PostgreSQL without blocking writes?
A: Use CREATE INDEX CONCURRENTLY, which builds the index without taking a write-blocking lock. It performs two table scans and so takes longer, cannot run inside a transaction block, and leaves an invalid index if it fails mid-build - which you then drop and recreate.
Q: How do I avoid data loss when changing a column type?
A: Only widen types in place (INT to BIGINT, longer VARCHAR); never narrow a type or precision directly, since values that no longer fit either error out or truncate. To narrow, use expand-contract: add a new column, backfill validated values, switch reads, then drop the old column.
Related Reading
- PostgreSQL CREATE INDEX: syntax,
CONCURRENTLY, andIF NOT EXISTSfor non-blocking index builds. - PostgreSQL Lock Timeout: cap how long a migration waits on
ACCESS EXCLUSIVElocks. - PostgreSQL Statement Timeout: bound the runtime of long-running migration statements.
- PostgreSQL Numeric Field Overflow: the error a narrowing precision change raises.
- PostgreSQL CREATE TABLE: table and foreign-key definition, including
IF NOT EXISTS. - PostgreSQL violates foreign key constraint: diagnosing FK ordering and reference failures.
- ClickHouse Schema Migration Tools: comparable tooling options for ClickHouse.
- Database Monitoring Best Practices: catching migration-induced locks and regressions in production.