How to Diagnose and Fix MySQL Replication Lag

Q: What is the fastest way to bring a severely lagged replica back in sync?

Stop the replica, take a fresh snapshot (using mysqldump --single-transaction or an XtraBackup physical snapshot), restore it, and re-start replication from the current primary position. Re-applying thousands of seconds of relay log is slower than a fresh restore for replicas that are hours behind.

MySQL replication lag is the delay between a transaction committing on the primary and the same transaction being applied on a replica. It is measured by Seconds_Behind_Source (MySQL 8.0+) or Seconds_Behind_Master (5.7) in SHOW REPLICA STATUS. Non-zero lag means reads from the replica are returning stale data.

Impact

Replication lag breaks the "read from replica to scale reads" pattern: any lag means queries on the replica see a snapshot of the data that is seconds (or hours) behind the primary. Applications that read their own writes, display live counts, or show inventory quantities are directly affected. In high-availability setups, a lagging replica is a worse failover target — promoting it means losing all un-applied relay log events.

Common Causes

Single-threaded replica SQL thread (the default before MySQL 5.7) cannot keep up with a multi-threaded primary
A single large transaction on the primary (bulk update, batch delete, large ALTER TABLE) generates a large relay log event that takes minutes to apply
Write-heavy primary with small transactions that individually apply quickly but arrive faster than a single SQL thread can process
Long-running queries on the replica (analytics, reports, backups) competing for I/O and CPU with the SQL thread
max_allowed_packet on the replica smaller than on the primary (large row events are rejected, replica stops)
Replica hardware is slower than the primary (slower storage IOPS, less RAM → buffer pool pressure)
Replica I/O thread blocked waiting for the primary binlog to roll over (rare but possible with very large binary log files)
innodb_flush_log_at_trx_commit=1 on the replica (synchronous flush) adding write latency per transaction commit
Network latency between primary and replica I/O thread (geographically distributed replicas)
Unsupported or non-deterministic SQL in statement-based replication requiring extra work to apply safely

Troubleshooting and Resolution Steps

Check current lag and replica status:
```
SHOW REPLICA STATUS\G   -- MySQL 8.0+
SHOW SLAVE STATUS\G     -- MySQL 5.7
```
Key fields:
- Seconds_Behind_Source — lag in seconds (NULL if the SQL thread is stopped)
- Relay_Log_Space — size of un-applied relay log data on disk
- Exec_Source_Log_Pos — position in the primary's binlog being applied
- Last_SQL_Error — any error that stopped the SQL thread
- Replica_SQL_Running_State — what the SQL thread is doing right now
Measure lag more accurately with Heartbeat (recommended — Seconds_Behind_Source can be misleading):
```
-- Uses pt-heartbeat (Percona Toolkit) or similar
-- pt-heartbeat inserts a timestamp to a table on the primary every second
-- pt-heartbeat --monitor measures the difference on the replica
```
Seconds_Behind_Source measures the timestamp difference between the currently executing binlog event and now. It can jump to 0 if the replica is idle (not applying anything) even though it is behind, making monitoring unreliable.
Enable parallel replication (the single most impactful fix):
```
-- On the replica — requires stopping the replica first
STOP REPLICA SQL_THREAD;

SET GLOBAL replica_parallel_workers = 8;       -- number of parallel threads
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';  -- MySQL 5.7.22+ / 8.0
SET PERSIST replica_parallel_workers = 8;
SET PERSIST replica_parallel_type = 'LOGICAL_CLOCK';

START REPLICA SQL_THREAD;
```
LOGICAL_CLOCK applies transactions that committed concurrently on the primary in parallel on the replica, which is safe because they did not conflict. This is the correct setting for most workloads. DATABASE parallelism (the older default) only parallelizes across different databases and is rarely useful.

On MySQL 5.7, the variable names are slave_parallel_workers and slave_parallel_type.

Identify which transaction is causing a spike in lag:

-- While lag is spiking, check what the SQL thread is running
SELECT * FROM performance_schema.replication_applier_status_by_worker\G

The APPLYING_TRANSACTION column shows the GTID of the transaction being applied. You can then find the original SQL on the primary:

-- On the primary
SHOW BINLOG EVENTS IN 'mysql-bin.000123' FROM <pos> LIMIT 10;

Find large transactions in the binary log:

mysqlbinlog --no-defaults mysql-bin.000123 | \
  awk '/^# at /{pos=$3} /^SET TIMESTAMP/{ts=$0} /^COMMIT/{print pos, ts}' | \
  sort -k1 -n | tail -20

Or use mysqlbinlog --verbose --base64-output=DECODE-ROWS to read row events.

Reduce replica I/O pressure by relaxing durability settings on the replica:

-- Safe on replicas that are not themselves primary-eligible
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- flush every second, not per commit
SET GLOBAL sync_binlog = 0;                      -- let OS buffer binlog writes
SET PERSIST innodb_flush_log_at_trx_commit = 2;
SET PERSIST sync_binlog = 0;

These are not safe on the primary (data loss on crash), but on a replica that can re-sync from the primary if it crashes, the performance gain is significant.

Throttle reads on the replica to reduce competition with the SQL thread:
- Route long-running analytics queries to a dedicated read replica, not the same replica used for application reads
- Set resource group limits or connection-level priorities for BI tool users
- Schedule mysqldump and backups (even --single-transaction) to off-peak hours — the backup's large transaction can delay the SQL thread

Break up large batch writes on the primary:

-- Bad: one huge DELETE that generates a massive binlog event
DELETE FROM events WHERE created_at < '2024-01-01';

-- Better: delete in batches, give the replica time to catch up
DELETE FROM events WHERE created_at < '2024-01-01' LIMIT 5000;
-- Repeat in a loop with sleep between iterations

Tools like pt-archiver (Percona Toolkit) do this automatically.

Verify max_allowed_packet matches on primary and all replicas:
```
-- On primary
SHOW VARIABLES LIKE 'max_allowed_packet';

-- On replica — must be >= primary value
SET PERSIST max_allowed_packet = 536870912;
```
A mismatch causes the replica I/O thread to fail on large row events, stopping replication entirely.

Monitor lag continuously. Alert at two thresholds:

Warning at 10 s — application reads may be visibly stale
Critical at 60 s — failover risk, consider promoting a less-lagged replica

In Prometheus (with mysqld_exporter):

- alert: MySQLReplicationLagWarning
  expr: mysql_slave_status_seconds_behind_master > 10
  for: 2m
  labels:
    severity: warning
- alert: MySQLReplicationLagCritical
  expr: mysql_slave_status_seconds_behind_master > 60
  for: 1m
  labels:
    severity: critical

Consider semi-synchronous replication to bound the loss window rather than the lag window — semi-sync guarantees that at least one replica has received and acknowledged each transaction before the primary commits, eliminating the risk of data loss on primary failure, regardless of lag.

Additional Information

Seconds_Behind_Source is computed as current_timestamp - timestamp_of_last_applied_event. If the replica SQL thread is idle (no new events to apply), it reports 0 even though it may have an accumulated backlog. Use heartbeat-based monitoring for accuracy.
GTID (gtid_mode=ON) makes it easier to re-point replicas to a new primary after failover (CHANGE REPLICATION SOURCE TO SOURCE_AUTO_POSITION=1) and to diagnose which transactions are missing.
Binary log format matters for parallelism: ROW format with LOGICAL_CLOCK parallelism extracts the most concurrency because row-based events are deterministic and their conflict domain (the rows they touch) can be precisely computed. STATEMENT format with LOGICAL_CLOCK is less safe and not recommended.
Replica lag and replica error are different conditions. A stopped replica (SQL thread error) reports Seconds_Behind_Source = NULL, while a lagging replica reports a positive number. NULL is always more urgent.

Frequently Asked Questions

Q: Seconds_Behind_Source oscillates between 0 and a large value. Is that normal? A: It can indicate burst writes on the primary that the replica catches up on during quiet periods. It can also be the symptom of a large transaction that spikes lag for the duration of its application. Use heartbeat monitoring to distinguish.

Q: I enabled parallel replication but lag didn't improve. Why? A: Parallel replication only helps if multiple transactions can run in parallel — that requires concurrent commits on the primary with overlapping last_committed timestamps in the binary log. A primary that does most writes in serialized large transactions sees no parallelism benefit. Also confirm replica_parallel_workers > 1 is actually active: SHOW STATUS LIKE 'Replica_workers_running'.

Q: Can lag ever go negative (replica ahead of primary)? A: No. The formula can only produce non-negative values. However, clock skew between primary and replica servers can cause Seconds_Behind_Source to report 0 when there is actual lag. Ensure both servers use NTP / PTP time synchronization.

Q: Will fixing lag by relaxing innodb_flush_log_at_trx_commit put my data at risk? A: On the replica, yes — you risk losing up to one second of applied transactions on a crash, meaning the replica may need to re-sync from the primary. Since the primary is the source of truth and the data still exists there, this is acceptable for a read replica. Do not change this setting on the primary.

Q: What is the fastest way to bring a severely lagged replica back in sync? A: Stop the replica, take a fresh snapshot (using mysqldump --single-transaction or an XtraBackup physical snapshot), restore it, and re-start replication from the current primary position. Re-applying thousands of seconds of relay log is slower than a fresh restore for replicas that are hours behind.