How to Diagnose and Prevent MySQL OOM Kills

Q: How do I monitor for impending OOM conditions before they cause a kill?

Track node_memory_MemAvailable_bytes (Prometheus node_exporter) and alert when it drops below 10% of total RAM. Also monitor mysqld RSS via /proc/ /status and alert when the ratio approaches your computed memory ceiling.

A MySQL OOM kill occurs when the Linux out-of-memory killer (oom_killer) terminates the mysqld process because the system has run out of memory and swap. The database disappears without a clean shutdown — in-flight transactions are lost, connections are forcibly closed, and InnoDB must run crash recovery on restart.

Impact

An OOM kill produces the same external symptoms as a server crash: all connections drop simultaneously, applications get ERROR 2006: MySQL server has gone away, and the database is unavailable until mysqld restarts and completes InnoDB crash recovery. Depending on data volume and the extent of the crash, recovery can take seconds to many minutes. If binary logging is enabled, some replicas may need to be re-synced.

Common Causes

innodb_buffer_pool_size set to a value that, combined with per-thread buffers, connection count, and OS overhead, exceeds available RAM
A sudden spike in concurrent connections where each thread allocates its per-thread buffers (sort_buffer_size, join_buffer_size, read_buffer_size, read_rnd_buffer_size, binlog_cache_size) simultaneously
Memory leak in a MySQL plugin or user-defined function (UDF)
Swap disabled (common in containerized environments and cloud VMs) — OOM kill fires at the first moment RAM is exhausted, with no swap buffer
Other processes on the same host (application server, monitoring agent, log aggregator) consuming more RAM than anticipated
performance_schema memory tables growing unexpectedly under high query diversity
A large temporary table built in memory (tmp_table_size) pushing over the limit
innodb_sort_buffer_size or myisam_sort_buffer_size used during a large ALTER TABLE or REPAIR TABLE
Group Replication or Galera Cluster's certification database growing during high-write bursts

Troubleshooting and Resolution Steps

Confirm the OOM kill happened by checking the system journal and kernel logs:

journalctl -k --since "1 hour ago" | grep -i "oom\|killed process\|out of memory"
dmesg | grep -i "oom\|killed\|mysql"
grep -i "oom\|killed" /var/log/syslog

You should see a line like Out of memory: Killed process 12345 (mysqld).

Check the MySQL error log for crash recovery messages (indicates an OOM kill, not a graceful restart):

grep -i "innodb: starting crash recovery\|ready for connections\|aborting" \
  /var/log/mysql/error.log | tail -20

Measure current memory consumption:

-- InnoDB buffer pool usage
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_bytes%';

-- Per-thread buffer allocation (performance_schema)
SELECT
    SUM(CURRENT_NUMBER_OF_BYTES_USED) / 1024 / 1024 AS current_mb
FROM performance_schema.memory_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'memory/sql/%';

-- Total memory used by MySQL (Linux)
-- (run from shell, not SQL)
-- cat /proc/$(pgrep mysqld)/status | grep VmRSS

Calculate the theoretical memory ceiling before tuning:

max_memory ≈ innodb_buffer_pool_size
            + innodb_log_buffer_size
            + key_buffer_size (MyISAM)
            + max_connections × (
                sort_buffer_size +
                read_buffer_size +
                read_rnd_buffer_size +
                join_buffer_size +
                binlog_cache_size +
                thread_stack
              )
            + tmp_table_size × (concurrent_complex_queries)
            + OS overhead (~512 MB – 2 GB)

Check current values:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'read_buffer_size';
SHOW VARIABLES LIKE 'read_rnd_buffer_size';
SHOW VARIABLES LIKE 'join_buffer_size';
SHOW VARIABLES LIKE 'binlog_cache_size';
SHOW VARIABLES LIKE 'thread_stack';
SHOW VARIABLES LIKE 'max_connections';

Right-size innodb_buffer_pool_size. The standard recommendation is 70–80% of available RAM for a dedicated MySQL server:
```
-- On a 16 GB dedicated MySQL host
SET GLOBAL innodb_buffer_pool_size = 10737418240;  -- 10 GB (67%)
SET PERSIST innodb_buffer_pool_size = 10737418240;
```
On a shared host (running app server + MySQL), use 50% or less and measure actual RSS.

Reduce per-thread buffer defaults — the defaults are surprisingly high and multiply by max_connections:

-- Lower defaults; queries that need more can set session-level
SET GLOBAL sort_buffer_size = 262144;        -- 256 KB (default: 256 KB – 1 MB)
SET GLOBAL read_buffer_size = 131072;        -- 128 KB (default: 128 KB)
SET GLOBAL read_rnd_buffer_size = 262144;   -- 256 KB (default: 256 KB)
SET GLOBAL join_buffer_size = 262144;       -- 256 KB (default: 256 KB)

SET PERSIST sort_buffer_size = 262144;
SET PERSIST read_buffer_size = 131072;
SET PERSIST read_rnd_buffer_size = 262144;
SET PERSIST join_buffer_size = 262144;

Individual sessions that run large sorts can raise their own sort_buffer_size at the session level without affecting other connections.

Enable swap as a safety valve. On Linux, swap prevents hard OOM kills by giving the OS room to move cold pages to disk. A mysqld that is swapping is degraded but alive; an OOM-killed mysqld is down:
```
# Add a 4 GB swap file
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```

Configure Linux OOM score to protect mysqld. Lower OOM score = harder to kill:

# Find mysqld PID
MYSQL_PID=$(pgrep mysqld | head -1)

# Set a negative adjustment (-16 = less likely to be killed)
echo -16 > /proc/$MYSQL_PID/oom_score_adj

# Make it persistent via systemd
# In /etc/systemd/system/mysql.service.d/oom.conf:
# [Service]
# OOMScoreAdjust=-500

Reduce max_connections to bound per-thread memory. If the server cannot afford max_connections × per-thread-buffers, lower max_connections and add ProxySQL or pgbouncer-equivalent connection pooling in front:
```
SET GLOBAL max_connections = 200;
SET PERSIST max_connections = 200;
```

Profile actual memory usage with performance_schema:

SELECT
    EVENT_NAME,
    CURRENT_NUMBER_OF_BYTES_USED / 1024 / 1024 AS current_mb,
    HIGH_NUMBER_OF_BYTES_USED / 1024 / 1024 AS peak_mb
FROM performance_schema.memory_summary_global_by_event_name
WHERE CURRENT_NUMBER_OF_BYTES_USED > 1048576
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 20;

This shows the top memory consumers by event name. Look for unexpected growth in memory/sql/Filesort_buffer, memory/innodb/trx_sys, or memory attributed to UDFs.

Additional Information

MySQL does not enforce a memory ceiling itself. It allocates as much as the OS allows. Setting innodb_buffer_pool_size to 90% of RAM on a system with other processes guarantees eventual OOM kills.
In Docker / Kubernetes: the container's memory limit acts like a cgroup OOM limit. When mysqld exceeds the container's memory limit, the OOM killer fires inside the container. Use resources.limits.memory that accounts for the full MySQL memory footprint, including per-thread buffers.
vm.overcommit_memory=2 on Linux prevents memory overcommit: processes cannot allocate more than physical RAM + swap. This avoids OOM situations entirely but requires careful sizing — MySQL may fail to start if the buffer pool request exceeds the overcommit limit.
InnoDB crash recovery is safe — InnoDB is ACID-compliant and will always recover to a consistent state after a kill. However, recovery time scales with the amount of in-flight work at the time of the kill.

Frequently Asked Questions

Q: innodb_buffer_pool_size is 70% of RAM but I'm still getting OOM kills. Why? A: The remaining 30% must cover OS overhead (~512 MB), per-thread buffers (max_connections × ~4–16 MB), performance_schema tables, temp tables, and any other processes. On a 16 GB host with 200 connections and large per-thread buffers, the non-buffer-pool footprint can easily exceed 3 GB.

Q: MySQL was OOM-killed but dmesg shows no OOM message. What happened? A: The kill may have been triggered by a container runtime OOM limit rather than the kernel OOM killer. Check docker stats, kubectl describe pod, or the container's event log for OOM kill events.

Q: Is it safe to set oom_score_adj = -1000 (never kill mysqld)? A: Not recommended. If mysqld is truly consuming all RAM, the OOM killer will kill other processes instead (possibly the application, the SSH daemon, or the OS itself). A negative adjustment protects mysqld relative to other processes, but the extreme value -1000 shifts all the risk to system stability.

Q: After an OOM kill, should I be worried about data corruption? A: InnoDB is designed to survive unclean shutdowns. On restart, it replays the redo log and rolls back uncommitted transactions, restoring a consistent state. The only permanent data loss is for transactions that were committed on the primary but whose binary log events had not yet been sent to replicas.

Q: How do I monitor for impending OOM conditions before they cause a kill? A: Track node_memory_MemAvailable_bytes (Prometheus node_exporter) and alert when it drops below 10% of total RAM. Also monitor mysqld RSS via /proc/<pid>/status and alert when the ratio approaches your computed memory ceiling.