What is the Buffer Engine?
The Buffer engine is a special ClickHouse table engine that holds inserted rows in RAM and periodically flushes them in batches to a destination table, typically a MergeTree-family table. It exists so that applications generating many small inserts do not create thousands of tiny parts on disk, which would trigger merges, increase I/O, and risk a Too Many Parts error. Data in the buffer is volatile - a server crash before flush loses everything still in memory - so the engine is a throughput optimization, not a durability layer.
How Buffer Works
A Buffer table is declared with the destination database and table, a num_layers parallelism factor, and three pairs of min/max flush thresholds for time, rows, and bytes. The engine creates num_layers independent buffers and writes each incoming block into one of them at random. A buffer flushes to the destination table when all three min_* thresholds are satisfied, or when any single max_* threshold is exceeded, whichever happens first. Flushes also fire on OPTIMIZE TABLE ... FINAL, on DROP TABLE, and on graceful shutdown.
-- Buffer in front of events_data, with 16 parallel layers
CREATE TABLE events_buffer AS events_data
ENGINE = Buffer(
currentDatabase(), -- database
'events_data', -- destination table
16, -- num_layers
10, 100, -- min_time, max_time (seconds)
10000, 1000000, -- min_rows, max_rows
10000000, 100000000 -- min_bytes, max_bytes (10 MB, 100 MB)
);
Reading from a Buffer table returns the union of buffered rows and rows already flushed to the destination - so queries see "live" data even before persistence. The destination table must have the same column types in the same order; type mismatches will be detected at first flush, not at create time.
Buffer Engine Parameters
| Parameter | Purpose | Typical value |
|---|---|---|
database |
Destination database (use currentDatabase() for same DB) |
currentDatabase() |
table |
Destination table name | the target MergeTree table |
num_layers |
Number of parallel buffers (use 1 to preserve insert order) | 8-16 |
min_time / max_time |
Seconds since first write before flushing | 10 / 100 |
min_rows / max_rows |
Rows per buffer before flushing | 10000 / 1000000 |
min_bytes / max_bytes |
Bytes per buffer before flushing | 10 MB / 100 MB |
flush_time (optional) |
Hard upper bound on time before flush | - |
flush_rows (optional) |
Hard upper bound on rows | - |
flush_bytes (optional) |
Hard upper bound on bytes | - |
A buffer's total memory footprint is approximately num_layers * max_bytes. Size the parameters to stay well under max_server_memory_usage so the buffer cannot starve queries.
Common Pitfalls with Buffer
- Treating Buffer as durable storage - it is not. Always have an upstream retry mechanism (Kafka, application queue) for the data you cannot lose on a server restart.
- Using Buffer with
INSERTrates that would already be efficient (large batches of 10k+ rows). The buffer adds memory pressure and a flush thread for no gain; insert directly to the MergeTree instead. - Setting
num_layerstoo high. Withnum_layers > 16and a lowmax_bytes, you fragment data into many tiny flushes and partially defeat the purpose. - Forgetting that
ALTER TABLEon the destination is not visible to the Buffer until you also recreate the Buffer with the new structure. - Putting a Buffer in front of a
Distributedtable when async inserts (see below) would be a better fit.
Buffer vs Async Inserts
Modern ClickHouse (22.3+) provides an alternative: server-side asynchronous inserts (async_insert=1, wait_for_async_insert=0). The server buffers incoming inserts in a write-ahead log on each node and flushes them to the destination table, with optional durability guarantees via async_insert_use_adaptive_busy_timeout. Async inserts are preferred for new workloads because they persist to disk before acknowledging the client - unlike Buffer, which keeps data in RAM. Use Buffer when you need the merge-rate smoothing of in-memory buffering, or when you cannot turn on server-side async inserts cluster-wide.
Operating Buffer Tables
Monitor buffered row counts via system.tables (the total_rows column reflects rows currently buffered) and watch for BufferFlush events in system.events. If flushes start blocking on slow destination merges, you will see memory grow toward num_layers * max_bytes and eventually trigger backpressure on inserts.
Pulse tracks Buffer tables alongside the destination MergeTree, correlates buffer growth with merge pressure, and flags the root cause when a buffer stops flushing - whether the destination is rejecting inserts, the disk is full, or a replica is unreachable. That visibility is particularly valuable because Buffer failures are silent: rows can be lost at restart without any error in the client.
Frequently Asked Questions
Q: Is data in the ClickHouse Buffer engine durable?
A: No. Buffer holds rows in RAM until a flush threshold triggers a write to the destination table. If the server crashes, the OOM killer fires, or kill -9 is used, all rows still in the buffer are lost. Use an upstream durable queue (Kafka, application retry log) if you cannot tolerate that loss.
Q: When should I use Buffer instead of async inserts?
A: Use server-side async inserts (async_insert=1) for most new workloads because they persist before acknowledging the client. Use Buffer when you specifically want the in-memory rate smoothing - for example, when an application generates thousands of single-row inserts per second and you can tolerate a small loss window in exchange for very low destination-side merge pressure.
Q: Can I query a Buffer table and get rows that have not been flushed yet?
A: Yes. SELECT from a Buffer table returns the union of in-memory rows and rows already in the destination table, so queries always see the latest data. However, PREWHERE, projections, and skip indices on the destination apply only to flushed rows.
Q: How do I force a Buffer to flush immediately?
A: Run OPTIMIZE TABLE buffer_table (without FINAL) - it triggers an immediate flush of all layers. DROP TABLE buffer_table also flushes synchronously before dropping. The buffer also flushes on graceful server shutdown.
Q: What happens to Buffer data on graceful shutdown vs crash?
A: On graceful shutdown (SIGTERM), ClickHouse flushes all buffers to their destination tables before exiting. On a crash, OOM kill, or hard kill -9, the in-memory rows are lost. There is no on-disk WAL for Buffer.
Q: Does Buffer support ALTER TABLE on the destination?
A: Schema changes on the destination are not automatically reflected in the Buffer. After altering the destination's columns or types, drop and recreate the Buffer table so its structure matches, otherwise the next flush will fail with a structure-mismatch error.
Related Reading
- ClickHouse MergeTree Engine - typical destination engine for a Buffer table
- ClickHouse Memory Management - sizing buffer memory against server limits
- Too Many Parts Error - the failure mode Buffer was designed to prevent
- ClickHouse INSERT INTO - async inserts, batch sizing, and deduplication
- ClickHouse Documentation Hub - index of all ClickHouse KB pages
- Memory Limit Exceeded - what oversized Buffer parameters can trigger