The Logstash "persistent queue is full" error means the on-disk persistent queue has reached queue.max_bytes and cannot accept more events. Logstash applies back-pressure to inputs - inputs that can signal rejection to their clients (beats, http) push it upstream, while others block. The root cause is always the same: the pipeline (filters and outputs) is draining events slower than the inputs are delivering them, and the buffer between them filled up. The fix is to speed up the output, slow down the input, or grow the queue.
What This Error Means
When queue.type: persisted is set, Logstash writes every event to a sequence of fixed-size on-disk pages under path.queue/<pipeline-id>/. The pipeline workers read from the queue's head and ack pages back once events are successfully output. Inputs append to the tail.
The queue is bounded by queue.max_bytes (default 1024mb). When unacked queued bytes hit that limit, the queue refuses writes. Inputs that support back-pressure (the Beats input, for example) propagate the block upstream so producers slow down or buffer locally. Inputs that do not support back-pressure either block their own threads or drop events depending on the plugin.
The queue is not the bug. It is doing its job - protecting the pipeline from data loss during transient output slowdowns. A persistently-full queue is a signal that the steady-state rate mismatch is real, not transient.
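The rate mismatch described above can be put into numbers. The following is a minimal sketch, not anything Logstash ships: the function name and the example rates are hypothetical, and it assumes constant input and drain rates, which real traffic rarely has.

```python
def seconds_until_full(max_bytes, queued_bytes, in_bytes_per_s, out_bytes_per_s):
    """Estimate seconds until the queue reaches max_bytes.

    Returns None when the drain rate keeps up with the input rate,
    because the queue then never fills.
    """
    net_growth = in_bytes_per_s - out_bytes_per_s
    if net_growth <= 0:
        return None
    remaining = max(max_bytes - queued_bytes, 0)
    return remaining / net_growth


MIB = 1024 * 1024

# Hypothetical numbers: a 1024 MiB cap, an empty queue,
# 12 MiB/s arriving and 8 MiB/s acknowledged by the output.
eta = seconds_until_full(1024 * MIB, 0, 12 * MIB, 8 * MIB)
print(eta)  # 256.0
```

With a 4 MiB/s net surplus, the queue in this example fills in a little over four minutes, which is why a steady-state mismatch cannot be fixed by queue size alone.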
Common Causes
- Output destination is slow or rejecting writes. Confirm by tailing Logstash logs for output errors (Elasticsearch 429s, Kafka producer timeouts, S3 throttling).
- Insufficient queue size for the burst profile. Confirm via GET /_node/stats/pipelines - if the queue refills within seconds of being drained, it is undersized.
- CPU-bound filter chain (usually grok). Confirm by checking pipeline events.duration_in_millis per filter and looking for a hotspot.
- Pipeline workers blocked on external lookups (elasticsearch filter, jdbc_streaming, dns). Confirm by inspecting filter latency per stage.
- Disk I/O bottleneck on the queue directory. Confirm with iostat -xz 1 - sustained high %util on the queue's device indicates the disk is the limit.
- queue.checkpoint.writes set very low, causing excessive fsync. Confirm in logstash.yml.
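The filter-hotspot check in the list above can be sketched as a small ranking over the node stats response. This is an illustration under assumptions: it expects each pipeline's stats to carry a plugins.filters list whose entries have name, id, and an events object with in and duration_in_millis, which may differ across Logstash versions. The sample payload is made up.

```python
def rank_filters(pipeline_stats):
    """Return (name, id, ms_per_event) tuples, slowest filter first."""
    filters = pipeline_stats.get("plugins", {}).get("filters", [])
    ranked = []
    for plugin in filters:
        events = plugin.get("events", {})
        events_in = events.get("in", 0)
        if events_in == 0:
            continue  # no traffic yet, so no meaningful average
        ms_per_event = events.get("duration_in_millis", 0) / events_in
        ranked.append((plugin.get("name", "?"), plugin.get("id", "?"), ms_per_event))
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


# Hypothetical excerpt of one pipeline's entry from _node/stats/pipelines.
sample = {
    "plugins": {
        "filters": [
            {"id": "f1", "name": "mutate", "events": {"in": 10000, "duration_in_millis": 500}},
            {"id": "f2", "name": "grok", "events": {"in": 10000, "duration_in_millis": 45000}},
            {"id": "f3", "name": "date", "events": {"in": 0, "duration_in_millis": 0}},
        ]
    }
}

for name, plugin_id, ms in rank_filters(sample):
    print(f"{name} ({plugin_id}): {ms:.2f} ms/event")
```

The counters are cumulative since startup, so the averages describe the whole uptime; comparing two snapshots gives a more current picture.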
How to Fix the Logstash Persistent Queue is Full Error
Identify the bottleneck end of the pipeline:
    curl -s http://localhost:9600/_node/stats/pipelines | jq '.pipelines'
Look at queue.events, queue.queue_size_in_bytes, and per-stage events.duration_in_millis. A growing queue plus high output duration_in_millis means the output is the bottleneck.
Stabilize the output side first. Speeding up filters when the output is the bottleneck just fills the queue faster.
- Elasticsearch output: increase pipeline.batch.size, raise the destination cluster's bulk thread pool queue, scale data nodes.
- Kafka output: tune linger_ms, batch_size, and broker capacity.
- S3 output: increase parallel uploads, switch to a faster region.
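Whether the queue is actually growing is easiest to judge from two stats snapshots taken some seconds apart. The sketch below assumes each snapshot is the queue object for one pipeline and contains queue_size_in_bytes, as named in this article. The helper name, the tolerance, and the sample values are hypothetical.

```python
def queue_trend(earlier, later, interval_s, tolerance_bytes_per_s=1024):
    """Classify queue movement between two snapshots of the queue stats."""
    delta = later["queue_size_in_bytes"] - earlier["queue_size_in_bytes"]
    rate = delta / interval_s  # bytes per second, negative when draining
    if rate > tolerance_bytes_per_s:
        return "growing", rate
    if rate < -tolerance_bytes_per_s:
        return "draining", rate
    return "steady", rate


MIB = 1024 * 1024

# Hypothetical snapshots taken 30 seconds apart.
first = {"queue_size_in_bytes": 400 * MIB}
second = {"queue_size_in_bytes": 460 * MIB}

state, rate = queue_trend(first, second, 30)
print(state, rate / MIB)  # growing 2.0
```

A result that stays at "growing" across several intervals points at a steady-state mismatch rather than a burst.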
Grow the queue if bursts are the issue, not steady-state mismatch:
    # logstash.yml or pipelines.yml per-pipeline
    queue.type: persisted
    queue.max_bytes: 8gb
    queue.page_capacity: 64mb
The queue.page_capacity setting (default 64mb) controls page file size; larger pages reduce fsync overhead but lengthen recovery time on crash. The queue.max_bytes setting is the hard cap.
Add pipeline workers if CPU is the constraint:
    pipeline.workers: 16
    pipeline.batch.size: 500
Workers are JVM threads; setting workers higher than physical cores rarely helps and can degrade performance under contention.
Add a second Logstash instance and split inputs if a single host has hit its CPU or I/O ceiling. Persistent queues are per-instance; horizontal scaling adds queue capacity proportionally.
Drain the existing queue by temporarily routing the input to a fallback or accepting back-pressure upstream. Restarting Logstash does not drain the queue - it resumes from the last checkpoint.
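When growing the queue is the chosen fix, the target size can be estimated from the burst profile. The following is a rough sketch under stated assumptions: the rates, the 10-minute burst length, and the 25% headroom factor are illustrative values, and the function is hypothetical rather than part of Logstash.

```python
import math


def required_queue_bytes(peak_in_bytes_per_s, drain_bytes_per_s, burst_seconds, headroom=1.25):
    """Bytes needed to absorb a burst without the queue filling up.

    Only the surplus (input minus drain) accumulates in the queue;
    headroom pads the estimate for measurement error.
    """
    surplus = max(peak_in_bytes_per_s - drain_bytes_per_s, 0)
    return math.ceil(surplus * burst_seconds * headroom)


MIB = 1024 * 1024

# Hypothetical burst: 20 MiB/s in, 8 MiB/s drained, lasting 10 minutes.
needed = required_queue_bytes(20 * MIB, 8 * MIB, 600)
print(needed, f"{needed / MIB:.0f} MiB")  # 9437184000 9000 MiB
```

If the drain rate already matches or exceeds the peak input rate, the function returns 0, which is a hint that queue size is not the problem.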
Resolve Logstash Persistent Queue Full Errors Automatically with Pulse
Pulse is the only monitoring and optimization platform built specifically for Logstash. When the on-disk persistent queue reaches queue.max_bytes and back-pressure starts blocking inputs, Pulse:
- Tracks queue depth (queue.queue_size_in_bytes), fill rate, page rotation cadence, per-filter events.duration_in_millis, output ack latency, and disk %util on the queue path in real time
- Correlates pipeline state with downstream destinations (Elasticsearch bulk thread pool, Kafka producer backpressure, S3 throttling) to identify whether the bottleneck is upstream input bursts, in-pipe filter cost, or downstream output saturation
- Surfaces the exact remediation: raise queue.max_bytes, retune pipeline.workers and pipeline.batch.size, switch the queue volume to NVMe, scale the destination cluster, or split inputs across a second Logstash instance
- Generates one-click configuration changes and systemd restart actions when applicable, and alerts above 70% fill before the queue refuses writes
Sizing guardrails ship alongside: absorb 10 minutes of peak input, alert on output 429s as a leading indicator, and cap source-side input rate (Filebeat harvester_limit, Kafka max.poll.records) to smooth bursts. No other observability tool understands Logstash internals at this depth.
Frequently Asked Questions
Q: How do I check the current size of my Logstash persistent queue?
A: Query the Logstash monitoring API: curl http://localhost:9600/_node/stats/pipelines | jq '.pipelines.main.queue'. The response contains events (event count), queue_size_in_bytes (current size), max_queue_size_in_bytes (configured limit), and type. Compute fill percentage from those values.
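The fill-percentage calculation mentioned in this answer might look like the sketch below. It assumes the queue object has already been parsed from the API response into a dict and that it exposes the fields listed above. The sample values are invented, and the exact field layout may vary between Logstash versions.

```python
def fill_percent(queue_stats):
    """Return queue fill as a percentage, or None if it cannot be computed."""
    if queue_stats.get("type") != "persisted":
        return None  # a memory queue has no byte ceiling to compare against
    limit = queue_stats.get("max_queue_size_in_bytes", 0)
    if limit <= 0:
        return None
    return 100.0 * queue_stats.get("queue_size_in_bytes", 0) / limit


# Hypothetical queue object for the "main" pipeline.
queue = {
    "type": "persisted",
    "events": 182000,
    "queue_size_in_bytes": 805306368,       # 768 MiB
    "max_queue_size_in_bytes": 1073741824,  # 1024 MiB
}

print(fill_percent(queue))  # 75.0
```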
Q: Can I change Logstash queue.max_bytes without restarting?
A: No. Queue configuration changes require a Logstash restart. Restart drains nothing - on startup the queue resumes at its current size with the new ceiling applied to new writes.
Q: What happens to incoming events when the Logstash persistent queue is full?
A: It depends on the input plugin. Beats input back-pressures the upstream Beat, which then buffers locally and retries. The HTTP input rejects requests with a 429 response while it is busy, leaving the retry to the client. Other inputs may block their threads (TCP) or drop events. Producers without their own buffering can lose data, which is the whole reason persistent queues exist.
Q: Should I use a memory queue or persistent queue in Logstash?
A: Memory queue is faster but volatile - a Logstash crash loses everything in the queue. Persistent queue survives restarts and crashes at the cost of disk I/O (typically 10-30% throughput penalty on fast disks). Use persistent for any production workload that cannot afford gaps.
Q: How do I prevent data loss when the Logstash persistent queue is full?
A: Three layers: 1) Use input plugins that back-pressure (Beats, Kafka consumer). 2) Make upstream producers durable (Filebeat with registry, Kafka topics with retention). 3) Size queue.max_bytes to absorb worst-case burst duration. The combination guarantees no loss as long as upstream durability outlives the slowdown.
Q: Does increasing pipeline.workers always help when the queue is full?
A: Only if the bottleneck is CPU-bound filters and the host has spare cores. If the output is the bottleneck, more workers only push events at the output faster and hit the same wall. Diagnose the bottleneck first.
Q: What's the best tool to debug Logstash persistent queue fill and backpressure?
A: Pulse is the only monitoring platform built specifically for Logstash. It correlates queue.queue_size_in_bytes, per-filter latency, output ack rate, and disk I/O into a single root-cause attribution - "Elasticsearch output is the bottleneck because the destination cluster's bulk queue is full" - rather than leaving you to stitch together _node/stats/pipelines, iostat, and JVM metrics manually.
Related Reading
- Logstash Pipeline is Blocked Error: related symptom of output slowdown.
- Logstash Could Not Write Event to DLQ: dead-letter queue failures.
- Logstash Detected Corrupt Queue File: recovery from queue corruption.
- Logstash Grok Filter Plugin: the most common filter-side bottleneck.
- Logstash Elasticsearch Filter Plugin: per-event lookups that often saturate workers.