Logstash Error: Persistent queue is full - Common Causes & Fixes

Brief Explanation

The "Persistent queue is full" error in Logstash occurs when the persistent queue reaches its maximum capacity and can no longer accept new events. This typically happens when the rate of incoming events exceeds the rate at which Logstash can process and output them.

Impact

When the persistent queue is full, Logstash applies back pressure and stops accepting new events, which can lead to data loss if the input source has no buffering mechanism of its own. This can disrupt the entire data pipeline, affecting downstream systems that rely on timely data processing.

Common Causes

  1. High input rate exceeding processing capacity
  2. Slow output destinations or network issues
  3. Insufficient queue size configuration
  4. Resource constraints (CPU, memory, disk I/O)
  5. Complex filter operations slowing down processing

Troubleshooting and Resolution Steps

  1. Check queue settings: Review the persistent queue configuration in logstash.yml (or per pipeline in pipelines.yml) and ensure the queue size is appropriate for your event volume.

    # logstash.yml -- applies to all pipelines unless overridden in pipelines.yml
    queue.type: persisted
    # Maximum on-disk size of the queue (1gb is also the default)
    queue.max_bytes: 1gb
    
  2. Monitor queue metrics: Use the Logstash monitoring API to track queue size and event throughput, for example:
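
    GET http://localhost:9600/_node/stats/pipelines

    Each pipeline in the response reports a queue section; comparing queue_size_in_bytes against max_queue_size_in_bytes shows how close the queue is to its limit (field names vary somewhat between Logstash versions, so treat these as indicative).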

  3. Optimize pipeline performance:

    • Simplify complex filter operations
    • Increase worker threads if CPU resources allow
    • Batch events for more efficient processing
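
    The main knobs live in logstash.yml; a minimal sketch with illustrative values (tune against your own hardware and measurements):

    # logstash.yml -- pipeline tuning (illustrative starting points, not recommendations)
    pipeline.workers: 8        # defaults to the number of CPU cores
    pipeline.batch.size: 250   # events each worker takes per batch (default 125)
    pipeline.batch.delay: 50   # ms to wait while filling a batch (default 50)

    Larger batches improve throughput but hold more events in memory, so raise pipeline.batch.size gradually while watching heap usage.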
  4. Scale Logstash: Consider horizontal scaling by adding more Logstash instances to distribute the load.

  5. Tune output performance: Optimize output plugin configurations and make sure destination systems can absorb the load. The elasticsearch output, for example, can spread bulk requests across several nodes.
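
    A sketch with placeholder hostnames:

    output {
      elasticsearch {
        # Requests are load-balanced across the listed nodes, which can relieve
        # a single slow destination (hostnames are placeholders)
        hosts => ["http://es-node1:9200", "http://es-node2:9200"]
      }
    }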

  6. Implement back pressure: Use input plugins that support back pressure to slow down event ingestion when necessary.
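
    As one example, the beats input acknowledges events only once they are queued, so a shipper such as Filebeat throttles itself instead of losing data; a minimal sketch:

    input {
      beats {
        # Filebeat waits for acknowledgements, so a full queue slows the sender
        # rather than dropping events
        port => 5044
      }
    }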

  7. Increase resources: Allocate more CPU, memory, or disk I/O to Logstash if resource constraints are the bottleneck.

Best Practices

  • Regularly monitor Logstash performance and queue metrics
  • Implement proper error handling and retry mechanisms in your data pipeline
  • Use circuit breakers to prevent queue overflow in extreme situations
  • Consider using multiple smaller pipelines instead of one large pipeline for better resource management (see the pipelines.yml sketch below)
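
Each pipeline gets its own queue, so isolating a noisy input keeps its backlog from stalling unrelated flows. A minimal pipelines.yml sketch, where the pipeline ids and config paths are placeholders:

    - pipeline.id: apache_logs
      path.config: "/etc/logstash/conf.d/apache.conf"
      queue.type: persisted
      queue.max_bytes: 1gb
    - pipeline.id: app_events
      path.config: "/etc/logstash/conf.d/app.conf"
      queue.type: persisted
      queue.max_bytes: 2gb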

Frequently Asked Questions

Q: How do I check the current size of my persistent queue?
A: Use the Logstash monitoring API: send a GET request to http://localhost:9600/_node/stats/pipelines and look at the queue section reported for each pipeline, which for persisted queues includes figures such as queue_size_in_bytes and max_queue_size_in_bytes (exact field names vary by version).

Q: Can I change the queue size dynamically without restarting Logstash?
A: No, queue size configuration changes require a Logstash restart to take effect.

Q: What happens to events when the queue is full?
A: When the queue is full, Logstash stops reading from its inputs (back pressure). What happens next depends on the input: acknowledging protocols such as Beats cause the sender to retry and slow down, while fire-and-forget inputs such as UDP silently drop events.

Q: Is it better to use memory queue or persistent queue?
A: Persistent queues offer better durability and can survive Logstash restarts, but they have higher I/O overhead. Choose based on your reliability requirements and performance needs.
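
The choice is a single setting in logstash.yml; a sketch of the two alternatives (pick one):

    queue.type: memory      # default; fastest, but in-flight events are lost on a crash or restart
    queue.type: persisted   # survives restarts at the cost of extra disk I/O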

Q: How can I prevent data loss when the persistent queue is full?
A: Implement back pressure in your data pipeline, use input plugins with built-in buffering, and ensure your data sources can handle temporary stoppages or have their own queuing mechanisms.
