
Logstash Grok Filter Plugin

The Logstash grok filter parses unstructured text into named fields using a library of named, reusable regex patterns. Patterns follow the form %{PATTERN:field_name:type}, for example %{IP:client} or %{NUMBER:bytes:int}. Logstash ships hundreds of built-in patterns under vendor/bundle/jruby/*/gems/logstash-patterns-core-*/patterns/, organized by use case (apache, syslog, java, postgresql). Grok is the right tool when a log format is too irregular for dissect and is not JSON.

Syntax

filter {
  grok {
    match            => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes:int} %{NUMBER:duration:float}" }
    patterns_dir     => [ "/etc/logstash/patterns" ]
    pattern_definitions => { "CUSTOM_ID" => "[A-Z]{3}-[0-9]{6}" }
    break_on_match   => true
    keep_empty_captures => false
    tag_on_failure   => [ "_grokparsefailure" ]
    timeout_millis   => 30000
    ecs_compatibility => "v8"
  }
}

Named captures follow the %{PATTERN:name} form, with optional type coercion via :int or :float. Plain Oniguruma named groups (?<name>regex) are also supported for one-off patterns.
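The expansion step can be sketched in Python to make the %{PATTERN:name:type} mechanics concrete. This is an illustration, not Logstash's implementation: the PATTERNS dictionary is a hypothetical, heavily simplified stand-in for the real definitions, field names are limited to word characters, and Python spells named groups (?P<name>...) rather than Oniguruma's (?<name>...).

```python
import re

# Hypothetical, heavily simplified stand-ins for a few built-in patterns.
PATTERNS = {
    "NUMBER": r"[+-]?[0-9]+(?:\.[0-9]+)?",
    "WORD": r"\b\w+\b",
    "IP": r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}",
}

# Matches %{NAME}, %{NAME:field} and %{NAME:field:int|float}.
REF = re.compile(r"%\{(\w+)(?::(\w+))?(?::(int|float))?\}")


def expand(pattern, defs, types):
    """Inline %{...} references as regex groups, recording requested coercions."""
    def repl(m):
        name, field, typ = m.groups()
        body = expand(defs[name], defs, types)  # definitions may nest references
        if field is None:
            return f"(?:{body})"                # unnamed reference: not captured
        if typ:
            types[field] = typ
        return f"(?P<{field}>{body})"
    return REF.sub(repl, pattern)


def grok(pattern, text, defs=PATTERNS):
    """Return a dict of captured fields, or None when the pattern does not match."""
    types = {}
    m = re.search(expand(pattern, defs, types), text)
    if m is None:
        return None
    fields = {k: v for k, v in m.groupdict().items() if v is not None}
    for field, typ in types.items():
        if field in fields:
            fields[field] = int(fields[field]) if typ == "int" else float(fields[field])
    return fields


print(grok("%{IP:client} %{WORD:method} %{NUMBER:bytes:int}", "10.0.0.1 GET 512"))
# {'client': '10.0.0.1', 'method': 'GET', 'bytes': 512}
```

The point of the sketch is that a grok expression is just a regex with named groups once references are inlined; the :int and :float suffixes are applied after matching, to the captured strings.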

Parameters

Name | Type | Required | Default | Description
match | hash | no | {} | Map of source field name to a pattern (or array of patterns). Optional in the schema, but the filter needs it to do useful work.
patterns_dir | array | no | [] | Extra directories containing custom pattern files.
patterns_files_glob | string | no | "*" | Glob selecting pattern files inside each patterns_dir directory.
pattern_definitions | hash | no | {} | Inline patterns defined directly in the filter config.
break_on_match | boolean | no | true | If true, stop at the first matching pattern. Set to false to try every pattern in an array.
keep_empty_captures | boolean | no | false | Keep optional captures that produced no value.
named_captures_only | boolean | no | true | Store only explicitly named captures.
tag_on_failure | array | no | ["_grokparsefailure"] | Tags added when no pattern matches.
tag_on_timeout | string | no | "_groktimeout" | Tag added when matching exceeds timeout_millis.
timeout_millis | number | no | 30000 | Match timeout in milliseconds. When several patterns are configured it applies to each pattern rather than to the whole event; 0 disables it. Protects against catastrophic backtracking.
ecs_compatibility | string | no | depends on Logstash version | disabled, v1, or v8. In ECS modes, built-in patterns place fields under ECS namespaces.

Examples

Parse an Apache combined access log into ECS-compliant fields:

filter {
  grok {
    match => { "message" => "%{HTTPD_COMBINEDLOG}" }
    ecs_compatibility => "v8"
  }
}

In ecs_compatibility => v8 mode, fields are emitted as [source][address], [http][request][method], [url][original], etc.

Try multiple patterns to handle a mixed-format input:

filter {
  grok {
    match => {
      "message" => [
        "%{COMMONAPACHELOG}",
        "%{SYSLOGLINE}",
        "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}"
      ]
    }
    break_on_match => true
  }
}
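The control flow behind break_on_match and tag_on_failure can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: the patterns are already-expanded Python regexes, captures are written to top-level keys, and options such as overwrite and type coercion are ignored. The two patterns are hypothetical.

```python
import re


def apply_patterns(event, field, patterns, break_on_match=True,
                   tag_on_failure=("_grokparsefailure",)):
    """Simplified model of grok's match loop over one source field."""
    matched = False
    for pattern in patterns:                  # tried in array order
        m = re.search(pattern, event[field])
        if m:
            matched = True
            # copy named captures onto the event, skipping empty ones
            event.update({k: v for k, v in m.groupdict().items() if v is not None})
            if break_on_match:
                break                         # first match wins
    if not matched:
        # the source field is left untouched; only tags are added
        event.setdefault("tags", []).extend(tag_on_failure)
    return event


PATTERNS = [
    r"^(?P<level>ERROR|WARN|INFO) (?P<msg>.*)$",  # hypothetical level-prefixed format
    r"^(?P<first_word>\w+)",                      # hypothetical catch-all
]

print(apply_patterns({"message": "ERROR disk full"}, "message", PATTERNS))
# {'message': 'ERROR disk full', 'level': 'ERROR', 'msg': 'disk full'}
print(apply_patterns({"message": "--"}, "message", PATTERNS))
# {'message': '--', 'tags': ['_grokparsefailure']}
```

With break_on_match=False the catch-all pattern also runs on the first event and adds first_word, which is how overlapping patterns end up writing extra fields.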

Define a custom pattern inline and reference it:

filter {
  grok {
    pattern_definitions => {
      "ORDER_ID" => "ORD-[0-9]{8}"
    }
    match => { "message" => "order=%{ORDER_ID:order_id} user=%{USERNAME:user}" }
  }
}

Common Issues

Unmatched events are tagged _grokparsefailure and pass through with the original message untouched. A pipeline with rising _grokparsefailure rates usually means the upstream log format has drifted; route tagged events to a debug index to inspect them.
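When inspecting a sample of events offline, the failure rate and the events to send to a debug index fall out of the tags alone. A minimal Python sketch, assuming (hypothetically) that events have been exported as dicts with an optional tags list:

```python
def split_failures(events, tag="_grokparsefailure"):
    """Partition events by a grok failure tag and return the failure rate."""
    failed = [e for e in events if tag in e.get("tags", [])]
    parsed = [e for e in events if tag not in e.get("tags", [])]
    rate = len(failed) / len(events) if events else 0.0
    return parsed, failed, rate


sample = [
    {"message": "ok"},
    {"message": "??", "tags": ["_grokparsefailure"]},
    {"message": "ok"},
    {"message": "ok", "tags": ["beats_input"]},  # unrelated tag: counts as parsed
]
parsed, failed, rate = split_failures(sample)
print(rate)  # 0.25
```

Tracking this rate over time, rather than a one-off count, is what reveals format drift.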

Anchor patterns with ^ and $ (or \A/\z) when you expect the whole line to match. Unanchored patterns can match a substring and produce surprising results, especially when break_on_match => false is in effect.
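The substring effect is easy to reproduce with Python's re module, which behaves the same way here. In this sketch the log line and field names are hypothetical, and Python's \Z plays the role of \z.

```python
import re

LINE = "debug: user=alice id=42 trailing junk"

# Unanchored: search() may match any substring of the line.
LOOSE = r"user=(?P<user>\w+) id=(?P<id>\d+)"
# Anchored: the whole input must match, start to end.
STRICT = r"\Auser=(?P<user>\w+) id=(?P<id>\d+)\Z"

print(re.search(LOOSE, LINE).groupdict())                 # {'user': 'alice', 'id': '42'}
print(re.search(STRICT, LINE))                            # None: extra text around the fields
print(re.search(STRICT, "user=alice id=42").groupdict())  # {'user': 'alice', 'id': '42'}
```

The unanchored version silently "succeeds" on a line whose format it does not fully describe, which is exactly the case you want surfaced as a parse failure.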

Unbounded quantifiers wrapped around overlapping alternatives, such as (a|aa)* or (\w+\s?)+, cause catastrophic backtracking on near-miss inputs: the engine retries an exponential number of ways to split the text before it can report failure. The timeout_millis parameter exists because a single bad pattern can otherwise stall an entire pipeline worker. If you see _groktimeout tags, make the alternatives mutually exclusive, anchor the pattern, or replace it with dissect plus targeted grok.
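Python's re module is also a backtracking engine, so the failure mode can be illustrated with it (an analogy, not the engine Logstash runs). The sketch contrasts a risky pattern with a hypothetical rewrite that accepts the same strings but consumes each word in only one way.

```python
import re

# Risky: "\w+" can hand characters to the next repetition and "\s?" may be
# empty, so a run of n word characters can be split in about 2**(n-1) ways.
# On a near-miss such as "aaaa...a!" every split is tried before failing.
RISKY = re.compile(r"^(?:\w+\s?)+$")

# Safer rewrite: words are separated by exactly one whitespace character,
# so the engine has far fewer paths to retry before failing.
SAFE = re.compile(r"^\w+(?:\s\w+)*\s?$")

samples = ["abc", "abc ", "abc def", "abc def ", "abc  def", " abc", "abc!", ""]
# Short inputs are harmless for RISKY; both patterns accept the same samples.
print(all(bool(RISKY.match(s)) == bool(SAFE.match(s)) for s in samples))  # True

near_miss = "a" * 5000 + "!"
print(SAFE.match(near_miss))  # None, returned quickly
# Do not run RISKY.match(near_miss): it would effectively hang.
```

Checking a rewrite against a set of known-good and known-bad lines, as done with samples here, is a cheap way to confirm it did not change what the pattern accepts.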

ECS mode is a breaking change: built-in patterns like HTTPD_COMBINEDLOG produce different field names in disabled vs v8 mode. Pick one mode per pipeline and stick with it; mixing produces dashboards with half-populated fields.
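The field names above (clientip in legacy mode, [source][address] and [http][request][method] in v8 mode) also imply different document shapes, since bracketed field references address nested objects. A minimal Python sketch with a hypothetical set_field helper, which ignores escaping and other edge cases of the real field reference syntax:

```python
import re


def set_field(event, ref, value):
    """Store value at a Logstash-style field reference such as [http][request][method]."""
    keys = re.findall(r"\[([^\[\]]+)\]", ref) or [ref]   # bare names stay top-level
    node = event
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return event


legacy = set_field({}, "clientip", "10.0.0.1")
ecs = set_field({}, "[source][address]", "10.0.0.1")
set_field(ecs, "[http][request][method]", "GET")
print(legacy)  # {'clientip': '10.0.0.1'}
print(ecs)     # {'source': {'address': '10.0.0.1'}, 'http': {'request': {'method': 'GET'}}}
```

A query or dashboard keyed on clientip finds nothing in ECS-mode events, and one keyed on source.address finds nothing in legacy events, which is where the half-populated panels come from.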

Performance Notes

Grok patterns are compiled once at pipeline startup, not per event. The per-event cost is running the compiled regexes, which Oniguruma-style engines do by backtracking. Three things dominate runtime:

  1. Number of patterns in the match array: patterns are tried in order until one matches, and every pattern is tried when break_on_match => false.
  2. Pattern complexity: unbounded * and + quantifiers over overlapping alternatives can be orders of magnitude slower than anchored, fixed-position patterns.
  3. Failure case: a non-matching pattern usually costs more than a matching one, because the engine must exhaust every backtracking path (at every start position, if the pattern is unanchored) before it can report failure.
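Points 1 and 3 together suggest putting the most frequent format first. A rough way to reason about an ordering is to count regex attempts on a representative sample; attempts are only a proxy for cost, but each avoided attempt is a failed one, which point 3 identifies as the expensive kind. The patterns and sample lines in this Python sketch are hypothetical.

```python
import re

ERROR_LINE = re.compile(r"^(?:ERROR|WARN|INFO) ")   # hypothetical app-log format
HTTP_LINE = re.compile(r"^(?:GET|POST|PUT) /")      # hypothetical access-log format


def attempts_needed(lines, patterns):
    """Count regex attempts for break_on_match => true, trying patterns in order."""
    total = 0
    for line in lines:
        for pattern in patterns:
            total += 1
            if pattern.search(line):
                break
    return total


sample = ["ERROR disk full"] * 9 + ["GET /index.html"]
print(attempts_needed(sample, [HTTP_LINE, ERROR_LINE]))  # 19
print(attempts_needed(sample, [ERROR_LINE, HTTP_LINE]))  # 11
```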

For consistent, fixed-position formats, dissect is typically much faster than grok because it splits on delimiters and avoids regex entirely. A common approach is to use dissect to split the fixed positions, then grok only the variable-shape field that needs regex.
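The split-then-regex approach can be sketched in Python, with str.split standing in for dissect and a single re.search standing in for the targeted grok. The log format, field names and the duration_ms field are hypothetical, and the sketch assumes every line has at least four space-separated parts.

```python
import re

# Hypothetical format: "<date> <time> <level> <free-form message>"
LINE = "2025-03-01 12:00:07 WARN slow query took 812ms on table=orders"


def dissect_then_grok(line):
    # dissect-style step: split on the first three spaces, no regex involved
    date, time, level, rest = line.split(" ", 3)
    event = {"date": date, "time": time, "level": level, "message": rest}
    # grok-style step: regex only on the variable-shape remainder
    m = re.search(r"took (?P<duration_ms>\d+)ms", rest)
    if m:
        event["duration_ms"] = int(m.group("duration_ms"))
    return event


print(dissect_then_grok(LINE))
```

Only the free-form tail ever reaches the regex engine, so the expensive part runs on a shorter string and the fixed fields cannot be mis-split by a greedy pattern.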

Monitoring Logstash Grok Pipelines with Pulse

Pulse is the only tool built specifically for monitoring and optimizing Logstash pipelines. Grok is frequently the largest single source of Logstash CPU consumption in production, and "the pipeline got slow" is usually a grok regression: either a newly added pattern with catastrophic backtracking, or upstream log format drift that pushes every event through a failure path. Pulse tracks per-filter CPU cost and _grokparsefailure and _groktimeout rates per pipeline, and correlates spikes with recent pipeline config changes, so you can find the bad pattern in minutes rather than days.

Frequently Asked Questions

Q: Where are Logstash's built-in grok patterns stored?
A: Built-in patterns ship inside the logstash-patterns-core gem at vendor/bundle/jruby/*/gems/logstash-patterns-core-*/patterns/ (organized into ECS and legacy subdirectories from Logstash 7.12 onwards). Custom patterns live in any directory you list in patterns_dir.

Q: How does the Logstash grok filter handle ECS compatibility?
A: The ecs_compatibility parameter accepts disabled, v1, or v8. In v8 mode, built-in patterns produce ECS-compliant field names ([source][address] instead of clientip, [http][request][method] instead of verb). Pick one mode per pipeline; mixing causes inconsistent field naming downstream.

Q: What is the difference between grok and dissect in Logstash?
A: Grok uses regex and handles variable-width or irregular fields; dissect splits on fixed delimiters and is usually much faster, but cannot do regex matching or handle optional fields. Use dissect for consistent application logs and grok for irregular text like nginx, syslog, or Java stack traces.

Q: How do I create custom grok patterns?
A: Either inline via pattern_definitions => { "NAME" => "regex" }, or in a pattern file inside a directory listed in patterns_dir. Pattern files contain one definition per line: the pattern name, whitespace, then the regex. Definitions can reference other patterns with %{NAME} or %{NAME:field}.
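A: The file format and reference resolution can be sketched in Python. This is an illustration under assumptions, not Logstash's loader: it skips blank lines and # comments (as seen in the shipped pattern files), ignores :int/:float suffixes, and the ORDER_ID and ORDER_LINE patterns are hypothetical.

```python
import re

# Matches %{NAME} or %{NAME:field} (type suffixes omitted in this sketch).
REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def load_pattern_file(text):
    """Parse pattern-file text: one 'NAME regex' definition per line."""
    defs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        name, regex = re.split(r"\s+", line, maxsplit=1)
        defs[name] = regex
    return defs


def resolve(name, defs):
    """Inline references to other patterns, turning %{NAME:field} into a named group."""
    def repl(m):
        ref, field = m.groups()
        body = resolve(ref, defs)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REF.sub(repl, defs[name])


PATTERN_FILE = """
# hypothetical /etc/logstash/patterns/orders
ORDER_ID ORD-[0-9]{8}
ORDER_LINE order=%{ORDER_ID:order_id} qty=[0-9]+
"""

defs = load_pattern_file(PATTERN_FILE)
m = re.search(resolve("ORDER_LINE", defs), "order=ORD-12345678 qty=3")
print(m.group("order_id"))  # ORD-12345678
```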

Q: What does the _grokparsefailure tag mean?
A: None of the patterns in the match array matched the source field. The original field is untouched. The most common cause is upstream log format drift; route tagged events to a debug index and add a fallback pattern.

Q: Why does Logstash hang on certain grok patterns?
A: Catastrophic backtracking. Patterns with nested unbounded quantifiers and overlapping alternation can take exponential time on near-miss inputs. The timeout_millis parameter (default 30s) prevents this from stalling the worker indefinitely. Rewriting the pattern with anchoring, atomic groups, or dissect upfront is the long-term fix.
