Elasticsearch Ingest Pipeline Failure

Ingest pipelines transform documents before they reach an index. When a processor in the chain fails - a grok pattern that doesn't match, a Painless script that hits a null field, a convert processor that can't cast a string to an integer - the entire document is rejected by default. The indexing request returns a 400-level error and the document never makes it into the index. Understanding the common failure modes and how to build pipelines that handle bad data gracefully will save you from silent data loss in production.
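For context, a pipeline runs only when a request references it, either per indexing request or as an index default (the index and pipeline names here are illustrative):

PUT my-index/_doc/1?pipeline=my-pipeline
{ "message": "192.168.1.1 GET /api/users" }

PUT my-index/_settings
{ "index.default_pipeline": "my-pipeline" }

With index.default_pipeline set, every document written to the index passes through the pipeline, which is exactly why a broken processor can silently reject all incoming data.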

Common Processor Failures

Grok pattern mismatch is the most frequent offender. The grok processor applies a regular expression to a string field and extracts named captures. If the input doesn't match any of the supplied patterns, the processor throws an exception. Log format changes, unexpected multiline entries, or fields that occasionally contain JSON instead of plain text all trigger this. The error message reads something like "reason": "Provided Grok expressions do not match field value".
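One mitigation worth knowing: the patterns array is tried in order and the first match wins, so you can list a strict pattern first and a permissive catch-all last (field names here are illustrative):

{
  "grok": {
    "field": "message",
    "patterns": [
      "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} %{GREEDYDATA:body}",
      "%{GREEDYDATA:body}"
    ]
  }
}

The catch-all keeps the document flowing, at the cost of losing the structured fields for non-conforming lines.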

Painless script errors show up as ScriptException in the response. A NullPointerException means the script tried to access a field or nested object that doesn't exist on that particular document. Use null-safe operators (?.) and explicit null checks to guard against this:

{
  "script": {
    "source": "if (ctx.user?.age != null) { ctx.user.age_group = ctx.user.age > 30 ? 'senior' : 'junior'; }"
  }
}

Convert type errors happen when the convert processor can't parse the field value. Asking it to convert "N/A" to integer fails immediately. Fields with mixed types across documents - sometimes a number, sometimes a placeholder string - are the usual cause.
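If you genuinely don't care about a failed cast, every processor also accepts ignore_failure (and convert additionally accepts ignore_missing); a minimal sketch:

{
  "convert": {
    "field": "status",
    "type": "integer",
    "ignore_missing": true,
    "ignore_failure": true
  }
}

This swallows the error and leaves the field untouched, with no record that anything went wrong. The on_failure handlers described below are usually the better choice because they let you record the failure.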

Building on_failure Handlers

Every processor accepts an on_failure array. When the processor throws an exception, Elasticsearch runs the processors listed in on_failure instead of aborting the pipeline. You can define handlers at two levels: on individual processors, or at the pipeline level as a catch-all.

PUT _ingest/pipeline/my-pipeline
{
  "processors": [
    {
      "grok": {
        "tag": "parse-syslog",
        "field": "message",
        "patterns": ["%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} %{GREEDYDATA:body}"],
        "on_failure": [
          {
            "set": {
              "field": "parse_error",
              "value": "grok_failed: {{ _ingest.on_failure_message }}"
            }
          }
        ]
      }
    },
    {
      "convert": {
        "tag": "cast-status",
        "field": "status",
        "type": "integer",
        "on_failure": [
          {
            "set": {
              "field": "status",
              "value": 0
            }
          }
        ]
      }
    }
  ]
}

When the grok fails, the document continues through the rest of the pipeline with a parse_error field attached. The convert processor falls back to a default value of 0 when the cast fails. Without these on_failure blocks, any single processor failure would reject the document entirely.

A pipeline-level on_failure acts as a last resort. If a processor fails and has no local handler, execution jumps to the pipeline's on_failure block. A common pattern is to route failed documents to a dead-letter index:

PUT _ingest/pipeline/my-pipeline
{
  "processors": [ ... ],
  "on_failure": [
    {
      "set": {
        "field": "_index",
        "value": "failed-{{ _index }}"
      }
    }
  ]
}
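Assuming the failed-* routing above, the rejected documents can then be reviewed with an ordinary search over the dead-letter indices:

GET failed-*/_search
{
  "query": { "match_all": {} },
  "size": 10
}

Nothing is lost: the original _source is preserved alongside whatever failure metadata the handler attached.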

Failure Metadata Fields

Inside any on_failure block, Elasticsearch exposes metadata about what went wrong. These fields exist only within the on_failure context:

  • _ingest.on_failure_message - the exception message from the failed processor
  • _ingest.on_failure_processor_type - the processor type that failed (e.g., grok, convert, script)
  • _ingest.on_failure_processor_tag - the tag value you assigned to the processor
  • _ingest.on_failure_pipeline - the pipeline ID, useful when pipelines call other pipelines

Storing these values on the document makes post-mortem analysis straightforward:

{
  "set": {
    "field": "ingest_failure",
    "value": "processor [{{ _ingest.on_failure_processor_type }}] tag [{{ _ingest.on_failure_processor_tag }}]: {{ _ingest.on_failure_message }}"
  }
}

Tag your processors consistently. Without tags, _ingest.on_failure_processor_tag is empty and you're left guessing which of your three grok processors actually failed.

Debugging With _simulate

The _ingest/pipeline/_simulate API runs a pipeline against sample documents without indexing anything. This is the fastest way to iterate on processor logic and error handling.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": ["%{IP:client_ip} %{WORD:method} %{URIPATHPARAM:path}"]
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "192.168.1.1 GET /api/users" } },
    { "_source": { "message": "this will not match" } }
  ]
}

The response shows the transformed document for successful cases and the full exception for failures. Add ?verbose=true to the query string to see the output after each processor step - this is invaluable for multi-stage pipelines where you need to know exactly where things went wrong.
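A verbose run uses the same request body; each processor then reports its own per-document result in the response (the processors and docs are whatever you are testing):

POST _ingest/pipeline/_simulate?verbose=true
{
  "pipeline": { "processors": [ ... ] },
  "docs": [ ... ]
}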

You can also simulate against a stored pipeline by ID instead of inlining the definition:

POST _ingest/pipeline/my-pipeline/_simulate
{ "docs": [ { "_source": { "message": "test" } } ] }

Use the Kibana Grok Debugger (under Dev Tools) to test grok patterns interactively before embedding them in a pipeline definition. It's faster than round-tripping through _simulate when you're just trying to get the regex right.

Pipeline Versioning and Safe Updates

Pipelines are mutable - a PUT overwrites the previous definition with no history. The version parameter is an integer meant for external tracking systems. Elasticsearch stores it but doesn't enforce or auto-increment it:

PUT _ingest/pipeline/my-pipeline
{
  "version": 3,
  "_meta": {
    "updated_by": "platform-team",
    "changelog": "added fallback for missing host field"
  },
  "processors": [ ... ]
}

The _meta object is freeform and useful for recording who changed what. Neither version nor _meta affects pipeline execution. To implement real version control, store your pipeline definitions in Git and deploy them through CI/CD. This gives you diffs, rollback, and review - none of which Elasticsearch provides natively. When deploying updates, always run the new definition through _simulate with a representative set of documents from production before applying it. A pipeline change that breaks parsing will reject every document hitting that pipeline until you fix it or roll back.
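To check which definition is currently live, fetch the stored pipeline; the response echoes the full definition including version and _meta:

GET _ingest/pipeline/my-pipeline

Comparing this output against the definition in Git is a quick sanity check that your CI/CD deployment actually landed.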
