The Elasticsearch _update_by_query API modifies every document in an index that matches a query, optionally applying a Painless script. It scrolls the matching set and issues bulk updates internally, supports parallelism via slices=auto, and can run asynchronously with wait_for_completion=false. Use it for bulk field changes, backfilling new fields, or re-running ingest-time logic on existing documents - all without a full reindex.
When to Use update_by_query (vs Alternatives)
| Goal | Better choice |
|---|---|
| Re-index documents with a new mapping or analyzer | Reindex API - mapping changes need a fresh index |
| Modify a small known set of documents by ID | _bulk with update actions - cheaper than a query |
| Mass field update or backfill across matching documents | _update_by_query (this API) |
| Re-run ingest pipeline on indexed data | _update_by_query?pipeline=<name> with match_all |
| Change the type of an existing field | Reindex into a new index - update_by_query cannot change types |
Prerequisites
- Elasticsearch 6.x or later (API has been stable for several major versions).
- The user or API key needs read and write index privileges on the target index.
- Painless scripting enabled (default on most distributions).
- Headroom for the scroll plus bulk updates running concurrently; throttle with requests_per_second if the cluster is hot.
Step-by-Step: Update Documents by Query
1. Verify the query first with a count. Always run the same query against _count or _search?size=0 to confirm the document set before making changes.

        GET /my-index/_count
        {
          "query": { "term": { "status.keyword": "draft" } }
        }

2. Run a simple update with a Painless script.

        POST /my-index/_update_by_query
        {
          "query": { "term": { "status.keyword": "draft" } },
          "script": {
            "source": "ctx._source.status = 'published'",
            "lang": "painless"
          }
        }

   The response includes updated, version_conflicts, batches, failures, and took.

3. Use parameters for safer scripts. Inlining values invalidates the script cache and risks injection-style bugs.

        POST /my-index/_update_by_query
        {
          "query": { "term": { "category": "books" } },
          "script": {
            "source": "ctx._source.price = ctx._source.price * params.factor",
            "lang": "painless",
            "params": { "factor": 1.1 }
          }
        }

4. Add conflicts=proceed for indices under concurrent writes.

        POST /my-index/_update_by_query?conflicts=proceed
        {
          "query": { "match_all": {} },
          "script": {
            "source": "ctx._source.updated_at = params.now",
            "params": { "now": "2026-05-17T00:00:00Z" }
          }
        }

5. Parallelize with slices=auto. Best practice for large jobs - runs one slice per primary shard.

        POST /my-index/_update_by_query?slices=auto&conflicts=proceed
        { "query": { ... }, "script": { ... } }

6. Run async with the Task API for long jobs.

        POST /my-index/_update_by_query?wait_for_completion=false&slices=auto
        { "query": { ... }, "script": { ... } }

   Response: { "task": "<task-id>" }. Monitor with GET /_tasks/<task-id>, cancel with POST /_tasks/<task-id>/_cancel.

7. Re-run an ingest pipeline on existing data.

        POST /my-index/_update_by_query?pipeline=my-pipeline
        { "query": { "match_all": {} } }

8. Throttle the rate with requests_per_second. Set to -1 to disable.

        POST /my-index/_update_by_query?requests_per_second=1000&slices=auto
        { "query": { ... }, "script": { ... } }
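The URL parameters used in the steps above (conflicts, slices, wait_for_completion, requests_per_second) can be combined in one request. The following is a minimal Python sketch of assembling such a request. The helper name build_update_by_query is hypothetical, and it only builds the path and body; sending them is left to whatever HTTP client you already use.

```python
from urllib.parse import urlencode


def build_update_by_query(index, query, script=None, **url_params):
    """Return (path, body) for an _update_by_query call.

    Hypothetical helper for illustration: it assembles the request only
    and never talks to a cluster.
    """
    # Booleans must be lowercase in the query string ("false", not "False").
    params = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in url_params.items()
        if value is not None
    }
    path = f"/{index}/_update_by_query"
    if params:
        path += "?" + urlencode(params)
    body = {"query": query}
    if script is not None:
        body["script"] = script
    return path, body


path, body = build_update_by_query(
    "my-index",
    query={"term": {"status.keyword": "draft"}},
    script={
        "source": "ctx._source.status = params.status",
        "lang": "painless",
        "params": {"status": "published"},
    },
    conflicts="proceed",
    slices="auto",
    wait_for_completion=False,
    requests_per_second=1000,
)
print(path)
```

Keeping request assembly separate from sending makes it easy to log or review the exact request before it touches production data.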
update_by_query in Production: What to Watch For
_update_by_query is heavier than _delete_by_query because it has to fetch each matching document, apply the script, and write a new version - and the new version may have a different size, triggering segment merges. On indices with a write-heavy workload running underneath, the version conflict rate can climb fast. Always combine conflicts=proceed with a follow-up query that re-checks the documents that were skipped.
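One way to implement the follow-up pass is to re-run the same request until no conflicts remain. The sketch below assumes the query is self-excluding (for example, it matches status "draft" and the script sets status to "published"), so documents that were already updated no longer match. The run_update callable is a hypothetical stand-in for whatever sends the request and returns the parsed response.

```python
def update_until_clean(run_update, max_passes=3):
    """Re-run an update_by_query request until no version conflicts remain.

    run_update is a caller-supplied function (an assumption of this sketch)
    that sends the request with conflicts=proceed and returns the response
    as a dict.
    """
    totals = {"updated": 0, "passes": 0, "remaining_conflicts": 0}
    for _ in range(max_passes):
        response = run_update()
        if response.get("failures"):
            raise RuntimeError(f"update_by_query reported failures: {response['failures']}")
        totals["passes"] += 1
        totals["updated"] += response.get("updated", 0)
        totals["remaining_conflicts"] = response.get("version_conflicts", 0)
        if totals["remaining_conflicts"] == 0:
            break
    return totals


# Canned responses stand in for a real cluster in this example.
canned = iter([
    {"updated": 95, "version_conflicts": 5, "failures": []},
    {"updated": 5, "version_conflicts": 0, "failures": []},
])
result = update_until_clean(lambda: next(canned))
print(result)
```

The max_passes cap keeps the loop from spinning indefinitely on an index where writers keep touching the same documents.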
Resource pressure during large updates is a separate concern. The scroll keeps search contexts open while the bulk write workers run, so the job competes with production traffic for the search and write thread pools. If you see rejected tasks in thread_pool/write while a job is in flight, lower slices or requests_per_second.
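A running job's rate can be lowered without restarting it through the rethrottle endpoint (POST /_update_by_query/<task-id>/_rethrottle). The sketch below shows one possible back-off policy: halve the rate whenever the write pool's rejected counter has grown since the last check. The halving rule, the floor value, and both function names are assumptions chosen for illustration, not a recommendation from Elasticsearch. Rethrottling changes only requests_per_second; the slice count is fixed once the job starts.

```python
def next_throttle(current_rps, rejected_before, rejected_now, floor=100):
    """Halve the rate when new write rejections appeared, otherwise keep it.

    The rejected counters are assumed to come from two successive reads of
    the write thread pool stats. The policy itself is illustrative.
    """
    if rejected_now > rejected_before:
        return max(floor, current_rps // 2)
    return current_rps


def rethrottle_path(task_id, requests_per_second):
    """Build the path for rethrottling a running update_by_query task."""
    return f"/_update_by_query/{task_id}/_rethrottle?requests_per_second={requests_per_second}"


new_rps = next_throttle(current_rps=1000, rejected_before=10, rejected_now=14)
# "<task-id>" is a placeholder for the ID returned when the job was submitted.
print(rethrottle_path("<task-id>", new_rps))
```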
Run update_by_query Safely with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch. Before and during _update_by_query, Pulse:
- Verifies cluster capacity for the operation: heap for the scroll, write thread pool headroom for the bulk updates, disk for the new document versions
- Surfaces concurrent operations that could collide - active reindex, ILM rollover, another long-running _update_by_query on the same index
- Tracks the operation's progress and impact on production traffic in real time: version_conflicts rate, write rejections in thread_pool/write, merge IO, search latency p95
- Recommends throttling with requests_per_second or lowering slices if production search latency starts climbing
Common Mistakes
- Modifying a field's type with update_by_query. Scripts cannot change a field's mapped type. You need a reindex into a new index with the updated mapping.
- Inlining values into the script source. Use params. It is safer and lets Elasticsearch cache the compiled script.
- Omitting conflicts=proceed on actively written indices. The job aborts on the first concurrent update.
- Setting slices higher than the primary shard count. Excess slices just add coordination overhead. slices=auto is correct.
- Forgetting to refresh. Updated documents are not visible to search until the next refresh. Set ?refresh=true on small updates if you need immediate visibility.
- No snapshot before destructive scripts. A script that overwrites a field cannot be undone short of a snapshot restore.
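The point about inlined values can be seen by comparing the script sources each style produces. In this small Python sketch (function names are hypothetical), every inlined value yields a different source string, which Elasticsearch would have to compile separately, while the parameterized form keeps a single source string across all values.

```python
def inline_script(factor):
    """Anti-pattern: the value is baked into the script source."""
    return {"source": f"ctx._source.price = ctx._source.price * {factor}"}


def param_script(factor):
    """Preferred: the source stays constant and the value travels in params."""
    return {
        "source": "ctx._source.price = ctx._source.price * params.factor",
        "params": {"factor": factor},
    }


factors = [1.1, 1.2, 1.3]
inline_sources = {inline_script(f)["source"] for f in factors}
param_sources = {param_script(f)["source"] for f in factors}

# Three distinct sources to compile versus one reusable source.
print(len(inline_sources), len(param_sources))
```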
Frequently Asked Questions
Q: Can update_by_query change a field's type or mapping?
A: No. update_by_query operates on document contents only. To change a field's type, you have to create a new index with the desired mapping and reindex. For purely additive mapping changes, the put mapping API is enough.
Q: How do I track the progress of an update_by_query in Elasticsearch?
A: Submit the request with wait_for_completion=false, take the returned task ID, and poll GET /_tasks/<task-id>. The response shows updated count, batches, version conflicts, and elapsed time. Cancel a runaway job with POST /_tasks/<task-id>/_cancel.
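As a rough illustration, the polled task response can be turned into a progress figure. This sketch assumes the response carries a top-level completed flag and a task.status object with total, updated, created, deleted, noops, and version_conflicts counters. Treating noops and version conflicts as "processed" is a choice made here, not something the API dictates.

```python
def task_progress(task_response):
    """Summarize a GET /_tasks/<task-id> response for an update_by_query job.

    The response shape is an assumption of this sketch; missing counters
    are treated as zero.
    """
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    counters = ("updated", "created", "deleted", "noops", "version_conflicts")
    processed = sum(status.get(name, 0) for name in counters)
    return {
        "completed": task_response.get("completed", False),
        "processed": processed,
        "total": total,
        "fraction": processed / total if total else 0.0,
    }


# Hand-written sample response, not output from a real cluster.
sample = {
    "completed": False,
    "task": {"status": {"total": 1000, "updated": 400, "noops": 50, "version_conflicts": 50}},
}
progress = task_progress(sample)
print(progress)
```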
Q: What does conflicts=proceed do in update_by_query?
A: Without it, update_by_query aborts on the first version conflict (a document updated between the scroll snapshot and the update). With conflicts=proceed, conflicted documents are skipped and the operation continues. The response still records them under version_conflicts.
Q: How do I use update_by_query to delete documents?
A: Set ctx.op = 'delete' inside the script for documents that should be removed. This is occasionally useful when the delete criteria depend on per-document logic, but for straight deletes, _delete_by_query is simpler and faster.
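A request body for this pattern might look like the following Python sketch. The index fields status and retries and the max_retries parameter are hypothetical. Documents that fail the check are marked noop so they are left untouched rather than rewritten.

```python
import json

# Hypothetical example: delete failed documents that exceeded a retry budget
# and leave every other matching document unchanged.
delete_body = {
    "query": {"term": {"status.keyword": "failed"}},
    "script": {
        "lang": "painless",
        "source": (
            "if (ctx._source.retries != null && ctx._source.retries > params.max_retries) "
            "{ ctx.op = 'delete' } else { ctx.op = 'noop' }"
        ),
        "params": {"max_retries": 5},
    },
}

# Serialize exactly as it would be sent in the request body.
payload = json.dumps(delete_body)
print(payload)
```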
Q: How do I limit how many documents update_by_query processes?
A: Use the max_docs parameter (or size on older versions) in the request body. For example "max_docs": 1000 processes only the first 1000 matching documents. This is useful for staged rollouts of risky updates.
Q: Can update_by_query run across multiple indices or data streams?
A: Yes. Pass a comma list (POST /index-a,index-b/_update_by_query) or a pattern (POST /logs-2025-*/_update_by_query). For data streams, the API rewrites the matching backing indices in place.
Q: How do I re-run an ingest pipeline on existing documents?
A: POST /my-index/_update_by_query?pipeline=<pipeline-name> with a match_all query. Every matching document is read, passed back through the pipeline, and re-indexed - useful after fixing a buggy enrich processor.
Q: What's the best tool to run update_by_query safely on a production cluster?
A: Pulse is purpose-built for this. It is an AI DBA for Elasticsearch and OpenSearch that pre-checks cluster capacity, surfaces conflicting operations, tracks version_conflicts, write thread pool rejections, and merge IO in real time, and recommends throttling via requests_per_second or lowering slices when update_by_query starts impacting production latency.
Related Reading
- Delete by Query in Elasticsearch: the sibling API for bulk deletes.
- Reindex Data Guide: when an update is not enough and you need a fresh index.
- Changing a Field Type: why type changes require reindex, not update_by_query.
- Invalid update_by_query Operation: troubleshooting the most common errors.
- Elasticsearch Upsert Operations: the per-document variant of "update or create".
- Index Refresh Interval: why updates are not immediately visible to search.