How to Perform Update by Query in Elasticsearch

Update by Query lets you modify every document in an Elasticsearch index that matches a query, in place, without reindexing the data into a new index. This operation is particularly useful for:

  • Updating field values across many documents
  • Adding new fields to existing documents
  • Applying a script to modify document contents
  • Picking up mapping changes, such as a newly added field or multi-field, in existing documents

Steps to Perform Update by Query

  1. Prepare the Update Query: Construct a query that matches the documents you want to update.

  2. Define the Update Script: Create a script that specifies the changes to be applied to the matching documents.

  3. Execute the Update by Query: Use the Update by Query API to apply the changes.

    POST /your_index/_update_by_query
    {
      "query": {
        "match": {
          "field_name": "value_to_match"
        }
      },
      "script": {
        "source": "ctx._source.field_to_update = 'new_value'"
      }
    }
    
  4. Monitor the Progress: For large updates, use the Task API to track the progress of the operation.

  5. Verify the Results: After the update is complete, query the index to confirm the changes have been applied correctly.
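Steps 4 and 5 can be sketched in the same console syntax. The sketch reuses the placeholder names from the request above (your_index, field_name, field_to_update). `<task_id>` stands for the task ID that the first request returns. The final count assumes field_to_update holds a simple string value, and a count of 0 suggests that every matching document was updated.

```console
# Start the update in the background; the response contains a "task" ID
POST /your_index/_update_by_query?wait_for_completion=false
{
  "query": {
    "match": {
      "field_name": "value_to_match"
    }
  },
  "script": {
    "source": "ctx._source.field_to_update = 'new_value'"
  }
}

# Check progress (replace <task_id> with the ID returned above)
GET /_tasks/<task_id>

# Or list every running update/delete-by-query task
GET /_tasks?detailed=true&actions=*byquery

# Verify: count matching documents that do not yet hold the new value
GET /your_index/_count
{
  "query": {
    "bool": {
      "must": { "match": { "field_name": "value_to_match" } },
      "must_not": { "match": { "field_to_update": "new_value" } }
    }
  }
}
```

Search results only reflect the update after the index refreshes. Alternatively, you can pass refresh=true to the update request to refresh the affected shards when the update finishes.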

Best Practices and Additional Information

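  • Set conflicts=proceed to count version conflicts and keep going. The default, conflicts=abort, stops the operation at the first conflict.
  • Set wait_for_completion=false for large updates to run the operation asynchronously. Elasticsearch then returns a task ID that you can track with the Task API.
  • Consider slicing (for example, slices=auto) to parallelize large update operations.
  • Be cautious with scripts that significantly enlarge documents. Every updated document is rewritten in full, so large changes increase indexing and segment-merge load.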
  • Always test update operations on a small subset or a test index before applying to production data.
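The request below is an illustrative sketch that combines the first three practices:

  • conflicts=proceed
  • slices=auto, which lets Elasticsearch choose the slice count (roughly one slice per shard)
  • wait_for_completion=false

It also passes the new value through params instead of embedding it in source. This lets Elasticsearch reuse the compiled script when only the value changes. The index, field, and parameter names are placeholders.

```console
# Continue past version conflicts, parallelize across slices,
# and run in the background instead of holding the connection open
POST /your_index/_update_by_query?conflicts=proceed&slices=auto&wait_for_completion=false
{
  "query": {
    "match": {
      "field_name": "value_to_match"
    }
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.field_to_update = params.new_value",
    "params": {
      "new_value": "new_value"
    }
  }
}
```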

Frequently Asked Questions

Q: Can I use Update by Query to modify the mapping of an index?
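A: No. Update by Query cannot change the mapping itself; it only rewrites documents under the existing mapping. It can, however, pick up additive mapping changes. After you add a new field or multi-field with the update mapping API, running Update by Query without a script re-indexes existing documents so that their values are indexed under the new definition. To change the type of an existing field, create a new index with the desired mapping and reindex the data into it.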
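Here is a sketch of the additive case. The requests add a keyword multi-field to an existing text field, then run Update by Query with no script and no query. Every document is re-indexed unchanged, and its existing value is indexed into the new multi-field. The field name title and the multi-field name raw are hypothetical. The sketch assumes title is already mapped as a plain text field with default settings, because the mapping update must not conflict with the existing definition.

```console
# Add a keyword multi-field to the existing text field "title" (hypothetical)
PUT /your_index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }
  }
}

# No script and no query: every document is re-indexed as-is,
# so title.raw is populated for documents that already exist
POST /your_index/_update_by_query?conflicts=proceed
```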

Q: How can I limit the number of documents updated in a single operation?
A: Use the max_docs parameter, for example ?max_docs=1000, to stop after 1,000 matching documents. Older Elasticsearch versions called this parameter size, and that name is deprecated.
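A minimal example, using the placeholder index, field, and script from the earlier request:

```console
# Update at most 1000 of the matching documents
POST /your_index/_update_by_query?max_docs=1000
{
  "query": { "match": { "field_name": "value_to_match" } },
  "script": { "source": "ctx._source.field_to_update = 'new_value'" }
}
```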

Q: Is it possible to perform Update by Query across multiple indices?
A: Yes, you can specify multiple indices in the API call, or use index patterns like my-index-* to update documents across multiple indices.
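For example, the two requests below target multiple indices. The names index-a and index-b and the pattern my-index-* are hypothetical. The request body is the same as for a single index.

```console
# Two indices listed explicitly, separated by a comma
POST /index-a,index-b/_update_by_query
{
  "query": { "match": { "field_name": "value_to_match" } },
  "script": { "source": "ctx._source.field_to_update = 'new_value'" }
}

# Every index whose name starts with "my-index-"
POST /my-index-*/_update_by_query
{
  "query": { "match": { "field_name": "value_to_match" } },
  "script": { "source": "ctx._source.field_to_update = 'new_value'" }
}
```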

Q: What happens if an error occurs during the Update by Query operation?
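A: By default, the operation aborts on the first failure. Documents updated before the failure stay updated, because Update by Query is not transactional and does not roll back. Setting conflicts=proceed only lets it continue past version conflicts. Other failures, such as script errors, still abort the request and are listed in the failures array of the response.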

Q: Can I use Update by Query to delete documents?
A: Yes, indirectly. Update by Query is meant for updates, but a script can set ctx.op = 'delete' on the documents that should be removed, or ctx.op = 'noop' to leave a document unchanged. If a query alone can select the documents to delete, the Delete by Query API is the more direct tool.
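Here is a sketch of the script-based approach, assuming a hypothetical status field. Matching documents whose status is obsolete are deleted. All others are marked noop so that they are not rewritten.

```console
# Delete obsolete documents among the matches; skip the rest
POST /your_index/_update_by_query
{
  "query": {
    "match": { "field_name": "value_to_match" }
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.status == 'obsolete') { ctx.op = 'delete'; } else { ctx.op = 'noop'; }"
  }
}
```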
