Elasticsearch delete_by_query: How to Delete Documents by Query

Delete by Query removes every document in an Elasticsearch index that matches a query, so you can delete many documents at once based on specific criteria. This operation is useful for:

  • Cleaning up old or irrelevant data
  • Removing documents that match certain conditions
  • Bulk deletion of documents without deleting the entire index

Steps to Perform Delete by Query

  1. Prepare the Query:

    • Determine the criteria for documents you want to delete
    • Craft a query that matches these documents
  2. Use the Delete By Query API:

    • Send a POST request to the _delete_by_query endpoint
    • Example:
      POST /your_index/_delete_by_query
      {
        "query": {
          "match": {
            "status": "obsolete"
          }
        }
      }
      
  3. Monitor the Operation:

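    • Check the deleted field in the response for the number of deleted documents
    • For large deletions, add the wait_for_completion=false parameter to run the operation asynchronously; the response then contains a task ID instead of the deletion counts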
  4. Verify the Deletion:

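    • Run a search or count query with the same criteria to confirm the targeted documents are no longer present; refresh the index first, since deletions become visible to search only after a refresh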
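
The four steps above can be sketched in Python with only the standard library. This is a minimal sketch, not a definitive client: the base URL http://localhost:9200, the helper names build_request, send, and delete_and_verify, and the absence of authentication are all assumptions made for illustration. The refresh=true parameter is added so that the follow-up count sees the deletions.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without authentication

# Step 1: the query that selects the documents to delete (same as the example above).
QUERY = {"query": {"match": {"status": "obsolete"}}}


def build_request(index, endpoint, body, params=None):
    """Build (without sending) a POST request for an index-level endpoint."""
    url = f"{BASE_URL}/{index}/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response (needs a live cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def delete_and_verify(index="your_index"):
    """Steps 2 to 4: count the matches, delete them, then count again."""
    before = send(build_request(index, "_count", QUERY))["count"]
    # refresh=true makes the deletions visible to the verification count below.
    result = send(build_request(index, "_delete_by_query", QUERY, {"refresh": "true"}))
    after = send(build_request(index, "_count", QUERY))["count"]
    return {"matched_before": before, "deleted": result["deleted"], "matched_after": after}
```

Calling delete_and_verify() against a test index first shows how many documents the query matches before anything is removed from production data.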

Best Practices and Additional Information

  • Always test your query on a small subset or test index before running it on production data
  • Use the conflicts=proceed parameter to continue deletion even if version conflicts occur
  • For large-scale deletions, use the slices parameter (for example, slices=auto), which builds on sliced scroll, to parallelize the operation
  • Be aware that Delete by Query can be resource-intensive; schedule it during off-peak hours if possible
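  • Deleted documents are only marked as deleted until their segments are merged; after large deletions, consider a force merge (the _forcemerge API, formerly _optimize) with only_expunge_deletes=true to reclaim disk space sooner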
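
As a rough illustration of how these parameters combine, the sketch below builds the request paths as plain strings. The function names delete_by_query_path and expunge_deletes_path are hypothetical. The sketch also assumes that parallelization is requested through the slices query parameter and that reclaiming disk space means a force merge with only_expunge_deletes=true.

```python
from urllib.parse import urlencode


def delete_by_query_path(index, proceed_on_conflicts=True, slices="auto", run_async=True):
    """Return the request path and query string for a tuned Delete by Query call."""
    params = {}
    if proceed_on_conflicts:
        params["conflicts"] = "proceed"  # count version conflicts instead of aborting
    if slices is not None:
        params["slices"] = slices  # "auto" lets Elasticsearch choose the slice count
    if run_async:
        params["wait_for_completion"] = "false"  # return a task ID immediately
    query_string = urlencode(params)
    return f"/{index}/_delete_by_query" + (f"?{query_string}" if query_string else "")


def expunge_deletes_path(index):
    """Return the path for a force merge that targets segments holding deleted documents."""
    return f"/{index}/_forcemerge?only_expunge_deletes=true"
```

Both paths are sent as POST requests; the Delete by Query call carries the query as its JSON body, while the force merge needs no body.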

Frequently Asked Questions

Q: Can I undo a Delete by Query operation?
A: No, Delete by Query operations are not reversible. Always ensure you have a backup or snapshot before performing large-scale deletions.

Q: How does Delete by Query affect index performance?
A: Delete by Query can be resource-intensive and may temporarily impact search and indexing performance, especially for large operations.

Q: Is there a limit to how many documents can be deleted in one operation?
A: While there's no hard limit, it's recommended to batch very large deletions to manage resource usage and avoid timeouts.
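
One possible way to batch, sketched under assumptions: cap each call with the max_docs parameter (available in recent Elasticsearch versions) and repeat until a round deletes nothing. The helper name delete_in_batches and the run_delete callable are hypothetical; run_delete is expected to send one Delete by Query request with the given max_docs (ideally with refresh=true so the next round no longer sees the deleted documents) and return the parsed JSON response.

```python
def delete_in_batches(run_delete, batch_size=10000, max_rounds=1000):
    """Call run_delete(max_docs=batch_size) until a round deletes nothing.

    Returns the total number of documents deleted across all rounds.
    """
    total_deleted = 0
    for _ in range(max_rounds):  # hard upper bound so the loop cannot run forever
        response = run_delete(max_docs=batch_size)
        deleted = response.get("deleted", 0)
        total_deleted += deleted
        if deleted == 0:
            break
    return total_deleted
```

Keeping the request logic behind a callable makes it easy to add a pause between rounds or to try the loop against a stub before pointing it at a cluster.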

Q: Can I use Delete by Query across multiple indices?
A: Yes, you can specify multiple indices or use index patterns in the API call to delete across multiple indices.
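
For illustration, index names and wildcard patterns are joined with commas in the request path. The helper name multi_index_path and the index names in the usage note are hypothetical.

```python
def multi_index_path(indices):
    """Join index names or wildcard patterns into one Delete by Query path."""
    return "/" + ",".join(indices) + "/_delete_by_query"
```

For example, multi_index_path(["logs-2023", "logs-2024"]) gives /logs-2023,logs-2024/_delete_by_query, and multi_index_path(["logs-*"]) targets every index matching the pattern.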

Q: How can I track the progress of a large Delete by Query operation?
A: Use the Task API to monitor the progress of asynchronous Delete by Query operations initiated with wait_for_completion=false.
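
An asynchronous call returns a task ID, and a GET request to /_tasks/<task_id> returns the task's current status. The sketch below estimates progress from such a response by comparing the deleted and total counters; the helper name task_progress is hypothetical, and the sample values in the usage note are illustrative rather than real output.

```python
def task_progress(task_response):
    """Summarize a Tasks API response for a Delete by Query task."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    deleted = status.get("deleted", 0)
    return {
        "completed": task_response.get("completed", False),
        "deleted": deleted,
        "total": total,
        # Fraction of matched documents deleted so far; 0.0 while total is unknown.
        "fraction": deleted / total if total else 0.0,
    }
```

With a response such as {"completed": False, "task": {"status": {"total": 1000, "deleted": 400}}}, the function reports a fraction of 0.4.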
