Elasticsearch delete_by_query: How to Delete Documents by Query

The Delete by Query API (_delete_by_query) removes every document in an Elasticsearch index that matches a query, so you can delete documents by criteria instead of one ID at a time. It is useful for:

- Cleaning up old or irrelevant data
- Removing documents that match certain conditions
- Deleting many documents at once without deleting the entire index

Steps to Perform Delete by Query

Prepare the Query:
- Determine the criteria for documents you want to delete
- Craft a query that matches these documents
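
For example, when cleaning up old data the criteria are often a date cutoff. The Python sketch below (standard library only) builds such a request body. The field name @timestamp and the cutoff now-30d are assumptions for illustration; substitute your own date field and Elasticsearch date-math expression. A bool filter keeps the query in filter context, since relevance scoring does not matter when selecting documents to delete.

```python
def build_old_docs_query(date_field: str, older_than: str) -> dict:
    """Return a _delete_by_query body matching documents older than a cutoff.

    date_field and older_than are caller-supplied assumptions, for example
    "@timestamp" and "now-30d" (Elasticsearch date math).
    """
    return {
        "query": {
            "bool": {
                # Filter context: documents are selected without scoring
                "filter": [
                    {"range": {date_field: {"lt": older_than}}}
                ]
            }
        }
    }
```

Usage: build_old_docs_query("@timestamp", "now-30d") returns the body to send to _delete_by_query.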

Use the Delete by Query API:
- Send a POST request to the _delete_by_query endpoint of the target index, with the query in the request body

Example:

POST /your_index/_delete_by_query
{
  "query": {
    "match": {
      "status": "obsolete"
    }
  }
}
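
The same request can be sent from code. This is a minimal sketch that uses only Python's standard library rather than an official client; it assumes an unsecured cluster reachable at http://localhost:9200, and the helper names delete_by_query_request and send are hypothetical. Extra keyword arguments become URL parameters, with booleans lowercased as Elasticsearch expects.

```python
import json
import urllib.parse
import urllib.request


def _encode_params(params: dict) -> str:
    # Elasticsearch expects lowercase booleans such as "false"
    return urllib.parse.urlencode(
        {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    )


def delete_by_query_request(base_url: str, index: str, body: dict,
                            **params) -> urllib.request.Request:
    """Build (but do not send) POST /<index>/_delete_by_query."""
    # Keep commas and '*' so multi-index targets and patterns still work
    target = urllib.parse.quote(index, safe=",*")
    url = f"{base_url.rstrip('/')}/{target}/_delete_by_query"
    query_string = _encode_params(params)
    if query_string:
        url += "?" + query_string
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, assuming such a cluster: send(delete_by_query_request("http://localhost:9200", "your_index", {"query": {"match": {"status": "obsolete"}}})).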

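Monitor the Operation:
- Check the deleted count in the response, along with the version_conflicts and failures fields
- For large deletions, pass wait_for_completion=false to run the operation asynchronously; Elasticsearch then returns a task ID instead of the final result

Verify the Deletion:
- After a refresh, run the same query against the _count API (or a search) to confirm that no matching documents remain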
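
Monitoring and verification can be sketched in plain Python as below; the helper names are hypothetical. Deletions become visible to searches only after the next refresh (governed by the index refresh_interval setting, 1s by default), or right away if you pass refresh=true to _delete_by_query, so a count taken too early may still include deleted documents.

```python
import json
import urllib.request


def summarize(response: dict) -> str:
    """Summarize a _delete_by_query response body.

    With wait_for_completion=false the body only contains a "task" id.
    """
    if "task" in response:
        return f"running asynchronously as task {response['task']}"
    failures = response.get("failures", [])
    return (f"deleted={response.get('deleted', 0)} "
            f"version_conflicts={response.get('version_conflicts', 0)} "
            f"failures={len(failures)}")


def count_request(base_url: str, index: str, query: dict) -> urllib.request.Request:
    """Build POST /<index>/_count reusing the query from the deletion."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_count",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deletion_verified(count_response: dict) -> bool:
    """True when no document matches the deletion query any more."""
    return count_response.get("count") == 0
```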

Best Practices and Additional Information

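- Always test your query on a small subset or a test index before running it on production data
- Use the conflicts=proceed parameter to count version conflicts and keep going, instead of aborting when a document changed after the operation started
- For large-scale deletions, use the slices parameter (for example, slices=auto) to parallelize the operation; Delete by Query uses sliced scroll internally
- Delete by Query can be resource-intensive; schedule it during off-peak hours if possible, or throttle it with the requests_per_second parameter
- Deleted documents are only marked as deleted, and their disk space is reclaimed as segments merge; after very large deletions, a force merge with only_expunge_deletes=true can reclaim space sooner, ideally on an index that no longer receives writes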
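
The tuning options above can be combined into one query string. This sketch is illustrative: the helper names are hypothetical, and any requests_per_second value you pass is only a starting point to adjust for your cluster.

```python
import urllib.parse


def tuning_params(slices="auto", requests_per_second=None,
                  proceed_on_conflicts=True) -> str:
    """Query string for a large _delete_by_query run."""
    params = {"slices": slices}  # "auto" lets Elasticsearch choose the slice count
    if proceed_on_conflicts:
        # Count version conflicts instead of aborting on the first one
        params["conflicts"] = "proceed"
    if requests_per_second is not None:
        # Throttle in sub-requests per second (-1 means no throttling)
        params["requests_per_second"] = requests_per_second
    return urllib.parse.urlencode(params)


def forcemerge_path(index: str) -> str:
    """Path for a force merge that only expunges deleted documents."""
    return f"/{urllib.parse.quote(index, safe='*')}/_forcemerge?only_expunge_deletes=true"
```

For example, "/your_index/_delete_by_query?" + tuning_params() yields /your_index/_delete_by_query?slices=auto&conflicts=proceed.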

Frequently Asked Questions

Q: Can I undo a Delete by Query operation?
A: No, Delete by Query operations are not reversible. Always ensure you have a backup or snapshot before performing large-scale deletions.

Q: How does Delete by Query affect index performance?
A: Delete by Query can be resource-intensive and may temporarily impact search and indexing performance, especially for large operations.

Q: Is there a limit to how many documents can be deleted in one operation?
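A: There is no hard limit. Delete by Query already processes matching documents in batches internally (1,000 per scroll batch by default, adjustable with scroll_size), but very large deletions are easier to manage if you run them asynchronously or split them into smaller runs with the max_docs parameter, which limits resource usage and avoids client timeouts.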
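
One way to split a very large deletion is to cap each run with max_docs and repeat until a run deletes nothing. In this sketch, run is a hypothetical caller-supplied function that sends POST /<index>/_delete_by_query?max_docs=<n> with your query and returns the parsed JSON response; injecting it keeps the loop independent of any particular HTTP client.

```python
from typing import Callable


def delete_in_batches(run: Callable[[int], dict], batch_size: int = 10_000,
                      max_rounds: int = 1_000) -> int:
    """Repeat a capped delete until a round deletes nothing; return the total."""
    total = 0
    for _ in range(max_rounds):
        # Each round deletes at most batch_size documents (max_docs)
        deleted = run(batch_size).get("deleted", 0)
        total += deleted
        if deleted == 0:
            # Nothing matched this round, so the deletion is finished
            break
    return total
```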

Q: Can I use Delete by Query across multiple indices?
A: Yes. Specify a comma-separated list of indices or an index pattern in the request path, for example POST /index1,index2/_delete_by_query or POST /logs-*/_delete_by_query.
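
Building the request path for several targets amounts to joining them with commas. This small helper (hypothetical name) keeps * unescaped so that patterns such as logs-* still work.

```python
import urllib.parse


def multi_index_path(*indices: str) -> str:
    """Build a _delete_by_query path targeting several indices or patterns."""
    if not indices:
        raise ValueError("at least one index or pattern is required")
    # Commas separate targets; '*' is left unescaped for patterns
    target = ",".join(urllib.parse.quote(i, safe="*") for i in indices)
    return f"/{target}/_delete_by_query"
```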

Q: How can I track the progress of a large Delete by Query operation?
A: Start the operation with wait_for_completion=false, then pass the returned task ID to the Task API (GET _tasks/<task_id>) to monitor its progress. The same API can cancel the operation (POST _tasks/<task_id>/_cancel).
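
Polling the Task API can be sketched as follows. get_task is a hypothetical caller-supplied function that performs GET _tasks/<task_id> and returns the parsed JSON. While the task runs, its progress counters (such as total and deleted) appear under task.status; once completed is true, the final result is under response.

```python
import time
from typing import Callable


def wait_for_task(get_task: Callable[[str], dict], task_id: str,
                  interval: float = 5.0, timeout: float = 3600.0) -> dict:
    """Poll a task until it reports completed=true; return its final response."""
    deadline = time.monotonic() + timeout
    while True:
        info = get_task(task_id)
        if info.get("completed"):
            # The finished _delete_by_query result is stored under "response"
            return info.get("response", {})
        status = info.get("task", {}).get("status", {})
        print(f"{status.get('deleted', 0)}/{status.get('total', '?')} deleted")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still running")
        time.sleep(interval)
```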