Use Delete by Query when you need to remove multiple documents from an Elasticsearch index based on specific criteria while keeping the index itself. If your goal is to remove the entire index, the delete index API is dramatically faster and reclaims disk space immediately. Delete by Query is useful for:
- Cleaning up old or irrelevant data
- Removing documents that match certain conditions
- Bulk deletion of documents without deleting the entire index
Steps to Perform Delete by Query
Prepare the Query:
- Determine the criteria for documents you want to delete
- Craft a query that matches these documents
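Delete by Query accepts the same query DSL as a search, so the body is plain JSON. The helper below is one way to assemble such a body in Python. It is a sketch that assumes hypothetical fields status (mapped as keyword, so a term query matches exactly) and @timestamp. The name build_delete_query is illustrative, not part of any client library.

```python
import json


def build_delete_query(status, older_than):
    """Return a Delete by Query request body matching documents whose
    (hypothetical) 'status' field equals `status` AND whose '@timestamp'
    is older than `older_than` (Elasticsearch date math, e.g. 'now-90d/d')."""
    return {
        "query": {
            "bool": {
                # Filter context: exact yes/no matching, no relevance scoring,
                # which is all a deletion needs.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"@timestamp": {"lt": older_than}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_delete_query("obsolete", "now-90d/d"), indent=2))
```

Running the same body against the _search endpoint first is a cheap way to preview exactly which documents would be deleted.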
Use the Delete by Query API:
- Send a POST request to the _delete_by_query endpoint
- Example:

```
POST /your_index/_delete_by_query
{
  "query": {
    "match": {
      "status": "obsolete"
    }
  }
}
```
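For scripted cleanups, the same request can be sent from Python. This minimal sketch uses only the standard library (urllib) and assumes an unsecured cluster reachable at a base URL such as http://localhost:9200; real deployments usually need authentication and TLS, or the official Elasticsearch client. The function names are hypothetical. Passing a list of index names produces a comma-separated path, which is how the API targets several indices at once.

```python
import json
import urllib.request


def build_delete_request(base_url, indices, query):
    """Build (url, body_bytes) for POST /<indices>/_delete_by_query.
    `indices` may be a single index name or a list of names/patterns."""
    if isinstance(indices, str):
        indices = [indices]
    url = f"{base_url.rstrip('/')}/{','.join(indices)}/_delete_by_query"
    body = json.dumps({"query": query}).encode("utf-8")
    return url, body


def delete_by_query(base_url, indices, query, timeout=60):
    """Send the request and return the parsed JSON response.
    Assumes no authentication is required (hypothetical setup)."""
    url, body = build_delete_request(base_url, indices, query)
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A call such as delete_by_query("http://localhost:9200", "your_index", {"match": {"status": "obsolete"}}) mirrors the console example above. Keeping request construction separate from sending makes the URL and body easy to inspect before anything is deleted.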
Monitor the Operation:
- Check the response for the number of deleted documents
- Use the wait_for_completion=false parameter to run large deletions asynchronously; the request then returns a task ID immediately instead of waiting for the result
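A synchronous call returns counters such as total, deleted, version_conflicts, timed_out, and failures, while an asynchronous call returns only a task field. The helper below is a hypothetical sketch that turns either shape into a one-line summary, flagging runs that timed out or reported failures.

```python
def summarize_response(resp):
    """Summarize a parsed Delete by Query response.

    Synchronous calls return counters ('total', 'deleted', ...);
    calls made with wait_for_completion=false return only a 'task' ID."""
    if "task" in resp and "deleted" not in resp:
        return f"running asynchronously as task {resp['task']}"
    summary = (
        f"deleted {resp.get('deleted', 0)} of {resp.get('total', 0)} matching docs, "
        f"{resp.get('version_conflicts', 0)} version conflicts"
    )
    # A timeout or non-empty 'failures' list means some matches may remain.
    if resp.get("timed_out") or resp.get("failures"):
        summary += " (incomplete: check 'failures' / 'timed_out')"
    return summary
```

Treating a non-empty failures list as incomplete matters because documents deleted before a failure stay deleted; the operation is not rolled back.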
Verify the Deletion:
- Run a search query to ensure the targeted documents are no longer present
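One way to verify is to send the same query to the _count endpoint and expect zero. Search is near real-time, so recently deleted documents can still be counted until the next refresh; running the delete with refresh=true, or waiting for the refresh interval, avoids a false alarm. This sketch makes the same assumptions as any stdlib HTTP example here (unsecured cluster, hypothetical function names).

```python
import json
import urllib.request


def build_count_request(base_url, index, query):
    """Build (url, body_bytes) for POST /<index>/_count with `query`."""
    url = f"{base_url.rstrip('/')}/{index}/_count"
    body = json.dumps({"query": query}).encode("utf-8")
    return url, body


def remaining_matches(base_url, index, query, timeout=30):
    """Return how many documents still match `query` (0 means verified)."""
    url, body = build_count_request(base_url, index, query)
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["count"]
```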
Best Practices and Additional Information
- Always test your query on a small subset or test index before running it on production data
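- Use the conflicts=proceed parameter to continue deleting when version conflicts occur; by default, the operation aborts on the first conflict
- For large-scale deletions, use the slices parameter (for example slices=auto) to parallelize the operation; Delete by Query uses sliced scroll internally to split the work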
- Be aware that Delete by Query can be resource-intensive; schedule it during off-peak hours if possible, or throttle it with the requests_per_second parameter
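- Deleted documents are only marked as deleted, and their disk space is reclaimed as segments merge in the background; after very large deletions you can run a force merge (for example with only_expunge_deletes=true) to reclaim space sooner, preferably on indices that are no longer being written to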
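The tuning options above are all query-string parameters on _delete_by_query. This hypothetical helper builds that query string with urllib.parse.urlencode; the defaults (proceed on conflicts, automatic slicing) are an illustrative choice, not a recommendation for every workload.

```python
from urllib.parse import urlencode


def delete_by_query_params(proceed_on_conflicts=True, slices="auto",
                           requests_per_second=None, run_async=False,
                           refresh=False):
    """Build the query string for a tuned _delete_by_query call."""
    params = {}
    if proceed_on_conflicts:
        params["conflicts"] = "proceed"  # count conflicts instead of aborting
    if slices is not None:
        params["slices"] = slices  # parallelize via sliced scroll
    if requests_per_second is not None:
        params["requests_per_second"] = requests_per_second  # throttle
    if run_async:
        params["wait_for_completion"] = "false"  # return a task ID at once
    if refresh:
        params["refresh"] = "true"  # refresh affected shards when done
    return urlencode(params)
```

For example, "/your_index/_delete_by_query?" + delete_by_query_params(requests_per_second=500, run_async=True) yields a throttled, parallel, asynchronous request path.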
Large delete-by-query operations can be resource-intensive and impact cluster stability. Pulse provides real-time monitoring and optimization recommendations for your Elasticsearch clusters, helping you track the impact of bulk operations and maintain healthy cluster performance.
Frequently Asked Questions
Q: Can I undo a Delete by Query operation?
A: No, Delete by Query operations are not reversible. Always ensure you have a backup or snapshot before performing large-scale deletions.
Q: How does Delete by Query affect index performance?
A: Delete by Query can be resource-intensive and may temporarily impact search and indexing performance, especially for large operations.
Q: Is there a limit to how many documents can be deleted in one operation?
A: There's no hard limit, but it's recommended to split very large deletions into batches (for example by narrowing the query to a date range, or by setting max_docs) to manage resource usage and avoid timeouts.
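A batching loop built on max_docs might look like the sketch below. It is hypothetical: send(path, body) stands in for whatever function POSTs JSON to your cluster and returns the parsed response. With conflicts=proceed, each call keeps going until it has deleted max_docs documents or exhausted the matches, so a batch that deletes fewer than batch_size documents signals that nothing is left. refresh=true makes each batch's deletions visible before the next batch searches again.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: (path, json_body) -> parsed JSON response.
SendFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def delete_in_batches(send: SendFn, index: str, query: Dict[str, Any],
                      batch_size: int = 10_000, max_batches: int = 1_000) -> int:
    """Delete matching documents at most `batch_size` at a time.
    Returns the total number of documents deleted."""
    total_deleted = 0
    for _ in range(max_batches):  # hard cap as a safety net
        resp = send(
            f"/{index}/_delete_by_query?conflicts=proceed&refresh=true",
            {"query": query, "max_docs": batch_size},
        )
        deleted = resp.get("deleted", 0)
        total_deleted += deleted
        if deleted < batch_size or resp.get("failures"):
            break  # partial batch: no matches left (or stop on failures)
    return total_deleted
```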
Q: Can I use Delete by Query across multiple indices?
A: Yes, you can specify multiple indices or use index patterns in the API call to delete across multiple indices.
Q: How can I track the progress of a large Delete by Query operation?
A: Use the Tasks API (GET _tasks/<task_id>) to monitor the progress of asynchronous Delete by Query operations started with wait_for_completion=false.
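A GET _tasks/<task_id> response includes a completed flag and a task.status object with counters such as total, created, updated, and deleted. The Elasticsearch documentation suggests estimating progress by summing created, updated, and deleted against total. The helper below is a hypothetical sketch of that calculation on a parsed response.

```python
def task_progress(task_resp):
    """Return (completed, fraction_done) from a parsed GET _tasks/<task_id>
    response for a Delete by Query task."""
    completed = bool(task_resp.get("completed", False))
    status = task_resp.get("task", {}).get("status", {})
    total = status.get("total", 0)
    done = (status.get("created", 0) + status.get("updated", 0)
            + status.get("deleted", 0))
    if completed:
        return True, 1.0
    return False, (done / total if total else 0.0)
```

Polling this every few seconds, and reading the final result from the response field once completed is true, is a simple way to report progress for long deletions.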