Delete by Query is the operation to use when you need to remove multiple documents from an Elasticsearch index based on specific criteria. This operation is useful for:
- Cleaning up old or irrelevant data
- Removing documents that match certain conditions
- Bulk deletion of documents without deleting the entire index
Steps to Perform Delete by Query
Prepare the Query:
- Determine the criteria for documents you want to delete
- Craft a query that matches these documents
Use the Delete By Query API:
- Send a POST request to the _delete_by_query endpoint
- Example:
POST /your_index/_delete_by_query
{ "query": { "match": { "status": "obsolete" } } }
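The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the host http://localhost:9200, the lack of authentication, and the helper names build_delete_by_query and run are assumptions made for this example.

```python
import json
import urllib.request

# Assumption: an unsecured local cluster; adjust host and auth for your setup.
ES_URL = "http://localhost:9200"

def build_delete_by_query(index, query, base_url=ES_URL):
    """Return an unsent POST request for <index>/_delete_by_query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_delete_by_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Build the request from the example above and inspect it before sending.
request = build_delete_by_query("your_index", {"match": {"status": "obsolete"}})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To execute against a live cluster: result = run(request)
```

Building the request separately from sending it makes it easy to review exactly what will be deleted before anything reaches the cluster.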
Monitor the Operation:
- Check the deleted field in the response for the number of deleted documents
- Use the wait_for_completion=false parameter to run large deletions asynchronously; the response then contains a task ID instead of the final counts
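As a rough sketch of reading the response, the helper below handles both shapes: the synchronous summary and the task reference returned when wait_for_completion=false is set. The function name summarize and both sample responses are hand-written for illustration; they are not output from a real cluster.

```python
def summarize(response):
    """Describe a Delete by Query response in one line."""
    # Asynchronous calls return only a task reference.
    if "task" in response:
        return f"running asynchronously as task {response['task']}"
    deleted = response.get("deleted", 0)
    total = response.get("total", 0)
    conflicts = response.get("version_conflicts", 0)
    failures = len(response.get("failures", []))
    return f"deleted {deleted} of {total} matched; {conflicts} version conflicts; {failures} failures"

# Illustrative samples (assumed values, including the task ID).
sync_sample = {
    "took": 147,
    "timed_out": False,
    "total": 119,
    "deleted": 119,
    "batches": 1,
    "version_conflicts": 0,
    "failures": [],
}
async_sample = {"task": "node-id:12345"}

print(summarize(sync_sample))
print(summarize(async_sample))
```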
Verify the Deletion:
- Run a search query to ensure the targeted documents are no longer present
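One way to verify is to reuse the deletion query with the _count endpoint and confirm that nothing matches any more. The sketch below only builds the call and interprets a response; build_count_request and deletion_verified are hypothetical helper names, and the sample response is hand-written.

```python
import json

def build_count_request(index, query):
    """Return (path, body) for a _count call that reuses the deletion query."""
    return f"/{index}/_count", json.dumps({"query": query})

def deletion_verified(count_response):
    """True when the _count response reports no remaining matches."""
    return count_response.get("count") == 0

path, body = build_count_request("your_index", {"match": {"status": "obsolete"}})
print("POST", path, body)

# Hand-written sample response, not output from a real cluster.
sample = {"count": 0, "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0}}
print(deletion_verified(sample))
```

Because search is near real time, a count taken immediately after the deletion may still include documents until the index has been refreshed.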
Best Practices and Additional Information
- Always test your query on a small subset or test index before running it on production data
- Use the conflicts=proceed parameter to continue deletion even if version conflicts occur
- For large-scale deletions, consider slicing (the slices parameter, which builds on sliced scroll) to parallelize the operation
- Be aware that Delete by Query can be resource-intensive; schedule it during off-peak hours if possible
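- Deleted documents are only marked as deleted until their segments are merged; after large deletions, consider a force merge (_forcemerge with only_expunge_deletes=true) to reclaim disk space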
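The parameters mentioned in this list are passed as URL query parameters. The sketch below shows how such paths might be assembled; es_path is a hypothetical helper, slices=auto is one possible setting for parallelizing the operation, and the _forcemerge call with only_expunge_deletes is shown on the assumption that it is the current way to reclaim space on your Elasticsearch version.

```python
from urllib.parse import urlencode

def es_path(index, endpoint, **params):
    """Build '/<index>/<endpoint>' with optional URL query parameters."""
    path = f"/{index}/{endpoint}"
    return f"{path}?{urlencode(params)}" if params else path

# Continue past version conflicts and let Elasticsearch choose the slice count.
deletion = es_path("your_index", "_delete_by_query", conflicts="proceed", slices="auto")

# Afterwards, merge away segments that contain deleted documents.
merge = es_path("your_index", "_forcemerge", only_expunge_deletes="true")

print("POST", deletion)
print("POST", merge)
```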
Frequently Asked Questions
Q: Can I undo a Delete by Query operation?
A: No, Delete by Query operations are not reversible. Always ensure you have a backup or snapshot before performing large-scale deletions.
Q: How does Delete by Query affect index performance?
A: Delete by Query can be resource-intensive and may temporarily impact search and indexing performance, especially for large operations.
Q: Is there a limit to how many documents can be deleted in one operation?
A: While there's no hard limit, it's recommended to batch very large deletions to manage resource usage and avoid timeouts.
Q: Can I use Delete by Query across multiple indices?
A: Yes, you can specify multiple indices or use index patterns in the API call to delete across multiple indices.
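For illustration, multiple targets go into the request path as a comma-separated list, and a wildcard pattern can take the place of explicit names. The helper name delete_by_query_path and the index names logs-2023, logs-2024, and logs-* are made up for this sketch.

```python
def delete_by_query_path(indices):
    """Build the _delete_by_query path for one or more indices or patterns."""
    if isinstance(indices, str):
        indices = [indices]
    return "/" + ",".join(indices) + "/_delete_by_query"

# Hypothetical index names, used only as examples.
print("POST", delete_by_query_path(["logs-2023", "logs-2024"]))
print("POST", delete_by_query_path("logs-*"))
```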
Q: How can I track the progress of a large Delete by Query operation?
A: Use the Task API to monitor the progress of asynchronous Delete by Query operations initiated with wait_for_completion=false.
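A minimal sketch of reading progress from a Task API response follows. It assumes the task is fetched with GET /_tasks/<task_id> and that the response carries a completed flag and a task.status object with total and deleted counters; task_path and progress are hypothetical helper names, and the sample response and task ID are hand-written.

```python
def task_path(task_id):
    """Path for fetching a single task by the ID returned from the async call."""
    return f"/_tasks/{task_id}"

def progress(task_response):
    """Return (completed, percent_deleted) from a Task API response."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    deleted = status.get("deleted", 0)
    percent = 100.0 * deleted / total if total else 0.0
    return task_response.get("completed", False), percent

# Hand-written sample, not output from a real cluster.
sample = {
    "completed": False,
    "task": {
        "action": "indices:data/write/delete/byquery",
        "status": {"total": 2000, "deleted": 500, "batches": 1, "version_conflicts": 0},
    },
}

print("GET", task_path("node-id:12345"))
print(progress(sample))
```

Polling this endpoint at a modest interval until completed is true is usually enough for a long-running deletion.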