Elasticsearch Adjacency Matrix Aggregation

What it does

The Adjacency Matrix Aggregation is a bucket aggregation in Elasticsearch that builds a matrix of relationships between a set of named filters. It returns one bucket per filter and one bucket per pair of filters. Each pair bucket contains the documents that match both filters, and its key is the two filter names joined by a separator ("&" by default, as in grpA&grpB). Buckets with no matching documents are left out, so the result is a sparse adjacency matrix. This makes the aggregation an efficient way to analyze how different sets of documents overlap, and it is particularly useful for graph-style analysis of connections in your data.
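To make the bucket semantics concrete, here is a small in-memory simulation in Python. It is only an illustration of what the buckets mean, not how Elasticsearch executes the aggregation. The function name adjacency_matrix, the predicate-based filters and the sample documents are all hypothetical. It emits a bucket per filter and per pair of filters, skips empty buckets, and joins pair keys in name-sorted order.

```python
from itertools import combinations  # noqa: F401  (pairs are built by hand below)


def adjacency_matrix(docs, filters, separator="&"):
    """Simulate adjacency_matrix buckets over an in-memory list of documents.

    docs: list of documents (any Python objects).
    filters: dict mapping a filter name to a predicate doc -> bool.
    Returns a dict of bucket key -> doc count; empty buckets are omitted.
    """
    names = sorted(filters)  # fixed, name-sorted order for bucket keys
    # Indices of the documents matched by each filter.
    matches = {
        name: {i for i, doc in enumerate(docs) if filters[name](doc)}
        for name in names
    }
    buckets = {}
    for i, a in enumerate(names):
        if matches[a]:
            buckets[a] = len(matches[a])  # single-filter bucket
        for b in names[i + 1:]:
            both = matches[a] & matches[b]  # documents matching a AND b
            if both:
                buckets[f"{a}{separator}{b}"] = len(both)
    return buckets


# Hypothetical sample data: documents tagged with sets of labels.
docs = [{"tags": {"a", "b"}}, {"tags": {"a"}}, {"tags": {"c"}}]
filters = {
    "grpA": lambda d: "a" in d["tags"],
    "grpB": lambda d: "b" in d["tags"],
    "grpC": lambda d: "c" in d["tags"],
}
print(adjacency_matrix(docs, filters))
# {'grpA': 2, 'grpA&grpB': 1, 'grpB': 1, 'grpC': 1}
```

No grpA&grpC or grpB&grpC key appears because no document matches both filters in those pairs.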

Syntax and Documentation

{
  "adjacency_matrix": {
    "filters": {
      "grpA": { "term": { "field1": "valueA" } },
      "grpB": { "term": { "field2": "valueB" } },
      "grpC": { "term": { "field3": "valueC" } }
    }
  }
}

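Besides filters, the aggregation accepts an optional separator parameter that sets the string used to join filter names in intersection bucket keys (default "&"). For more details, refer to the official Elasticsearch documentation on the Adjacency Matrix Aggregation.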
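When the filters come from application code, the request body can be assembled as a plain dictionary and serialized to the JSON you send to the _search endpoint. The helper below is a hypothetical sketch: the function name adjacency_matrix_request is made up, and the field names reuse the placeholders from the syntax snippet. It only adds separator when you pass one, leaving Elasticsearch's default in place otherwise.

```python
import json


def adjacency_matrix_request(agg_name, filters, separator=None):
    """Build a search body containing one adjacency_matrix aggregation.

    filters: dict of bucket name -> query clause (a plain dict).
    separator: optional; when omitted, Elasticsearch uses "&".
    """
    agg = {"filters": filters}
    if separator is not None:
        agg["separator"] = separator
    # size 0: return only aggregation results, no hits.
    return {"size": 0, "aggs": {agg_name: {"adjacency_matrix": agg}}}


body = adjacency_matrix_request(
    "interactions",
    {
        "grpA": {"term": {"field1": "valueA"}},
        "grpB": {"term": {"field2": "valueB"}},
        "grpC": {"term": {"field3": "valueC"}},
    },
    separator="|",
)
print(json.dumps(body, indent=2))
```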

Example Usage

Here's an example of using the Adjacency Matrix Aggregation to analyze relationships between different categories of products:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "interactions": {
      "adjacency_matrix": {
        "filters": {
          "electronics": { "term": { "category": "electronics" } },
          "books": { "term": { "category": "books" } },
          "clothing": { "term": { "category": "clothing" } }
        }
      }
    }
  }
}

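This query returns one bucket per category (for example electronics) plus one bucket for each pair of categories that share at least one product (for example books&electronics), showing how many products belong to both categories at once. Intersection buckets can only be non-empty if category is multi-valued, that is, if a product can be tagged with more than one category. When every product has exactly one category, the response contains only the per-category buckets.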
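The response lists buckets as key/doc_count pairs. To work with the result as an actual matrix, you can expand it into a symmetric table. This is a sketch under two assumptions: you pass the filter names from the request (filters that match nothing produce no bucket at all), and no filter name contains the separator. The function name to_matrix is hypothetical, and the response below is made up to show the shape, not real output.

```python
def to_matrix(response, agg_name, names, separator="&"):
    """Expand adjacency_matrix buckets into a symmetric dict-of-dicts.

    names: the filter names used in the request.
    Assumes no filter name contains the separator.
    Diagonal = per-filter counts; off-diagonal = intersection counts;
    omitted (empty) buckets stay 0.
    """
    matrix = {a: {b: 0 for b in names} for a in names}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        parts = bucket["key"].split(separator)
        a, b = (parts[0], parts[0]) if len(parts) == 1 else parts
        matrix[a][b] = matrix[b][a] = bucket["doc_count"]
    return matrix


# Illustrative response shape only; these counts are made up.
response = {
    "aggregations": {
        "interactions": {
            "buckets": [
                {"key": "books", "doc_count": 5},
                {"key": "books&electronics", "doc_count": 2},
                {"key": "electronics", "doc_count": 7},
            ]
        }
    }
}
matrix = to_matrix(response, "interactions", ["books", "clothing", "electronics"])
print(matrix["electronics"]["books"])  # 2
```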

Common Issues

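High memory usage: N filters can produce up to N + N(N-1)/2 buckets (one per filter plus one per pair), so the number of buckets grows quadratically with the number of filters. Large matrices, especially with sub-aggregations, can consume significant memory and run into the search.max_buckets limit or the circuit breakers.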
Performance impact: Complex filters or a large number of filters can slow down query execution.
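Misinterpretation of results: The response mixes two kinds of buckets. A single-filter bucket (key grpA) counts every document matching that filter, while an intersection bucket (key grpA&grpB) counts only documents matching both filters. Empty intersections are omitted rather than returned with a doc_count of 0.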
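The quadratic growth behind the memory issue is easy to quantify. This small sketch (the function name max_buckets is hypothetical) computes the upper bound on buckets for N filters: N single-filter buckets plus N(N-1)/2 pairwise intersections. Any sub-aggregations run once per returned bucket, which multiplies the cost further.

```python
def max_buckets(n_filters):
    """Upper bound on adjacency_matrix buckets for n filters:
    n single-filter buckets plus n*(n-1)/2 pairwise intersections."""
    return n_filters + n_filters * (n_filters - 1) // 2


for n in (3, 10, 100):
    print(n, max_buckets(n))
# 3 6
# 10 55
# 100 5050
```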

Best Practices

Limit the number of filters to keep the matrix manageable and performance-friendly.
Use simple and efficient filters to improve overall aggregation performance.
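Expect a sparse result: intersections with no matching documents are omitted automatically, so no extra parameter is needed to skip empty combinations.
Choose filter names that do not contain the separator (default "&"), or set a custom separator, so intersection keys can be split back into filter names unambiguously.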
Combine with other aggregations for more detailed analysis of the relationships.
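As an example of combining aggregations, a metric sub-aggregation under adjacency_matrix is computed separately inside each returned bucket, so every category and every non-empty category pair gets its own value. The field names category and price, and the sub-aggregation name avg_price, are assumptions for illustration.

```python
import json

# "category" and "price" are assumed field names (keyword and numeric).
body = {
    "size": 0,
    "aggs": {
        "interactions": {
            "adjacency_matrix": {
                "filters": {
                    "electronics": {"term": {"category": "electronics"}},
                    "books": {"term": {"category": "books"}},
                    "clothing": {"term": {"category": "clothing"}},
                }
            },
            # Computed inside every returned bucket: the single-filter
            # buckets as well as the non-empty intersections.
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        }
    },
}
print(json.dumps(body, indent=2))
```

Each bucket in the response then carries an avg_price object with a value field next to its key and doc_count.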

Frequently Asked Questions

Q: How does the Adjacency Matrix Aggregation differ from a regular multi-bucket aggregation?
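A: A filters aggregation returns one bucket per filter. The Adjacency Matrix Aggregation returns those per-filter buckets as well as a bucket for every pair of filters with overlapping matches, so you can analyze intersections between document sets in a single request. It computes only pairwise intersections, not intersections of three or more filters.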

Q: Can I use script-based filters in an Adjacency Matrix Aggregation?
A: Yes. Any query can serve as a filter, including a script query, but scripts are evaluated per document and are usually considerably slower than simple term or range filters.
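A script filter is written as a script query: an outer "script" key for the query type wrapping an inner "script" object with the Painless source. The sketch below assumes a numeric price field and a category keyword field; the filter name expensive and the threshold are hypothetical. The size() check avoids errors on documents that have no price value.

```python
import json

# A script query used as one of the matrix filters. "price" is an assumed
# numeric field; the size() check skips documents that lack a value.
expensive = {
    "script": {
        "script": {
            "source": "doc['price'].size() > 0 && doc['price'].value > params.threshold",
            "lang": "painless",
            "params": {"threshold": 100},
        }
    }
}
body = {
    "size": 0,
    "aggs": {
        "interactions": {
            "adjacency_matrix": {
                "filters": {
                    "electronics": {"term": {"category": "electronics"}},
                    "expensive": expensive,
                }
            }
        }
    },
}
print(json.dumps(body, indent=2))
```

Passing the threshold through params rather than hard-coding it in the source lets Elasticsearch reuse the compiled script across requests.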

Q: Is there a limit to the number of filters I can use in an Adjacency Matrix Aggregation?
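A: Yes. Elasticsearch caps the number of filters to avoid excessive disk seeks. In 7.x the default maximum is 100 filters, adjustable through the deprecated index.max_adjacency_matrix_filters index setting; in 8.x the limit is taken from the indices.query.bool.max_clause_count setting. Even well below the cap, keep the number of filters small, because the number of buckets grows quadratically.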

Q: How can I optimize the Adjacency Matrix Aggregation for large datasets?
A: Limit the number of filters, keep the filters themselves cheap (term and range queries rather than scripts), and, if approximate counts are acceptable for your use case, run the aggregation under a sampler aggregation so it only sees a subset of documents.
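A minimal sketch of the sampling approach, assuming a description text field and a category keyword field: the sampler aggregation keeps only the top-scoring shard_size documents per shard, and the nested adjacency_matrix is computed over that sample. Because the sample is chosen by relevance score, it pairs best with a meaningful query, and the resulting counts describe the sample rather than the whole index.

```python
import json

# "description" (text) and "category" (keyword) are assumed field names.
body = {
    "size": 0,
    "query": {"match": {"description": "wireless"}},
    "aggs": {
        "sample": {
            # Keep only the top-scoring 200 documents per shard.
            "sampler": {"shard_size": 200},
            "aggs": {
                "interactions": {
                    "adjacency_matrix": {
                        "filters": {
                            "electronics": {"term": {"category": "electronics"}},
                            "books": {"term": {"category": "books"}},
                        }
                    }
                }
            },
        }
    },
}
print(json.dumps(body, indent=2))
```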

Q: Can the Adjacency Matrix Aggregation be used for time-based analysis?
A: Yes, you can incorporate date range filters to analyze relationships over time, but be mindful of the potential increase in complexity and resource usage.
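One way to do this is to make time windows themselves filters in the matrix, using range queries with date math. The field names @timestamp and category and the filter names are assumptions. Intersections such as electronics&last_7d then count electronics documents in each window, while the two windows never overlap, so their intersection bucket never appears.

```python
import json

# "@timestamp" and "category" are assumed field names.
body = {
    "size": 0,
    "aggs": {
        "interactions": {
            "adjacency_matrix": {
                "filters": {
                    # Date math: from the start of the day 7 days ago onward.
                    "last_7d": {"range": {"@timestamp": {"gte": "now-7d/d"}}},
                    # The 7 days before that; disjoint from last_7d.
                    "previous_7d": {
                        "range": {"@timestamp": {"gte": "now-14d/d", "lt": "now-7d/d"}}
                    },
                    "electronics": {"term": {"category": "electronics"}},
                }
            }
        }
    },
}
print(json.dumps(body, indent=2))
```

Alternatively, nesting the adjacency_matrix aggregation under a date_histogram produces one matrix per time interval.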