The Elasticsearch significant_terms aggregation is a multi-bucket aggregation that returns terms whose frequency in a foreground set (the documents matched by the query) is statistically higher than expected from the background set (typically the whole index). The default scoring heuristic is JLH, which multiplies the absolute increase in a term's frequency by its relative increase, so a term has to rise on both measures to score well. Use it for "what is different about this subset" questions: root-cause analysis, surfacing co-occurring entities, or anomaly detection on textual fields.
Syntax
```
GET /ecommerce/_search
{
  "query": { "range": { "total_amount": { "gte": 1000 } } },
  "size": 0,
  "aggs": {
    "significant_products": {
      "significant_terms": {
        "field": "product_name.keyword",
        "size": 10
      }
    }
  }
}
```
Results include both doc_count (foreground) and bg_count (background) plus a score. The query defines the foreground; the index (or a background_filter) defines the background.
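To make these response fields concrete, the following Python sketch walks a hand-written response fragment and derives the foreground and background rates behind each bucket. The counts and scores are invented for illustration, and `summarize_buckets` is a hypothetical helper name, not part of any Elasticsearch client.

```python
# Illustrative only: a hand-written fragment shaped like a significant_terms
# response. All counts and scores below are made up for this example.
sample_response = {
    "aggregations": {
        "significant_products": {
            "doc_count": 500,     # documents in the foreground (matched the query)
            "bg_count": 100000,   # documents in the background
            "buckets": [
                {"key": "Espresso Machine", "doc_count": 60, "score": 1.08, "bg_count": 1200},
                {"key": "Gift Card", "doc_count": 45, "score": 0.072, "bg_count": 5000},
            ],
        }
    }
}


def summarize_buckets(response, agg_name):
    """Return (key, foreground_rate, background_rate, score) for each bucket."""
    agg = response["aggregations"][agg_name]
    fg_total = agg["doc_count"]
    bg_total = agg["bg_count"]
    rows = []
    for bucket in agg["buckets"]:
        fg_rate = bucket["doc_count"] / fg_total   # share of foreground docs with the term
        bg_rate = bucket["bg_count"] / bg_total    # share of background docs with the term
        rows.append((bucket["key"], fg_rate, bg_rate, bucket["score"]))
    return rows


for key, fg_rate, bg_rate, score in summarize_buckets(sample_response, "significant_products"):
    print(f"{key}: foreground {fg_rate:.1%}, background {bg_rate:.1%}, score {score}")
```

Comparing the two rates side by side is usually the quickest way to sanity-check why a bucket scored the way it did.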
Parameters
| Parameter | Default | Description |
|---|---|---|
| field | required | Field to bucket on. keyword or numeric. |
| size | 10 | Number of top-scoring buckets returned. |
| shard_size | size * 1.5 + 10 | Per-shard candidate set forwarded to the coordinator. |
| min_doc_count | 3 | Foreground doc count threshold. Lower = noisier. |
| shard_min_doc_count | 0 | Per-shard equivalent. |
| background_filter | none (uses index) | Restrict the background corpus. |
| include / exclude | - | Regex or value list to filter candidate terms. |
| mutual_information / chi_square / gnd / script_heuristic / percentage | - | Alternative scoring heuristics. Default is JLH. |
The default min_doc_count of 3 keeps terms that appear in only one or two foreground documents from surfacing as false positives.
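As a rough illustration of how these parameters fit together in one request, here is a minimal Python sketch that assembles a search body as a plain dictionary. The function name `build_significant_terms_body`, the aggregation name `significant`, and the excluded product names are all assumptions made for the example; only the parameter names come from the table above.

```python
import json


def build_significant_terms_body(query, field, *, size=10, min_doc_count=None,
                                 shard_size=None, background_filter=None,
                                 include=None, exclude=None):
    """Build a search body with one significant_terms aggregation.

    The aggregation name "significant" is a placeholder chosen for this sketch.
    """
    agg = {"field": field, "size": size}
    optional = {
        "min_doc_count": min_doc_count,
        "shard_size": shard_size,
        "background_filter": background_filter,
        "include": include,
        "exclude": exclude,
    }
    # Only send parameters that were set explicitly; the server applies its
    # own defaults for everything that is omitted.
    agg.update({name: value for name, value in optional.items() if value is not None})
    return {
        "query": query,
        "size": 0,  # return aggregation results only, no hits
        "aggs": {"significant": {"significant_terms": agg}},
    }


body = build_significant_terms_body(
    query={"range": {"total_amount": {"gte": 1000}}},
    field="product_name.keyword",
    min_doc_count=5,
    exclude=["Gift Card", "Shipping Fee"],  # hypothetical terms to suppress
)
print(json.dumps(body, indent=2))
```

Leaving unset parameters out of the body, rather than sending explicit defaults, keeps the request aligned with whatever defaults the cluster version in use applies.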
Examples
Significant products in high-value orders, with a background filter restricting the comparison corpus:
```
"aggs": {
  "products": {
    "significant_terms": {
      "field": "product.keyword",
      "background_filter": { "term": { "store_country": "US" } }
    }
  }
}
```
Use mutual information instead of JLH (it tends to favor high-frequency terms, even ones that are also common in the background):
```
"significant_terms": {
  "field": "category.keyword",
  "mutual_information": { "background_is_superset": true }
}
```
Significant tags inside each top product (nested usage):
```
"aggs": {
  "by_product": {
    "terms": { "field": "product.keyword", "size": 10 },
    "aggs": {
      "tags": {
        "significant_terms": { "field": "tag.keyword", "size": 5 }
      }
    }
  }
}
```
Scoring and Performance Notes
The default JLH score rewards terms that are both relatively much more frequent in the foreground and have a meaningful absolute count - it deliberately avoids returning extremely rare terms that happen to appear once. Alternative heuristics suit different priorities:
| Heuristic | Bias |
|---|---|
| JLH (default) | Balances absolute and relative frequency increase. |
| mutual_information | Favors high-frequency terms, even ones that are also common in the background; unlikely to select very rare terms. |
| chi_square | Statistical significance, ignores effect size. |
| gnd (Google Normalized Distance) | Co-occurrence semantics. |
| percentage | Simple foreground/background ratio. |
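The difference between JLH and the percentage heuristic is easier to see with numbers. The sketch below implements JLH as the absolute change in a term's rate multiplied by the relative change, and percentage as foreground count divided by background count. The function names and the handling of edge cases (zero background, no increase) are assumptions of this sketch and may differ from the Elasticsearch implementation.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """JLH-style score: absolute rate change multiplied by relative rate change."""
    if fg_total == 0 or bg_total == 0 or bg_count == 0:
        return 0.0  # assumption: nothing to compare against scores zero
    fg_rate = fg_count / fg_total
    bg_rate = bg_count / bg_total
    absolute_change = fg_rate - bg_rate
    if absolute_change <= 0:
        return 0.0  # assumption: terms that did not increase are not significant
    relative_change = fg_rate / bg_rate
    return absolute_change * relative_change


def percentage_score(fg_count, bg_count):
    """Foreground document count divided by background document count."""
    return fg_count / bg_count if bg_count else 0.0


# Foreground of 100 documents, background of 10,000 documents.
# "common" appears in 20 foreground and 200 background documents.
# "one_off" appears in a single document, which is also in the foreground.
terms = {"common": (20, 200), "one_off": (1, 1)}

for name, (fg_count, bg_count) in terms.items():
    jlh = jlh_score(fg_count, 100, bg_count, 10_000)
    pct = percentage_score(fg_count, bg_count)
    print(f"{name}: jlh={jlh:.2f} percentage={pct:.2f}")
```

With these numbers JLH ranks the well-supported term above the one-off, while the percentage heuristic ranks the one-off first, which is why percentage is usually paired with a higher min_doc_count.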
background_filter is the lever to control what "normal" looks like. Without it, the background is the whole index, which is rarely what you want - filter to comparable documents (same product line, same time window) so the foreground is contrasted against a meaningful baseline.
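A small worked example shows how much the choice of background can change the picture. The sketch below compares the same foreground against two backgrounds using a plain rate ratio, called `lift` here. This is not the score Elasticsearch computes, and all counts are invented for illustration.

```python
def lift(fg_count, fg_total, bg_count, bg_total):
    """Foreground rate divided by background rate (1.0 means no difference)."""
    fg_rate = fg_count / fg_total
    bg_rate = bg_count / bg_total
    return fg_rate / bg_rate


# Foreground: 500 high-value US orders, 200 of which use "Express Shipping".
fg_count, fg_total = 200, 500

# Background A: the whole index, all countries.
whole_index = lift(fg_count, fg_total, bg_count=50_000, bg_total=1_000_000)

# Background B: US orders only, as a background_filter on store_country would select.
us_only = lift(fg_count, fg_total, bg_count=70_000, bg_total=200_000)

print(f"lift vs whole index: {whole_index:.2f}")
print(f"lift vs US orders:   {us_only:.2f}")
```

Against the whole index the term looks about eight times more common than expected; against US orders only it is barely elevated, which suggests the term mostly reflects the country rather than the order value.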
The aggregation scans foreground and background frequencies per shard, so its cost is similar to a terms aggregation plus a background lookup. On very high-cardinality fields the candidate set forwarded to the coordinator can grow large; raise min_doc_count or shard_min_doc_count to keep memory bounded. Reading slow logs to find which significant_terms queries are dominating cluster cost - and which would be cheaper as a scoped background_filter - is exactly the loop Pulse runs continuously.
Common Mistakes
- Running significant_terms without a background_filter when the index spans heterogeneous tenants or product lines - results are dominated by background noise.
- Setting min_doc_count: 1 to "see more" and letting spurious one-off matches dominate the result.
- Using significant_terms on near-unique fields (IDs). Nothing is statistically significant if every value occurs once.
- Confusing JLH score with a confidence level. It is a ranking, not a p-value.
- Pointing at a text field without .keyword, which fails unless fielddata is enabled on the field and is memory-hungry when it is.
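The last mistake in the list can be caught before a query is ever sent by inspecting the index mapping. The following sketch assumes a `properties` dictionary shaped like the one returned by the get-mapping API. The helper name `aggregatable_field` is hypothetical, and it only handles top-level fields and only distinguishes text from every other type.

```python
def aggregatable_field(properties, field):
    """Return a field name suitable for significant_terms, or None.

    Simplified: top-level fields only; any non-text type is passed through.
    """
    spec = properties.get(field)
    if spec is None:
        return None  # field is not in the mapping
    if spec.get("type") != "text":
        return field  # keyword or numeric fields can be used directly
    # For text fields, look for a keyword sub-field (commonly named "keyword").
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None  # analyzed text with no keyword sub-field


# Hand-written mapping fragment used only for this example.
sample_properties = {
    "product_name": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "description": {"type": "text"},
    "store_country": {"type": "keyword"},
}

for name in ("product_name", "description", "store_country"):
    print(name, "->", aggregatable_field(sample_properties, name))
```

A None result for an analyzed text field is a hint to add a keyword sub-field or to reach for significant_text instead.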
Find Slow significant_terms Queries with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For significant_terms aggregations specifically, Pulse:
- Identifies significant_terms queries running without a background_filter on indices that span heterogeneous tenants or product lines, where results are dominated by background noise and the foreground scan is wasted
- Flags significant_terms with min_doc_count: 1 or low shard_min_doc_count, where the candidate set forwarded to the coordinator grows large enough to push request circuit breakers
- Detects significant_terms running on near-unique fields (IDs, session UUIDs), where no term can be statistically significant and the aggregation is pure overhead
- Spots fielddata loading triggered by significant_terms on analyzed text fields
- Traces each slow significant_terms back to the calling service via slow-log and APM correlation
- Recommends concrete fixes: add a meaningful background_filter, raise min_doc_count, switch from JLH to mutual_information or chi_square when ranking criteria need to change, move to a .keyword sub-field, or replace the aggregation with significant_text on natural-language fields
- Tracks coordinator memory, latency, and result quality after the change ships
This converts the manual significance-tuning loop into a continuous optimization workflow.
Frequently Asked Questions
Q: How does significant_terms differ from terms aggregation?
A: The terms aggregation returns the most frequent values. The significant_terms aggregation returns values that are more frequent in the foreground subset than expected from the background corpus, ranking by statistical surprise rather than raw count.
Q: What is the default scoring algorithm?
A: JLH. It multiplies the absolute change in a term's frequency by the relative change, which keeps extremely rare terms from being over-rewarded.
Q: What does background_filter do?
A: It restricts the background corpus the foreground is compared against. Without it the background is the whole index, which often produces uninteresting results when the index is heterogeneous.
Q: Can significant_terms be used on numeric fields?
A: Yes, for integer-like fields with bounded cardinality. For continuous numerics, bucket the values into ranges first. For analyzed natural-language text, use the significant_text aggregation, which is purpose-built for it.
Q: How are multi-valued fields handled?
A: Each distinct value in a multi-valued field is treated as a separate observation. A document with three tags adds one to the foreground document count of each of those three tags.
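The counting rule can be sketched in a few lines of Python. This is a simplified model of per-term document counts, not Elasticsearch code; the helper name `term_doc_counts` and the sample documents are made up for the example.

```python
from collections import Counter


def term_doc_counts(docs, field):
    """Count, for each term, how many documents contain it at least once."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]  # treat a single value like a one-element list
        # set(): a value repeated inside one document still counts that document once
        counts.update(set(values))
    return counts


docs = [
    {"tag": ["sale", "new", "gift"]},  # adds one to each of three tags
    {"tag": ["sale", "sale"]},         # duplicate value, counted once
    {"tag": "new"},                    # single-valued
    {},                                # field missing, contributes nothing
]

print(term_doc_counts(docs, "tag"))
```

Because the counts are per document, a document with many values raises the count of every term it carries, which is worth keeping in mind when a few documents hold very long tag lists.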
Q: Why does my significant_terms return common stopwords?
A: You are likely running it on an analyzed text field, or on a field where stopwords dominate both foreground and background. Use a normalized keyword field, or filter the candidate terms with exclude.
Q: How do I find significant_terms queries that are dominating cluster cost?
A: Pulse profiles Elasticsearch and OpenSearch slow logs, isolates significant_terms queries without a background_filter or with low min_doc_count flooding the coordinator, attributes each to the calling service, and recommends background_filter scoping, threshold raises, or .keyword sub-field migration.
Related Reading
- Terms Aggregation: raw frequency ranking.
- Rare Terms Aggregation: infrequent values - inverse pattern.
- Multi Terms Aggregation: combine multiple fields, complementary technique.
- Cardinality Aggregation: scope the candidate space before running significant_terms.
- Elasticsearch Query Language: the DSL significant_terms runs inside.