Elasticsearch Synonym Filter: Configuration, Updates, and Scoring

The synonym filter in Elasticsearch transforms query terms or indexed tokens into related terms, enabling search to match on words that mean the same thing. A search for "automobile" can match documents containing "car", "vehicle", or "auto".

Elasticsearch provides two synonym filters: synonym for simple token replacements and synonym_graph for multi-word expressions. This guide covers both, along with reloadable synonyms, scoring behavior, and production patterns.

The Two Synonym Filters

synonym

The synonym filter replaces or expands tokens in a flat token stream. It works correctly for single-word synonyms but produces incorrect token positions for multi-word synonyms (which can break phrase queries).

{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": [
            "laptop, notebook",
            "phone, mobile, cellphone"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  }
}

synonym_graph (Recommended)

The synonym_graph filter produces a proper token graph with correct positions for multi-word synonyms. This means phrase queries and proximity queries work correctly even when synonyms map between different word counts.

{
  "filter": {
    "my_synonyms": {
      "type": "synonym_graph",
      "synonyms": [
        "ny, new york",
        "ai, artificial intelligence",
        "usa, united states of america"
      ]
    }
  }
}

Use synonym_graph for query-time synonyms; it is designed for search analyzers only. The synonym filter exists for backward compatibility and for index-time use cases, where the graph structure isn't needed.

Configuration Methods

Inline Synonyms

For small, static lists, define synonyms directly in the index settings:

{
  "filter": {
    "my_synonyms": {
      "type": "synonym_graph",
      "synonyms": [
        "quick, fast, speedy",
        "big, large, huge",
        "happy, glad, joyful"
      ]
    }
  }
}

File-Based Synonyms

For larger or frequently updated lists, reference an external file:

{
  "filter": {
    "my_synonyms": {
      "type": "synonym_graph",
      "synonyms_path": "analysis/synonyms.txt",
      "updateable": true
    }
  }
}

The file path is relative to the Elasticsearch config directory. The file must be present on every node in the cluster.

File format (one rule per line, # for comments):

# Equivalent synonyms
laptop, notebook, portable computer
phone, mobile, cellphone, smartphone

# Explicit mappings (directional)
ny => new york
sf => san francisco
ml => machine learning
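To make the two rule types concrete, the following is a minimal Python sketch of how one line of this format can be read. It is an illustration only, not Elasticsearch's parser: parse_rule is a hypothetical helper, it assumes the default behavior where equivalent synonyms expand to every term in the group, and it ignores escaping and analysis.

```python
def parse_rule(line):
    """Return {input_term: [output_terms]} for one rule line.

    Blank lines and comment lines yield an empty dict.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return {}
    if "=>" in line:
        # Explicit mapping: each left-hand term is replaced by the right-hand terms.
        left, right = line.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [t.strip() for t in right.split(",") if t.strip()]
        return {term: outputs for term in inputs}
    # Equivalent synonyms: every term maps to the whole group, itself included.
    terms = [t.strip() for t in line.split(",") if t.strip()]
    return {term: terms for term in terms}


for rule in ["laptop, notebook", "ny => new york", "# a comment"]:
    print(repr(rule), "->", parse_rule(rule))
```

An explicit mapping yields only the left-to-right direction, while an equivalent group maps every term to the whole group.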

Synonyms API (Elasticsearch 8.10+)

Elasticsearch 8.10 introduced the Synonyms API, which manages synonym sets centrally without file distribution:

# Create a synonym set
PUT /_synonyms/my-synonyms
{
  "synonyms_set": [
    { "id": "1", "synonyms": "laptop, notebook, portable computer" },
    { "id": "2", "synonyms": "ny => new york" },
    { "id": "3", "synonyms": "phone, mobile, cellphone" }
  ]
}

# Reference in analyzer
{
  "filter": {
    "my_synonyms": {
      "type": "synonym_graph",
      "synonyms_set": "my-synonyms",
      "updateable": true
    }
  }
}

Updates to the synonym set via the API are automatically applied to all search analyzers referencing it — no file distribution or reload needed.
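A single rule in a set can also be replaced without resending the whole set. The following Python sketch builds such a request with the standard library and leaves sending it to the caller. The base URL http://localhost:9200, the absence of authentication, and the helper name build_rule_update are assumptions made for illustration; the rule id "3" reuses the set defined above.

```python
import json
import urllib.request


def build_rule_update(base_url, set_id, rule_id, synonyms):
    """Build a PUT request that creates or replaces one rule in a synonym set."""
    body = json.dumps({"synonyms": synonyms}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_synonyms/{set_id}/{rule_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Assumed local, unsecured cluster; adjust the URL and add credentials as needed.
req = build_rule_update(
    "http://localhost:9200",
    "my-synonyms",
    "3",
    "phone, mobile, cellphone, smartphone",
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```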

Reloadable Synonyms

For query-time synonym filters with "updateable": true, you can reload synonyms without restarting Elasticsearch:

# After updating the synonyms file on all nodes:
POST /my-index/_reload_search_analyzers

Requirements:

The synonym filter must have "updateable": true
The filter must be used in a search_analyzer (query-time), not the index analyzer
The synonyms file must be updated on all nodes before reloading

This is the recommended approach for production synonym management when not using the Synonyms API.
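One way to check the last requirement before reloading is to compare a content hash of the synonyms file from each node. The Python sketch below assumes the per-node copies are reachable as local paths (for example after fetching them); how they are collected is deployment-specific, and content_digest and files_match are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def files_match(paths) -> bool:
    """True when at least one path is given and all files have identical bytes."""
    digests = {content_digest(Path(p).read_bytes()) for p in paths}
    return len(digests) == 1


# Two copies with the same rules produce the same digest.
print(content_digest(b"laptop, notebook\n") == content_digest(b"laptop, notebook\n"))
```

If the digests differ, some node still holds an older file, and a reload would leave nodes analyzing queries with different rules.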

Index-Time vs. Query-Time Synonyms

Query-Time (Recommended)

Apply synonyms only during search:

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_synonym_analyzer"
      }
    }
  }
}

Advantages:

Update synonyms without re-indexing
Smaller index size
Easier to manage and test

Index-Time

Apply synonyms during indexing:

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_synonym_analyzer"
      }
    }
  }
}

Advantages:

No query-time expansion overhead
More predictable scoring (IDF is computed on expanded terms)

Disadvantages:

Changing synonyms requires full re-index
Larger index size

Scoring Behavior with Synonyms

The IDF Problem

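Query-time synonym expansion can produce unexpected relevance scoring. When "laptop" expands to "laptop OR notebook OR portable computer", the alternatives can have very different IDF (Inverse Document Frequency) values, and a rare synonym can score disproportionately high. Multi-word synonyms are the main risk, because they become separate phrase clauses scored on their own statistics. Single-word synonyms at the same position are typically combined by Lucene into one synonym query that blends their statistics, which softens the effect.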

Example: If "portable computer" appears in only 5 documents but "laptop" appears in 5,000, documents matching "portable computer" get much higher scores even though the user searched for "laptop".
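The size of this gap can be estimated with the IDF formula used by BM25, Elasticsearch's default similarity. The sketch below is a back-of-the-envelope calculation, not a reproduction of a real score: the index size of 100,000 documents is an assumed figure (the example gives only the two document frequencies), bm25_idf is a name chosen for illustration, and each synonym is treated as a single term with the stated document frequency, which is a simplification for the multi-word case.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """IDF component of BM25: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


DOC_COUNT = 100_000  # assumed index size, not taken from the example

rare = bm25_idf(5, DOC_COUNT)        # "portable computer": 5 documents
common = bm25_idf(5_000, DOC_COUNT)  # "laptop": 5,000 documents

print(f"rare={rare:.2f} common={common:.2f} ratio={rare / common:.1f}")
```

Under these assumptions the rare term's IDF is roughly three times the common term's, before term frequency and length normalization are taken into account.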

Mitigations

Use auto_generate_synonyms_phrase_query: true (default in match queries): This generates phrase queries for multi-word synonyms, improving scoring for phrase-level matches.

Boost the original term: In your application layer, boost the user's original query term relative to expanded synonyms:

POST /products/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "laptop", "boost": 2 } } },
        { "match": { "title": { "query": "notebook portable computer", "boost": 1 } } }
      ]
    }
  }
}

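Use explicit mappings to a canonical term: Map the less common terms to the more common one:
```
notebook => laptop
portable computer => laptop
```
This replaces rather than expands, avoiding the IDF discrepancy. The rules must then run at index time as well as at query time; with query-time rules only, a search for "notebook" is rewritten to "laptop" and no longer matches documents that contain only "notebook".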
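The "boost the original term" mitigation can be generated in application code. The following Python sketch builds such a bool query as a plain dictionary. It is one possible variation rather than the exact request shown earlier: boosted_synonym_query is a hypothetical helper, and it emits one clause per synonym, using match_phrase for multi-word synonyms so that "portable computer" does not match on "computer" alone.

```python
import json


def boosted_synonym_query(field, original, synonyms, original_boost=2.0):
    """Build a bool/should query that favors the user's original term."""
    should = [{"match": {field: {"query": original, "boost": original_boost}}}]
    for synonym in synonyms:
        # Multi-word synonyms are matched as phrases; single words as plain matches.
        clause_type = "match_phrase" if " " in synonym else "match"
        should.append({clause_type: {field: {"query": synonym, "boost": 1.0}}})
    return {"query": {"bool": {"should": should}}}


body = boosted_synonym_query("title", "laptop", ["notebook", "portable computer"])
print(json.dumps(body, indent=2))
```

This pattern assumes the field's search analyzer does not itself expand synonyms; otherwise the boosted clause is expanded as well and the boost no longer separates the original term from its synonyms.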

Testing Synonyms

Analyze API

Always test synonym configurations before deploying:

POST /products/_analyze
{
  "analyzer": "my_synonym_analyzer",
  "text": "I need a new laptop for work"
}

This returns the token stream with synonym expansions, showing exactly what terms will be searched.

Validate with Search

# Create a test document
POST /products/_doc/1
{
  "title": "Best notebook computers for professionals"
}

# Search with synonym expansion
POST /products/_search
{
  "query": { "match": { "title": "laptop" } }
}

If synonyms are configured correctly, this should match the document containing "notebook".

Filter Chain Ordering

The order of filters in your analyzer matters:

{
  "analyzer": {
    "my_analyzer": {
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "my_synonyms",
        "stemmer_english"
      ]
    }
  }
}

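lowercase first: Synonym matching is case-sensitive. Define synonyms in lowercase and apply the lowercase filter before synonyms.
synonyms second: Sees lowercased, unstemmed tokens.
stemmer last: Stems both original and synonym tokens consistently. (stemmer_english is assumed here to be a custom stemmer filter defined elsewhere in the settings.)

Avoid placing synonyms after stemming. Elasticsearch parses synonym rules with the token filters that precede the synonym filter, so the rules themselves would be stemmed, and distinct entries can collapse into the same stem or stop matching the way you intended.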

Frequently Asked Questions

Q: Can I use the same synonyms.txt file across multiple indices?

Yes. Reference the same file path in each index's analyzer settings. After updating the file, reload every index that uses it; _reload_search_analyzers accepts a comma-separated list of indices or a wildcard, so one request can cover them all.

Q: Why do my phrase queries break with synonyms?

If you're using the synonym filter (not synonym_graph) with multi-word synonyms, token positions are incorrect and phrase queries fail. Switch to synonym_graph for query-time analysis.

Q: How do synonyms interact with fuzzy matching?

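The query text is analyzed first, so tokenization and synonym expansion happen before fuzziness (fuzziness: "AUTO" in match queries) is applied to the resulting terms. A misspelling such as "lapto" does not match any synonym rule, so it is not expanded; fuzziness can still match it against "laptop" in the index, but not against the synonyms of "laptop". The Elasticsearch documentation also notes that fuzziness is not applied to terms that have synonyms, so do not rely on the two features combining.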

Q: Can I have directional synonyms (A → B but not B → A)?

Yes. Use explicit mapping syntax:

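laptop => laptop, notebook, portable computer

An explicit mapping replaces the left-hand side with the right-hand side, so "laptop" is repeated on the right to keep it searchable. With this rule, searching for "laptop" also matches "notebook" and "portable computer", but searching for "notebook" only matches "notebook".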

Q: What's the maximum synonym list size?

There's no hard limit, but very large synonym lists (10,000+ rules) increase analysis latency. For massive vocabularies, consider whether embedding-based semantic search would be more effective and maintainable than exhaustive synonym lists.

Q: Do synonyms work with the _all field or copy_to?

Synonyms apply to whatever analyzer is configured on the field. Values copied with copy_to are analyzed with the target field's analyzer, not the source field's. There's no _all field in recent Elasticsearch versions; use copy_to into a field configured with your synonym analyzer instead.