Elasticsearch Synonym Filter: Configuration, Updates, and Scoring

The synonym filter in Elasticsearch transforms query terms or indexed tokens into related terms, enabling search to match on words that mean the same thing. A search for "automobile" can match documents containing "car", "vehicle", or "auto".

Elasticsearch provides two synonym filters: synonym for simple token replacements and synonym_graph for multi-word expressions. This guide covers both, along with reloadable synonyms, scoring behavior, and production patterns.

The Two Synonym Filters

synonym

The synonym filter replaces or expands tokens in a flat token stream. It works correctly for single-word synonyms but produces incorrect token positions for multi-word synonyms (which can break phrase queries).

{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": [
            "laptop, notebook",
            "phone, mobile, cellphone"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_synonyms"]
        }
      }
    }
  }
}

synonym_graph (Recommended)

The synonym_graph filter produces a proper token graph with correct positions for multi-word synonyms. This means phrase queries and proximity queries work correctly even when synonyms map between different word counts.

{
  "filter": {
    "my_synonyms": {
      "type": "synonym_graph",
      "synonyms": [
        "ny, new york",
        "ai, artificial intelligence",
        "usa, united states of america"
      ]
    }
  }
}

Always use synonym_graph for query-time synonyms. The synonym filter exists for backward compatibility and index-time use cases where the graph structure isn't needed.

Configuration Methods

Inline Synonyms

For small, static lists, define synonyms directly in the index settings:

{
  "filter": {
    "my_synonyms": {
      "type": "synonym_graph",
      "synonyms": [
        "quick, fast, speedy",
        "big, large, huge",
        "happy, glad, joyful"
      ]
    }
  }
}

File-Based Synonyms

For larger or frequently updated lists, reference an external file:

{
  "filter": {
    "my_synonyms": {
      "type": "synonym_graph",
      "synonyms_path": "analysis/synonyms.txt",
      "updateable": true
    }
  }
}

The file path is relative to the Elasticsearch config directory. The file must be present on every node in the cluster.

File format (one rule per line, # for comments):

# Equivalent synonyms
laptop, notebook, portable computer
phone, mobile, cellphone, smartphone

# Explicit mappings (directional)
ny => new york
sf => san francisco
ml => machine learning
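To make the two rule types concrete, here is a minimal Python sketch that parses this file format into (inputs, outputs) pairs and looks up what a term is rewritten to. It is a simplified model for illustration, not Elasticsearch's parser: the function names are hypothetical, equivalent rules are assumed to expand every term to the full group, and escaping or analyzer effects are ignored.

```python
def parse_synonym_rules(text):
    """Parse synonym rules into (inputs, outputs) pairs (simplified model)."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: the left side is replaced by the right side.
            left, right = line.split("=>", 1)
            inputs = [t.strip() for t in left.split(",") if t.strip()]
            outputs = [t.strip() for t in right.split(",") if t.strip()]
        else:
            # Equivalent synonyms: every term maps to the whole group.
            terms = [t.strip() for t in line.split(",") if t.strip()]
            inputs, outputs = terms, terms
        rules.append((inputs, outputs))
    return rules


def expansions(term, rules):
    """Return the terms `term` is rewritten to, or [term] if no rule applies."""
    found = []
    for inputs, outputs in rules:
        if term in inputs:
            for out in outputs:
                if out not in found:
                    found.append(out)
    return found or [term]


SAMPLE = """\
# Equivalent synonyms
laptop, notebook, portable computer

# Explicit mappings (directional)
ny => new york
"""

rules = parse_synonym_rules(SAMPLE)
print(expansions("notebook", rules))  # the whole equivalent group
print(expansions("ny", rules))        # replaced by the right-hand side
print(expansions("new york", rules))  # unchanged: the mapping is one-way
```

The last call shows the directional nature of `=>`: "new york" is not an input of any rule, so it is left as-is.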

Synonyms API (Elasticsearch 8.10+)

Elasticsearch 8.10 introduced the Synonyms API, which manages synonym sets centrally without file distribution:

# Create a synonym set
PUT /_synonyms/my-synonyms
{
  "synonyms_set": [
    { "id": "1", "synonyms": "laptop, notebook, portable computer" },
    { "id": "2", "synonyms": "ny => new york" },
    { "id": "3", "synonyms": "phone, mobile, cellphone" }
  ]
}

# Reference in analyzer
{
  "filter": {
    "my_synonyms": {
      "type": "synonym_graph",
      "synonyms_set": "my-synonyms",
      "updateable": true
    }
  }
}

Updates to the synonym set via the API are automatically applied to all search analyzers that reference it, so no file distribution or manual reload is needed.
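If your rules already live in a list or a file, the request body can be generated rather than written by hand. The sketch below builds a body in the same shape as the PUT /_synonyms/my-synonyms example; the helper name and the choice of 1-based string ids are assumptions for illustration, and sending the request is left to whichever HTTP client you use.

```python
import json


def build_synonyms_set(rules):
    """Build a synonyms_set body from rule strings, using 1-based string ids."""
    return {
        "synonyms_set": [
            {"id": str(i), "synonyms": rule}
            for i, rule in enumerate(rules, start=1)
        ]
    }


body = build_synonyms_set([
    "laptop, notebook, portable computer",
    "ny => new york",
    "phone, mobile, cellphone",
])
print(json.dumps(body, indent=2))
```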

Reloadable Synonyms

For query-time synonym filters with "updateable": true, you can reload synonyms without restarting Elasticsearch:

# After updating the synonyms file on all nodes:
POST /my-index/_reload_search_analyzers

Requirements:

  • The synonym filter must have "updateable": true
  • The filter must be used in a search_analyzer (query-time), not the index analyzer
  • The synonyms file must be updated on all nodes before reloading

This is the recommended approach for production synonym management when not using the Synonyms API.
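When several indices share one synonyms file, a deployment script can compute a single reload call for all of them. This is a small sketch under the assumption that the reload endpoint accepts a comma-separated list of index names; the helper name is hypothetical, and it only builds the method and path without contacting a cluster.

```python
from urllib.parse import quote


def reload_request(indices):
    """Return (method, path) for reloading search analyzers on the given indices."""
    if not indices:
        raise ValueError("at least one index name is required")
    # Percent-encode each name, but keep '*' so wildcard patterns survive.
    target = ",".join(quote(name, safe="*") for name in indices)
    return "POST", f"/{target}/_reload_search_analyzers"


method, path = reload_request(["my-index", "products"])
print(method, path)
```

Run this only after the updated file has been copied to every node, as listed in the requirements above.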

Index-Time vs. Query-Time Synonyms

Query-Time (Recommended)

Apply synonyms only during search:

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_synonym_analyzer"
      }
    }
  }
}

Advantages:

  • Update synonyms without re-indexing
  • Smaller index size
  • Easier to manage and test
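The mapping above refers to my_synonym_analyzer without showing its definition. One plausible way to put the pieces together is sketched below in Python: it assembles a complete index-creation body, assuming the analyzer is built from the standard tokenizer, the lowercase filter, and the my_synonyms filter used in earlier examples. The helper name is hypothetical, and the resulting JSON would be sent as the body of an index-creation request.

```python
import json


def build_index_body(synonyms):
    """Assemble settings and mappings for query-time synonyms on `title`."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_synonyms": {
                        "type": "synonym_graph",
                        "synonyms": synonyms,
                    }
                },
                "analyzer": {
                    # Assumed composition; the article does not define it.
                    "my_synonym_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "standard",  # index time: no synonyms
                    "search_analyzer": "my_synonym_analyzer",  # query time
                }
            }
        },
    }


index_body = build_index_body(["laptop, notebook", "phone, mobile, cellphone"])
print(json.dumps(index_body, indent=2))
```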

Index-Time

Apply synonyms during indexing:

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_synonym_analyzer"
      }
    }
  }
}

Advantages:

  • No query-time expansion overhead
  • More predictable scoring (IDF is computed on expanded terms)

Disadvantages:

  • Changing synonyms requires full re-index
  • Larger index size

Scoring Behavior with Synonyms

The IDF Problem

Query-time synonym expansion can produce unexpected relevance scoring. When "laptop" expands to "laptop OR notebook OR portable computer", each term has a different IDF (Inverse Document Frequency). Rare synonym terms score disproportionately high.

Example: If "portable computer" appears in only 5 documents but "laptop" appears in 5,000, documents matching "portable computer" get much higher scores even though the user searched for "laptop".
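To get a feel for the size of the gap, the following sketch computes the IDF component of BM25 (Elasticsearch's default similarity) for both document frequencies. The total of 100,000 documents is an assumption chosen for illustration, and treating the phrase "portable computer" as a single term with one document frequency is a simplification.

```python
import math


def bm25_idf(doc_freq, total_docs):
    """IDF term of BM25: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (total_docs - doc_freq + 0.5) / (doc_freq + 0.5))


TOTAL_DOCS = 100_000  # assumed index size, not from the article

rare = bm25_idf(5, TOTAL_DOCS)        # "portable computer": 5 documents
common = bm25_idf(5_000, TOTAL_DOCS)  # "laptop": 5,000 documents

print(f"idf(rare)   = {rare:.2f}")
print(f"idf(common) = {common:.2f}")
print(f"ratio       = {rare / common:.1f}x")
```

Under these assumptions the rare term's IDF comes out at roughly 9.8 against roughly 3.0 for the common one, so, all else being equal, a match on the rare synonym contributes about three times as much to the score.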

Mitigations

  1. Use auto_generate_synonyms_phrase_query: true (default in match queries): This generates phrase queries for multi-word synonyms, improving scoring for phrase-level matches.

  2. Boost the original term: In your application layer, boost the user's original query term relative to expanded synonyms:

    POST /products/_search
    {
      "query": {
        "bool": {
          "should": [
            { "match": { "title": { "query": "laptop", "boost": 2 } } },
            { "match": { "title": { "query": "notebook portable computer", "boost": 1 } } }
          ]
        }
      }
    }
    
  3. Use explicit mappings: Map the less common terms to the more common one:

    notebook => laptop
    portable computer => laptop
    

    This replaces rather than expands, avoiding the IDF discrepancy.
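The boosting approach from mitigation 2 can be generated in the application layer. The sketch below builds a bool query of the same shape as the example above; the function name and parameters are hypothetical, and it assumes the application already knows which synonyms to add for the user's term.

```python
def boosted_synonym_query(field, original, synonyms, original_boost=2.0):
    """Build a bool/should query that favors the user's original term."""
    should = [
        {"match": {field: {"query": original, "boost": original_boost}}}
    ]
    if synonyms:
        # All synonyms go into one lower-boosted match clause.
        should.append(
            {"match": {field: {"query": " ".join(synonyms), "boost": 1.0}}}
        )
    return {"query": {"bool": {"should": should}}}


query = boosted_synonym_query(
    "title", "laptop", ["notebook", "portable computer"]
)
print(query)
```

Because the application performs the expansion here, this pattern fits best on a field whose search analyzer does not also expand synonyms; otherwise the original term would be expanded a second time inside the boosted clause.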

Testing Synonyms

Analyze API

Always test synonym configurations before deploying:

POST /products/_analyze
{
  "analyzer": "my_synonym_analyzer",
  "text": "I need a new laptop for work"
}

This returns the token stream with synonym expansions, showing exactly what terms will be searched.
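The raw response is easier to read when tokens are grouped by position, since synonyms share a position with the token they expand. Here is a small sketch that does that grouping. The sample response is hand-written for illustration (it is not captured output) and assumes a rule "laptop, notebook" applied to the text "new laptop"; the helper name is hypothetical.

```python
from collections import defaultdict


def tokens_by_position(analyze_response):
    """Group (token, type) pairs from an _analyze response by position."""
    grouped = defaultdict(list)
    for tok in analyze_response["tokens"]:
        grouped[tok["position"]].append((tok["token"], tok["type"]))
    return dict(sorted(grouped.items()))


# Hand-written sample in the shape of an _analyze response.
SAMPLE_RESPONSE = {
    "tokens": [
        {"token": "new", "start_offset": 0, "end_offset": 3,
         "type": "<ALPHANUM>", "position": 0},
        {"token": "laptop", "start_offset": 4, "end_offset": 10,
         "type": "<ALPHANUM>", "position": 1},
        {"token": "notebook", "start_offset": 4, "end_offset": 10,
         "type": "SYNONYM", "position": 1},
    ]
}

for position, tokens in tokens_by_position(SAMPLE_RESPONSE).items():
    print(position, tokens)
```

A position that holds more than one token is where a synonym rule fired; a position with a single token was left untouched.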

Validate with Search

# Create a test document
POST /products/_doc/1
{
  "title": "Best notebook computers for professionals"
}

# Search with synonym expansion
POST /products/_search
{
  "query": { "match": { "title": "laptop" } }
}

If synonyms are configured correctly, this should match the document containing "notebook".

Filter Chain Ordering

The order of filters in your analyzer matters:

{
  "analyzer": {
    "my_analyzer": {
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "my_synonyms",
        "stemmer_english"
      ]
    }
  }
}

  1. lowercase first: Synonym matching is case-sensitive internally. Define synonyms in lowercase and apply the lowercase filter before synonyms.
  2. synonyms second: Sees lowercased, unstemmed tokens.
  3. stemmer last: Stems both original and synonym tokens consistently.

Never place synonyms after stemming: the stemmer changes token forms, and synonym rules written against the unstemmed forms may no longer match.

Frequently Asked Questions

Q: Can I use the same synonyms.txt file across multiple indices?

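Yes. Reference the same file path in each index's analyzer settings. After updating the file, remember to reload the search analyzers of every index that uses it with _reload_search_analyzers; the endpoint accepts a comma-separated list of indices or a wildcard, so one request can cover them all.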

Q: Why do my phrase queries break with synonyms?

If you're using the synonym filter (not synonym_graph) with multi-word synonyms, token positions are incorrect and phrase queries fail. Switch to synonym_graph for query-time analysis.

Q: How do synonyms interact with fuzzy matching?

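Fuzzy matching (fuzziness: "AUTO" in match queries) is applied after analysis, not before it. The query text is tokenized and run through the synonym filter first, and fuzziness is then applied to the resulting terms. A misspelled "lapto" does not match the synonym rule for "laptop", so no synonyms are added; it can still match documents containing "laptop" through fuzzy matching, but not documents that contain only "notebook". Fuzziness is also not applied to terms that do have synonyms, because analysis then produces multiple tokens at the same position. The order is: tokenization → synonym expansion → fuzziness.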

Q: Can I have directional synonyms (A → B but not B → A)?

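Yes. Use explicit mapping syntax:

laptop => laptop, notebook, portable computer

This means searching for "laptop" also matches "notebook" and "portable computer", but searching for "notebook" only matches "notebook". An explicit mapping replaces the left-hand term with the right-hand side, so "laptop" is repeated on the right to keep matching documents that contain "laptop" itself.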

Q: What's the maximum synonym list size?

There's no hard limit, but very large synonym lists (10,000+ rules) increase analysis latency. For massive vocabularies, consider whether embedding-based semantic search would be more effective and maintainable than exhaustive synonym lists.

Q: Do synonyms work with the _all field or copy_to?

Synonyms apply to whatever analyzer is configured on the field. Values copied with copy_to are analyzed with the target field's analyzer, not the source field's. There's no _all field in recent Elasticsearch versions; use copy_to with a target field configured with your synonym analyzer.
