OpenSearch Create Index: Settings, Mappings, and Best Practices

Creating an index in OpenSearch involves more than just an API call — the settings and mappings you choose at creation time determine query performance, storage efficiency, and operational flexibility for the lifetime of the index.

This guide covers the Create Index API, essential settings, mapping strategies, and the practices that prevent costly re-indexing later.

Basic Index Creation

The simplest way to create an index:

PUT /my-index

This creates an index with default settings (1 primary shard, 1 replica) and dynamic mapping enabled. It works, but you'll almost always want to be more explicit.

Index Creation with Settings and Mappings

A production-ready index creation request:

PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s",
    "index.codec": "best_compression"
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "description": {
        "type": "text"
      },
      "price": {
        "type": "float"
      },
      "category": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      },
      "in_stock": {
        "type": "boolean"
      },
      "tags": {
        "type": "keyword"
      }
    }
  }
}

Essential Settings

Shard Configuration

{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

number_of_shards: The number of primary shards. This cannot be changed after creation (without re-indexing). Choose based on expected data size:

  • Target 10–50 GB per shard for optimal performance
  • Each shard consumes heap memory (~10 MB overhead), so don't over-shard small indices
  • Formula: number_of_shards = ceil(expected_total_size_GB / 30)

number_of_replicas: The number of copies of each primary shard. Can be changed at any time. Use 1 for most production workloads; 0 for ephemeral or easily-rebuilt indices.
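
The sizing formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name recommended_shards and the floor of one shard are assumptions added here, and the 30 GB divisor is simply the value from the formula.

```python
import math

TARGET_SHARD_SIZE_GB = 30  # divisor from the formula; sits inside the 10-50 GB target range


def recommended_shards(expected_total_size_gb: float) -> int:
    """Return ceil(size / 30), never less than one primary shard."""
    if expected_total_size_gb < 0:
        raise ValueError("expected size must be non-negative")
    return max(1, math.ceil(expected_total_size_gb / TARGET_SHARD_SIZE_GB))


# Print the suggestion for a few hypothetical index sizes.
for size_gb in (0.1, 25, 100, 900):
    print(f"{size_gb} GB -> {recommended_shards(size_gb)} primary shard(s)")
```

Treat the result as a starting point and compare it against real shard sizes once data arrives.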

Refresh Interval

{
  "settings": {
    "refresh_interval": "30s"
  }
}

Controls how often newly indexed documents become searchable. The default is 1s, which provides near-real-time search but adds overhead. For write-heavy workloads (logs, metrics), increase to 30s or higher. Set to -1 to disable automatic refresh during bulk indexing, then re-enable after.
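
The disable-then-restore pattern for bulk loads might look like the sketch below. It only builds the sequence of requests as (method, path, body) tuples so that it runs without a cluster. The helper name bulk_load_plan, the example index name, and the 30s restore value are assumptions for illustration; the bulk payload itself is left out.

```python
import json


def bulk_load_plan(index: str, restore_interval: str = "30s"):
    """Return the (method, path, body) steps for a bulk load with refresh paused."""
    settings_path = f"/{index}/_settings"
    return [
        # 1. Pause automatic refresh before the heavy write phase.
        ("PUT", settings_path, {"index": {"refresh_interval": "-1"}}),
        # 2. Placeholder for the bulk requests themselves (payload omitted here).
        ("POST", f"/{index}/_bulk", None),
        # 3. Restore the interval so new documents become searchable again.
        ("PUT", settings_path, {"index": {"refresh_interval": restore_interval}}),
    ]


for method, path, body in bulk_load_plan("logs-2025.01.01"):
    print(method, path, json.dumps(body) if body is not None else "<bulk payload>")
```

Running the final step even when the bulk phase fails (for example in a finally block) keeps the index from being left with refresh disabled.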

Compression

{
  "settings": {
    "index.codec": "best_compression"
  }
}
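
  • default: LZ4 compression. Fast, with a moderate compression ratio.
  • best_compression: zlib (DEFLATE). Higher compression ratio, slightly slower reads and writes. Best for indices where storage cost matters more than latency. Zstandard is available separately through the zstd and zstd_no_dict codecs in OpenSearch 2.9 and later.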

Mapping Strategies

Explicit Mappings (Recommended)

Define field types explicitly for production indices:

{
  "mappings": {
    "properties": {
      "user_id": { "type": "keyword" },
      "email": { "type": "keyword" },
      "full_name": { "type": "text" },
      "age": { "type": "integer" },
      "signup_date": { "type": "date" },
      "location": { "type": "geo_point" },
      "profile": { "type": "object", "enabled": false }
    }
  }
}

Explicit mappings prevent:

  • Incorrect type inference (a numeric string mapped as text instead of keyword)
  • Unnecessary field indexing (set "enabled": false on object fields you never search, such as profile above)
  • Mapping explosions from unconstrained dynamic fields

Multi-Field Mappings

Use multi-fields when you need both full-text search and exact matching on the same field:

{
  "name": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}

Query name for full-text search, and name.keyword for aggregations, sorting, and exact matches.
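
As a rough illustration of which field each kind of request targets, the sketch below builds four request bodies as Python dictionaries. The function names and the aggregation name by_name are made up for this example; the match, term, terms, and sort constructs are standard query DSL.

```python
import json


def full_text_query(text: str) -> dict:
    """Analyzed search against the text field."""
    return {"query": {"match": {"name": text}}}


def exact_match_query(value: str) -> dict:
    """Exact comparison against the un-analyzed keyword sub-field."""
    return {"query": {"term": {"name.keyword": value}}}


def names_aggregation(size: int = 10) -> dict:
    """Bucket documents by exact name; the top-level "size": 0 omits the hit list."""
    return {
        "size": 0,
        "aggs": {"by_name": {"terms": {"field": "name.keyword", "size": size}}},
    }


def sorted_by_name_query() -> dict:
    """Sort on the keyword sub-field, since text fields are not sortable by default."""
    return {"query": {"match_all": {}}, "sort": [{"name.keyword": "asc"}]}


for body in (
    full_text_query("wireless headphones"),
    exact_match_query("Wireless Headphones X200"),
    names_aggregation(),
    sorted_by_name_query(),
):
    print(json.dumps(body))
```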

Dynamic Mapping Controls

If you can't define all fields upfront, control dynamic mapping behavior:

{
  "mappings": {
    "dynamic": "strict",
    "properties": { ... }
  }
}

  • true (default): New fields are automatically mapped. Convenient but risky.
  • strict: Reject documents with unknown fields. Safest for production.
  • false: Accept documents with unknown fields but don't index them. They're stored in _source but not searchable.
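
To make the three modes concrete, here is a small local simulation. It does not call OpenSearch; apply_dynamic_setting is a hypothetical helper that only models top-level fields and returns whether a document would be accepted and which of its fields would end up searchable.

```python
def apply_dynamic_setting(doc: dict, mapped_fields, dynamic: str):
    """Return (accepted, searchable_fields) for a document under a dynamic mode."""
    if dynamic not in ("true", "false", "strict"):
        raise ValueError(f"unknown dynamic setting: {dynamic!r}")
    mapped = set(mapped_fields)
    unknown = set(doc) - mapped
    if dynamic == "strict" and unknown:
        # strict: the whole document is rejected.
        return False, set()
    if dynamic == "true":
        # true: unknown fields are mapped on the fly and become searchable.
        return True, set(doc)
    # false (or strict with no unknown fields): only mapped fields are searchable.
    return True, set(doc) & mapped


doc = {"name": "Widget", "color": "red"}  # "color" is not in the mapping
for mode in ("true", "strict", "false"):
    accepted, searchable = apply_dynamic_setting(doc, {"name", "price"}, mode)
    print(mode, "accepted" if accepted else "rejected", sorted(searchable))
```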

Dynamic Templates

For semi-structured data, use dynamic templates to control how unmapped fields are indexed:

{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword"
          }
        }
      },
      {
        "longs_as_integers": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "integer"
          }
        }
      }
    ]
  }
}
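
The matching logic can be approximated locally. The sketch below mirrors the two templates above and picks the first one whose match_mapping_type equals the detected JSON type of a value. It is a simplification: the helper names are hypothetical, and date detection on strings as well as name-based matching (match, path_match) are ignored.

```python
TEMPLATES = [
    {"strings_as_keywords": {"match_mapping_type": "string", "mapping": {"type": "keyword"}}},
    {"longs_as_integers": {"match_mapping_type": "long", "mapping": {"type": "integer"}}},
]


def detected_type(value):
    """Approximate the JSON type a new field would be detected as."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    return None


def resolve_mapping(value, templates=TEMPLATES):
    """Return (template_name, mapping) for the first matching template, else (None, None)."""
    json_type = detected_type(value)
    for template in templates:
        ((name, spec),) = template.items()
        if spec.get("match_mapping_type") == json_type:
            return name, spec["mapping"]
    return None, None


for sample in ("checkout-service", 42, 3.14, True):
    print(repr(sample), "->", resolve_mapping(sample))
```

One caveat with the second template: integer is a signed 32-bit type, so it suits fields whose values are known to stay within that range.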

Index Templates

For indices created on a recurring basis (daily log indices, time-series data), use index templates:

PUT /_index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "priority": 100,
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "refresh_interval": "30s"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "level": { "type": "keyword" },
        "service": { "type": "keyword" },
        "message": { "type": "text" },
        "trace_id": { "type": "keyword" }
      }
    }
  }
}

Any index matching the pattern logs-* will automatically inherit these settings and mappings.
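
A quick way to reason about which template a new index would pick up is to model the pattern match and the priority rule locally. In this sketch the catch-all template is hypothetical, matching_template is an illustrative helper, and Python's fnmatchcase stands in for OpenSearch's wildcard matching (a reasonable approximation for simple * patterns). Among composable templates that match, the one with the highest priority applies.

```python
from fnmatch import fnmatchcase

TEMPLATES = {
    "logs-template": {"index_patterns": ["logs-*"], "priority": 100},
    "catch-all": {"index_patterns": ["*"], "priority": 0},  # hypothetical second template
}


def matching_template(index_name: str, templates=TEMPLATES):
    """Return the name of the highest-priority template matching index_name, or None."""
    candidates = [
        (spec["priority"], name)
        for name, spec in templates.items()
        if any(fnmatchcase(index_name, pattern) for pattern in spec["index_patterns"])
    ]
    return max(candidates)[1] if candidates else None


for index_name in ("logs-2025.01.01", "metrics-2025.01.01"):
    print(index_name, "->", matching_template(index_name))
```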

Component Templates

Break reusable settings and mapping fragments into composable components, then reference them from an index template with composed_of:

PUT /_component_template/base-settings
{
  "template": {
    "settings": {
      "number_of_replicas": 1,
      "refresh_interval": "10s"
    }
  }
}

PUT /_index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "composed_of": ["base-settings"],
  "template": {
    "mappings": { ... }
  }
}
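
The way these pieces combine can be sketched as a deep merge: component templates are applied in the order listed in composed_of, and the index template's own template block is applied last, so it wins on conflicts. The code below is a local model of that behaviour, not the server's implementation; the refresh_interval override and the @timestamp mapping are hypothetical values added to make the merge visible.

```python
import copy
import json


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into a copy of base; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_template(index_template: dict, component_templates: dict) -> dict:
    """Apply components in composed_of order, then the index template's own block."""
    resolved = {}
    for name in index_template.get("composed_of", []):
        resolved = deep_merge(resolved, component_templates[name]["template"])
    return deep_merge(resolved, index_template.get("template", {}))


COMPONENT_TEMPLATES = {
    "base-settings": {
        "template": {"settings": {"number_of_replicas": 1, "refresh_interval": "10s"}}
    }
}

LOGS_TEMPLATE = {
    "index_patterns": ["logs-*"],
    "composed_of": ["base-settings"],
    "template": {
        "settings": {"refresh_interval": "30s"},  # hypothetical override
        "mappings": {"properties": {"@timestamp": {"type": "date"}}},  # hypothetical
    },
}

print(json.dumps(resolve_template(LOGS_TEMPLATE, COMPONENT_TEMPLATES), indent=2))
```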

Index State Management (ISM)

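OpenSearch's ISM automates index lifecycle operations such as rollover, replica changes, and deletion. The policy below rolls an index over once it reaches 30 GB or is one day old, moves it to a warm state with no replicas after 7 days, and deletes it after 30 days. The rollover action also requires a rollover alias to be configured on the managed index: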

PUT /_plugins/_ism/policies/log-rotation
{
  "policy": {
    "policy_id": "log-rotation",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_size": "30gb",
              "min_index_age": "1d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": { "min_index_age": "7d" }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "replica_count": { "number_of_replicas": 0 }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": { "min_index_age": "30d" }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [{ "delete": {} }]
      }
    ],
    "ism_template": [
      {
        "index_patterns": ["logs-*"],
        "priority": 100
      }
    ]
  }
}
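
To sanity-check the age thresholds in a policy like this one, a tiny local model can help. The sketch below only follows the min_index_age transitions (hot to warm at 7 days, warm to delete at 30 days). It ignores the rollover action and the fact that ISM evaluates policies periodically rather than instantly; TRANSITIONS and state_for_age are illustrative names, not part of ISM.

```python
# (from_state, to_state, min_index_age in days), taken from the policy above
TRANSITIONS = [("hot", "warm", 7), ("warm", "delete", 30)]


def state_for_age(age_days: float, default_state: str = "hot") -> str:
    """Return the state an index of the given age would eventually settle in."""
    state = default_state
    for source, target, min_age_days in TRANSITIONS:
        if state == source and age_days >= min_age_days:
            state = target
    return state


for age in (0, 6, 7, 29, 30):
    print(f"{age:>2} days -> {state_for_age(age)}")
```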

Common Mistakes

  1. Not setting shard count explicitly: The default (1 shard) doesn't scale for large indices. You can't change shard count without re-indexing.

  2. Too many shards for small indices: Each shard has fixed overhead. An index with 100 MB of data doesn't need 10 shards — 1 is enough.

  3. Relying entirely on dynamic mapping: Dynamic mapping often guesses wrong (mapping numeric IDs as long when keyword is correct) and can cause mapping explosions when documents carry arbitrary or user-generated field names.

  4. Forgetting keyword sub-fields on text fields: You'll need keyword for aggregations, sorting, and exact-match filtering. Add it upfront rather than re-indexing later.

  5. Setting replicas to 0 in production: You lose data if a node fails. Always use at least 1 replica for data you can't easily rebuild.

  6. Ignoring ignore_above on keyword fields: Very long string values in keyword fields waste storage and memory. Set ignore_above: 256 (or appropriate limit) to skip indexing oversized values.
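
Mistakes 4 and 6 are easy to check for mechanically before an index is created. The sketch below is a hypothetical lint helper (audit_mapping is not an OpenSearch API) that walks the top-level properties of a mapping and flags text fields without a keyword sub-field and keyword mappings without ignore_above. Treat its output as hints: not every text field needs a keyword sub-field.

```python
def audit_mapping(properties: dict) -> list:
    """Return a list of warnings for top-level fields in a mapping's properties."""
    warnings = []
    for field, spec in properties.items():
        field_type = spec.get("type")
        sub_fields = spec.get("fields", {})
        keyword_subs = [s for s in sub_fields.values() if s.get("type") == "keyword"]
        if field_type == "text" and not keyword_subs:
            warnings.append(f"{field}: text field has no keyword sub-field")
        # Collect every keyword mapping attached to this field, including sub-fields.
        keyword_specs = ([spec] if field_type == "keyword" else []) + keyword_subs
        if any("ignore_above" not in s for s in keyword_specs):
            warnings.append(f"{field}: keyword mapping has no ignore_above")
    return warnings


example_properties = {
    "name": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
    "description": {"type": "text"},
    "category": {"type": "keyword"},
    "price": {"type": "float"},
}

for warning in audit_mapping(example_properties):
    print(warning)
```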

Frequently Asked Questions

Q: Can I change mappings after creating an index?

You can add new fields but cannot change existing field types. To change a field type, create a new index with the correct mapping and use the Reindex API to migrate data.
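
As a sketch of that migration, the helper below returns the two requests involved: creating the new index with the corrected mapping, then copying documents with the Reindex API. The index names products and products-v2, the price type change, and the reindex_plan helper are assumptions for illustration; nothing is sent to a cluster.

```python
import json


def reindex_plan(source_index: str, dest_index: str, new_mappings: dict):
    """Return the (method, path, body) steps to move data into a corrected mapping."""
    return [
        # 1. Create the destination index with the corrected field types.
        ("PUT", f"/{dest_index}", {"mappings": new_mappings}),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": source_index}, "dest": {"index": dest_index}}),
    ]


corrected = {"properties": {"price": {"type": "double"}}}  # hypothetical type change
for method, path, body in reindex_plan("products", "products-v2", corrected):
    print(method, path, json.dumps(body))
```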

Q: How do I know the right number of shards?

Estimate total index size, then divide by 30 GB. For a 100 GB index, 3–4 primary shards is reasonable. Monitor shard sizes with _cat/shards and adjust for future indices.

Q: Should I use aliases with my indices?

Yes. Always point applications at aliases rather than direct index names. This lets you re-index, roll over, or swap indices without changing application code.
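
A swap of this kind can be expressed as a single _aliases request with a remove and an add action, which the cluster applies atomically. The sketch below only builds that request; the alias and index names are hypothetical, and alias_swap_request is an illustrative helper.

```python
import json


def alias_swap_request(alias: str, old_index: str, new_index: str):
    """Return (method, path, body) that repoints alias from old_index to new_index."""
    body = {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
    return "POST", "/_aliases", body


method, path, body = alias_swap_request("products", "products-v1", "products-v2")
print(method, path, json.dumps(body))
```

Because both actions travel in one request, readers of the alias never see a moment where it points at no index.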

Q: What's the difference between index templates and legacy templates?

Composable index templates (_index_template) are the successor to legacy templates (_template) and are available in both OpenSearch 1.x and 2.x. Use composable templates: they support component templates and explicit priority ordering, while legacy templates are deprecated.

Q: How do I create an index for vector/semantic search?

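Enable index.knn in the settings and add a knn_vector field to your mapping. Note that the nmslib engine shown here is deprecated in recent OpenSearch releases, where faiss and lucene are the supported alternatives: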

{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib"
        }
      }
    }
  },
  "settings": {
    "index.knn": true
  }
}
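
Once such an index exists, it can be searched with a knn query. The sketch below builds the request body for the embedding field defined above and checks the vector length against the mapped dimension first. The helper name knn_query and the zero vector are placeholders; a real query vector would come from the same embedding model used at index time.

```python
import json

EMBEDDING_DIMENSION = 768  # must match "dimension" in the mapping


def knn_query(vector, k: int = 10) -> dict:
    """Return a search body that asks for the k nearest neighbours of vector."""
    if len(vector) != EMBEDDING_DIMENSION:
        raise ValueError(
            f"expected {EMBEDDING_DIMENSION} dimensions, got {len(vector)}"
        )
    return {
        "size": k,
        "query": {"knn": {"embedding": {"vector": list(vector), "k": k}}},
    }


placeholder_vector = [0.0] * EMBEDDING_DIMENSION  # stand-in for a real embedding
body = knn_query(placeholder_vector, k=5)
print(json.dumps(body)[:80], "...")
```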