To create an index in Elasticsearch, send a PUT /<index_name> request with optional settings, mappings, and aliases in the body. The same API works on Elasticsearch 7.x, 8.x, and 9.x, and on OpenSearch with minor differences. The decisions you make at creation time, especially shard count and field mappings, lock in performance characteristics for the life of the index, so it pays to be deliberate up front rather than fixing things with a costly reindex later.
This guide walks through the create index API end to end: the bare minimum request, the settings that actually matter in production, mapping strategies, templates for indices you create repeatedly, and the errors you are most likely to hit.
The Simplest Create Index Request
PUT /my-index
That single line creates an index with the defaults: one primary shard, one replica, and dynamic mapping turned on. It works, and it is fine for a quick test or a tiny lookup table. For anything that will hold real traffic, you want to be explicit about settings and mappings.
A Production-Ready Create Index Request
PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s",
    "index.codec": "best_compression"
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "category": { "type": "keyword" },
      "created_at": { "type": "date" },
      "in_stock": { "type": "boolean" },
      "tags": { "type": "keyword" }
    }
  }
}
The response includes acknowledged: true once the index has been added to the cluster state. It includes shards_acknowledged: true once the required number of shard copies has started before the request timed out. That number is set by wait_for_active_shards, which defaults to 1, meaning just the primary. If you see acknowledged: true but shards_acknowledged: false, the index exists but those shard copies did not start in time. That is usually a transient state, but worth checking on a cluster under pressure.
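To make the two flags concrete, here is a small Python sketch of how application code might classify a create-index response. It assumes the response body has already been parsed into a dict. The function name and the returned labels are illustrative and are not part of any client library.

```python
def describe_create_response(resp: dict) -> str:
    """Classify a parsed create-index response body (a plain dict)."""
    if not resp.get("acknowledged", False):
        # The cluster state update was not acknowledged before the timeout;
        # the index may still appear later, so check before retrying.
        return "not-acknowledged"
    if not resp.get("shards_acknowledged", False):
        # The index exists, but the shard copies required by
        # wait_for_active_shards did not start before the timeout.
        return "created-shards-pending"
    return "ready"
```

For example, describe_create_response({"acknowledged": True, "shards_acknowledged": False, "index": "products"}) returns "created-shards-pending".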
Settings That Matter in Production
Shards and Replicas
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
number_of_shards is the setting you cannot change later without reindexing into a new index (or using the shrink or split APIs, both of which have constraints). Choose it based on how large the index will grow:
- Aim for 10 to 50 GB of primary shard data per shard. Smaller shards waste heap on per-shard overhead; larger ones recover slowly and rebalance poorly.
- Every shard consumes heap and file descriptors regardless of how much data it holds, so do not over-shard a small index.
- A workable rule of thumb: ceil(expected_primary_size_GB / 30), which targets about 30 GB per primary shard, comfortably inside that range.
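The rule of thumb translates directly into code. This is a minimal sketch. The 30 GB target is the divisor from the formula above, exposed as a parameter so you can move it within the 10 to 50 GB range. The function name is hypothetical.

```python
import math

# Divisor from the rule of thumb; adjust within the 10-50 GB guidance.
TARGET_SHARD_GB = 30


def suggested_primary_shards(expected_primary_size_gb: float,
                             target_shard_gb: float = TARGET_SHARD_GB) -> int:
    """Rule-of-thumb primary shard count: ceil(size / target), never below 1."""
    if expected_primary_size_gb < 0:
        raise ValueError("expected size must be non-negative")
    # An empty or tiny index still needs one primary shard.
    return max(1, math.ceil(expected_primary_size_gb / target_shard_gb))
```

For example, an index expected to hold 200 GB of primary data comes out at 7 primaries, roughly 29 GB each.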
number_of_replicas can be changed at any time with the update index settings API. Use at least 1 in production so a single node failure does not cost you data. If you find that the cluster.max_shards_per_node limit is rejecting new indices, that is a sign you are over-sharding somewhere, not a reason to raise the limit.
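Before you create an index, it can help to know how much of the shard budget it will consume. The sketch below assumes the default cluster.max_shards_per_node of 1000. That limit applies per data node and counts both primaries and replicas of open indices. Frozen-tier nodes have a separate limit that this sketch ignores. The function names are hypothetical.

```python
def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies an index adds: each primary plus its replicas."""
    return primaries * (1 + replicas)


def fits_shard_budget(current_open_shards: int, primaries: int, replicas: int,
                      data_nodes: int, max_shards_per_node: int = 1000) -> bool:
    """Approximate the cluster.max_shards_per_node check for a new index.

    Assumes every data node counts toward the limit (no frozen tier) and
    that max_shards_per_node is the configured per-node limit.
    """
    budget = max_shards_per_node * data_nodes
    # Creation is rejected only if the new total would exceed the budget.
    return current_open_shards + shard_copies(primaries, replicas) <= budget
```

With 3 data nodes and 2,995 open shards, a 3-primary, 1-replica index (6 copies) would not fit.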
Refresh Interval
{
  "settings": {
    "refresh_interval": "30s"
  }
}
Newly indexed documents become searchable when a refresh happens. The default is 1s, which gives near-real-time search at the cost of producing many small segments. For write-heavy workloads (logs, metrics, events), set it to 30s or higher; for bulk loads you can set it to -1 to disable refreshes entirely and re-enable it once the load is done. See the dedicated refresh interval guide for the trade-offs.
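The bulk-load pattern described above (disable refreshes, load, then restore) is easy to get wrong if the load fails halfway. Here is a minimal Python sketch. It assumes a client object whose indices.put_settings(index=..., settings=...) call matches elasticsearch-py 8.x. refresh_disabled is a hypothetical helper name. Passing restore_to=None sends null, which resets the setting to its default.

```python
from contextlib import contextmanager


@contextmanager
def refresh_disabled(client, index: str, restore_to=None):
    """Disable refreshes on `index` for a bulk load, then restore them.

    `client` is assumed to expose indices.put_settings(index=..., settings=...)
    like elasticsearch-py 8.x. restore_to=None is sent as JSON null, which
    resets refresh_interval to its default.
    """
    client.indices.put_settings(
        index=index, settings={"index": {"refresh_interval": "-1"}}
    )
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so the index does not stay
        # unsearchable with refreshes turned off.
        client.indices.put_settings(
            index=index, settings={"index": {"refresh_interval": restore_to}}
        )
```

Wrap your bulk indexing in with refresh_disabled(es, "products", restore_to="30s"):. If readers need the data immediately, consider an explicit es.indices.refresh(index="products") afterwards.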
Compression Codec
{
  "settings": {
    "index.codec": "best_compression"
  }
}
The default codec uses LZ4. best_compression uses ZSTD (DEFLATE in older versions), which produces 15 to 30% smaller stored fields at a small cost to retrieval latency. Worth turning on for log-style indices where storage cost matters more than read latency, and harmless for most search workloads.
Other Settings Worth Knowing
- index.lifecycle.name: attach an ILM policy at creation so rollover, shrink, and delete phases run automatically.
- index.routing.allocation.require.*: pin the index to nodes carrying a given attribute, such as a custom hot/warm/cold node attribute. On clusters using the built-in data tiers (7.10+), index.routing.allocation.include._tier_preference does this job instead.
- index.mapping.total_fields.limit: raise this if you are intentionally creating a wide index. Hitting the default of 1000 is usually a sign of accidental field explosion.
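Before deciding whether to raise index.mapping.total_fields.limit, it helps to know how close a mapping already is. This rough Python estimate counts every field, every object field, and every multi-field in a properties mapping. It is an approximation that ignores runtime fields and metadata fields. The function name is hypothetical.

```python
def count_mapped_fields(properties: dict) -> int:
    """Roughly estimate what counts toward index.mapping.total_fields.limit.

    Counts each field or object, its multi-fields (e.g. name.keyword), and
    the children of object/nested fields. Runtime and metadata fields are
    not counted, so treat the result as an approximation.
    """
    total = 0
    for field_def in properties.values():
        total += 1  # the field (or object) itself
        # Multi-fields such as name.keyword count as separate fields.
        total += count_mapped_fields(field_def.get("fields", {}))
        # Object and nested fields contribute their children too.
        total += count_mapped_fields(field_def.get("properties", {}))
    return total
```

Run against the products mapping from the production example, it returns 8: seven top-level fields plus name.keyword.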
Mapping Strategies
Explicit Mappings
For anything you control, define mappings explicitly. It is the single biggest lever for query performance and storage cost.
{
  "mappings": {
    "properties": {
      "user_id": { "type": "keyword" },
      "email": { "type": "keyword" },
      "full_name": { "type": "text" },
      "age": { "type": "integer" },
      "signup_date": { "type": "date" },
      "location": { "type": "geo_point" }
    }
  }
}
Explicit mappings prevent the three most common production headaches:
- Numeric strings inferred as text when they should be keyword (and then sorting and aggregations silently break).
- Dynamic field explosion from unconstrained nested objects.
- The painful realization that you need to change a field type, which requires a full reindex.
Multi-Field Mappings
Use multi-fields when you want both full-text search and exact matching on the same field:
{
  "name": {
    "type": "text",
    "fields": {
      "keyword": { "type": "keyword", "ignore_above": 256 }
    }
  }
}
Query name for full-text search, and name.keyword for aggregations, sorting, term queries, and exact matches. With ignore_above: 256, values longer than 256 characters are not indexed into name.keyword (they remain in _source). That keeps pathologically long values from blowing up your inverted index.
Dynamic Mapping Modes
{
  "mappings": {
    "dynamic": "strict",
    "properties": { /* ... */ }
  }
}
The dynamic parameter takes four values:
- true (default): new fields are auto-mapped. Convenient, occasionally dangerous.
- runtime: new fields are added as runtime fields, which are not indexed and are evaluated from _source at query time. Good middle ground for schema-on-read use cases.
- false: new fields are stored in _source but not indexed or searchable.
- strict: documents containing unknown fields are rejected outright. Safest for production data with a known schema.
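Before switching an existing pipeline to dynamic: strict, you may want to know which incoming fields would be rejected. Here is a rough Python pre-flight check, assuming you have the index's properties mapping as a dict. It does not look inside arrays or expand dotted keys such as "a.b". unknown_fields is a hypothetical name.

```python
def unknown_fields(doc: dict, properties: dict, prefix: str = "") -> list[str]:
    """List dotted paths in `doc` that have no entry in the mapping `properties`.

    A rough check for what dynamic: strict would reject. A dict value is
    only descended into when the mapping declares sub-properties for it,
    so leaf types that take objects (e.g. geo_point) are left alone.
    """
    missing = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        mapping = properties.get(key)
        if mapping is None:
            missing.append(path)
        elif isinstance(value, dict) and "properties" in mapping:
            missing.extend(unknown_fields(value, mapping["properties"], path + "."))
    return missing
```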
Dynamic Templates
For semi-structured data where you cannot enumerate every field but know the type patterns, dynamic templates give you control without locking everything down:
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": { "type": "keyword", "ignore_above": 256 }
        }
      }
    ],
    "properties": { /* explicit fields here */ }
  }
}
Create the Index from a Template
If you are creating indices on a recurring basis (logs-2026-05-13, events-2026-05-13), do not put settings and mappings in every request. Define an index template once and let Elasticsearch apply it to anything matching the pattern:
PUT /_index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "priority": 100,
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "refresh_interval": "30s"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "level": { "type": "keyword" },
        "service": { "type": "keyword" },
        "message": { "type": "text" },
        "trace_id": { "type": "keyword" }
      }
    }
  }
}
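Now any PUT /logs-anything will inherit those settings and mappings. Composable index templates have replaced the older legacy templates (deprecated in Elasticsearch 7.8). For shared building blocks across multiple templates, use component templates and reference them in composed_of. One caveat: recent Elasticsearch versions ship a built-in logs template that matches logs-*-* at priority 100. Elasticsearch rejects a new template whose patterns overlap an existing one at the same priority, so choose a prefix that does not collide or a different priority.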
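Only one composable template is applied to a new index. Among the templates whose index_patterns match, the one with the highest priority wins, and a template without a priority counts as 0. Here is a Python sketch of that selection. It uses fnmatchcase as a stand-in for Elasticsearch's own wildcard matching. The template dictionaries and the function name are illustrative.

```python
from fnmatch import fnmatchcase


def resolve_template(index_name: str, templates: dict[str, dict]):
    """Return the name of the template that would apply to `index_name`.

    `templates` maps template name -> {"index_patterns": [...], "priority": n}.
    fnmatchcase approximates Elasticsearch's "*" wildcard matching.
    """
    matches = [
        (body.get("priority", 0), name)
        for name, body in templates.items()
        if any(fnmatchcase(index_name, p) for p in body["index_patterns"])
    ]
    if not matches:
        return None
    matches.sort(reverse=True)
    if len(matches) > 1 and matches[0][0] == matches[1][0]:
        # Elasticsearch refuses to store overlapping templates with equal
        # priority, so a real cluster should never reach this state.
        raise ValueError(f"ambiguous templates for {index_name!r}")
    return matches[0][1]
```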
Creating an Index from Your Application
The same API is exposed by every official client. A few common examples:
Python (elasticsearch-py)
from elasticsearch import Elasticsearch

# Authentication (api_key=... or basic_auth=...) and CA settings omitted for brevity.
es = Elasticsearch("https://localhost:9200")

es.indices.create(
    index="products",
    settings={"number_of_shards": 3, "number_of_replicas": 1},
    mappings={
        "properties": {
            "name": {"type": "text"},
            "price": {"type": "float"},
        }
    },
)
JavaScript (@elastic/elasticsearch)
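import { Client } from '@elastic/elasticsearch'

// Auth and TLS options omitted; run as an ES module so top-level await works.
const client = new Client({ node: 'https://localhost:9200' })

await client.indices.create({
  index: 'products',
  settings: { number_of_shards: 3, number_of_replicas: 1 },
  mappings: {
    properties: {
      name: { type: 'text' },
      price: { type: 'float' },
    },
  },
})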
Kibana Dev Tools: paste the raw PUT /products { ... } request directly. This is the fastest way to iterate on a mapping before you wire it into application code.
Common Errors When Creating an Index
- resource_already_exists_exception: an index with that name already exists. Either delete the old one with DELETE /index after taking a snapshot, or pick a new name. If the name instead collides with an existing alias, Elasticsearch reports invalid_index_name_exception ("already exists as alias").
- invalid_index_name_exception: index names must be lowercase, cannot start with _, -, or +, and cannot be . or ... They also cannot contain spaces or any of \ / * ? " < > | , # : (the colon is rejected from 7.0 on), and must be 255 bytes or fewer (UTF-8).
- `IndexCreationException`: a catch-all for failures during creation. The cause is usually visible in the response or the master node logs.
- `validation_exception` about `cluster.max_shards_per_node`: you are out of shard budget. Drop replicas, delete old indices, or rethink your sharding scheme rather than just raising the limit.
- mapper_parsing_exception: the mapping body is malformed, often a typo in a field type. The response usually points at the offending property.
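A client-side check can catch invalid_index_name_exception before the request is sent. This Python sketch encodes the naming rules from the Elasticsearch documentation, including the space, colon, and "."/".." restrictions. Treat it as a convenience, since the server remains the authority. The function name is hypothetical.

```python
# Characters Elasticsearch documents as forbidden in index names (7.0+).
FORBIDDEN_CHARS = set('\\/*?"<>| ,#:')


def index_name_problems(name: str) -> list[str]:
    """Return reasons `name` would be rejected as an index name (empty if none)."""
    if not name:
        return ["must not be empty"]
    problems = []
    if name in (".", ".."):
        problems.append("cannot be '.' or '..'")
    if name[0] in ("_", "-", "+"):
        problems.append("cannot start with _, - or +")
    if name != name.lower():
        problems.append("must be lowercase")
    bad = sorted(FORBIDDEN_CHARS.intersection(name))
    if bad:
        problems.append("contains forbidden characters: " + ", ".join(repr(c) for c in bad))
    # The limit is in bytes, so multi-byte UTF-8 characters use it up faster.
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems
```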
For a broader list of issues you might hit, see Common Elasticsearch Errors.
Best Practices Checklist
- Pick number_of_shards deliberately. You cannot change it later without reindexing.
- Always use at least 1 replica in production.
- Define explicit mappings for any field you query or aggregate on.
- Add keyword subfields to text fields you might sort or aggregate on.
- Use ignore_above on keyword fields to cap pathological values.
- Use dynamic: strict for tightly defined schemas; dynamic: runtime for schema-on-read.
- Put repeatable settings in an index template, not in every request.
- Attach an ILM policy for any index that grows unboundedly.
- Point applications at an alias, not the underlying index name. It lets you reindex without an application deploy.
- Take a snapshot before reindexing or deleting anything you cannot easily rebuild.
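Pointing applications at an alias pays off when you swap the alias to a freshly reindexed index. The body for POST /_aliases can be built as shown below. Both actions in one request are applied atomically, so the alias is never left pointing at neither index. The function name is hypothetical.

```python
def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build a POST /_aliases body that moves `alias` from old_index to new_index.

    The remove and add run as one atomic operation, so searches through the
    alias always hit exactly one of the two indices.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

With elasticsearch-py 8.x you would pass the list as es.indices.update_aliases(actions=body["actions"]).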
How Pulse Helps with Index Management
Creating an index is easy. Keeping a fleet of indices well sized, well mapped, and free of silent regressions over months and years is the hard part. Pulse continuously analyzes Elasticsearch and OpenSearch clusters, surfacing over-sharded indices, mapping explosions, indices missing replicas, indices without ILM policies, and refresh-interval settings that are silently hurting indexing throughput. Instead of waiting for the next on-call incident, teams running Pulse get prioritized recommendations the moment a new index drifts away from best practice. If you are operating Elasticsearch at any non-trivial scale, connect your cluster to Pulse and let it watch the long tail of index health for you.
Frequently Asked Questions
Q: Can I change the number of shards after creating an index?
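Not directly. You have three options. You can reindex into a new index with the desired shard count (the reindex API is the standard tool). You can use the shrink API to reduce shards; the target must be a factor of the source count. Or you can use the split API to increase shards; the target must be a multiple of the source count and must evenly divide the source's index.number_of_routing_shards. On 7.0 and later, number_of_routing_shards defaults to a value that allows splitting by factors of 2 up to 1,024 shards. On older versions, the source had to be created with an explicit, higher number_of_routing_shards.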
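The shrink and split constraints reduce to simple arithmetic. Here is a Python sketch, assuming you supply the source index's index.number_of_routing_shards yourself; the function does not compute the default. The function names are hypothetical.

```python
def shrink_targets(source_shards: int) -> list[int]:
    """Valid shrink targets: proper factors of the source shard count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def split_targets(source_shards: int, routing_shards: int) -> list[int]:
    """Valid split targets: multiples of the source count dividing routing_shards.

    routing_shards is the source index's index.number_of_routing_shards,
    passed in by the caller.
    """
    return [
        n
        for n in range(source_shards * 2, routing_shards + 1, source_shards)
        if routing_shards % n == 0
    ]
```

For a 5-shard index with number_of_routing_shards of 30, split_targets(5, 30) gives [10, 15, 30], and shrink_targets(12) gives [1, 2, 3, 4, 6].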
Q: Can I add fields to an existing mapping?
Yes. New fields can be added at any time with the put mapping API. You can even add a new multi-field to an existing field, although only documents indexed afterwards populate it. What you cannot do is change the type of an existing field or sub-field, or change the index-time analyzer on an indexed text field. Those require a reindex.
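A quick way to spot a change that will force a reindex is to diff field types between the current mapping and the one you intend to apply. This Python sketch only compares the type parameter across properties and multi-fields, treating a missing type as object. Other non-updatable parameters, such as the index-time analyzer, are not checked. The function name is hypothetical.

```python
def conflicting_type_changes(old: dict, new: dict, prefix: str = "") -> list[str]:
    """Dotted paths whose "type" differs between two `properties` mappings.

    Fields present only in `new` are ignored, because adding fields is
    allowed. Only the type is compared; other parameters are not.
    """
    conflicts = []
    for name, old_def in old.items():
        new_def = new.get(name)
        if new_def is None:
            # Omitting a field from a put mapping request leaves it unchanged.
            continue
        path = prefix + name
        # Object fields usually omit "type"; treat a missing type as "object".
        if old_def.get("type", "object") != new_def.get("type", "object"):
            conflicts.append(path)
            continue
        # Same type: compare sub-properties (objects) and multi-fields.
        for key in ("properties", "fields"):
            conflicts.extend(conflicting_type_changes(
                old_def.get(key, {}), new_def.get(key, {}), path + "."))
    return conflicts
```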
Q: What is the difference between text and keyword?
text fields are analyzed (tokenized, lowercased, stemmed depending on analyzer) and are designed for full-text search. keyword fields are indexed as a single token and are used for exact matching, sorting, aggregations, and term queries. Most string fields in real-world schemas want both: a text field with a .keyword subfield.
Q: How do I create an index for vector or semantic search?
Define a dense_vector field with the appropriate dims and similarity:
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}
In Elasticsearch 8.x and later, index: true enables HNSW-based approximate kNN search out of the box. OpenSearch uses its own knn_vector field type instead of dense_vector.
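On the query side, a dense_vector field like this is searched with the top-level knn option of the 8.x _search API. This Python sketch builds that body and checks the constraints that commonly trip people up. The vector length must equal dims, num_candidates must be at least k, and cosine similarity cannot use a zero-magnitude vector. It also shows the documented cosine scoring formula, _score = (1 + cosine) / 2. The function names and default values are illustrative.

```python
import math


def knn_search_body(query_vector: list[float], dims: int, k: int = 10,
                    num_candidates: int = 100, field: str = "embedding") -> dict:
    """Build a top-level `knn` search body for a cosine dense_vector field."""
    if len(query_vector) != dims:
        raise ValueError(f"query vector has {len(query_vector)} dims, mapping expects {dims}")
    if num_candidates < k:
        raise ValueError("num_candidates must be greater than or equal to k")
    # Cosine similarity is undefined for a zero-magnitude vector.
    if sum(x * x for x in query_vector) == 0.0:
        raise ValueError("zero-magnitude vector not allowed with cosine similarity")
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        }
    }


def cosine_score(a: list[float], b: list[float]) -> float:
    """Score Elasticsearch documents for cosine similarity: (1 + cos) / 2."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return (1 + dot / norm) / 2
```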
Q: Should I disable _source?
Almost never. Disabling _source saves disk space but prevents reindexing, update, and most highlighting. The space savings rarely justify the operational pain. If storage is the concern, switch to the best_compression codec first.
Q: How do I check whether an index already exists before creating it?
Use a HEAD /index_name request, which returns 200 if the index exists and 404 if it does not. See the dedicated guide on checking if an index exists for client-library equivalents.
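In application code, the existence check usually wraps the create call. Here is a minimal Python sketch. It assumes a client that behaves like elasticsearch-py 8.x: indices.exists returns a truthy result on HTTP 200, and indices.create accepts settings= and mappings=. ensure_index is a hypothetical name.

```python
def ensure_index(client, name: str, settings: dict, mappings: dict) -> bool:
    """Create `name` unless it already exists; return True if this call created it.

    `client` is assumed to mirror elasticsearch-py 8.x: indices.exists()
    is truthy when HEAD /<name> returns 200, and indices.create() takes
    settings= and mappings= keyword arguments.
    """
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name, settings=settings, mappings=mappings)
    return True
```

The check-then-create sequence is not atomic. Two processes can both see 404, and the slower one then gets resource_already_exists_exception, so handle that error as well.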