How to Clone an Elasticsearch Index: Step-by-Step Guide

The Elasticsearch Clone Index API creates a new index that is an exact copy of an existing one by hard-linking the source's segment files instead of re-indexing every document. Use it to duplicate an index for testing, to apply different settings to a copy of production data, or as part of a shard-count migration. Clone is far faster than reindexing because no document is re-parsed, but it works only within the same cluster, requires the source to be write-blocked, and keeps the source's primary shard count.

When to Use Clone (vs Alternatives)

Goal | Better choice
Copy an index to a different cluster | Snapshot and restore or reindex from remote - clone is single-cluster only
Change shard count downwards | Shrink index API (target must be a factor of source)
Change shard count upwards | Split index API (target must be a multiple of source, within the limit set by number_of_routing_shards, a static setting fixed at index creation)
Test new settings or added fields on prod data | Clone, then update dynamic settings or add new fields to the mapping on the copy
Migrate to a new field type | Reindex - clone preserves segments and cannot change existing mappings
Roll a snapshot of production for offline use | Snapshot and restore - clone is online and intra-cluster
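
The shard-count rows of the table can be expressed as a small decision helper. This is a minimal sketch, not part of any Elasticsearch client: the function name choose_resize_api is hypothetical, and it only encodes the factor and multiple rules stated above. It does not check the number_of_routing_shards limit that also constrains split.

```python
def choose_resize_api(source_shards: int, target_shards: int) -> str:
    """Suggest an approach from the source and desired primary shard counts.

    Encodes only the rules in the table above (hypothetical helper):
    same count -> clone, factor -> shrink, multiple -> split, else reindex.
    """
    if source_shards < 1 or target_shards < 1:
        raise ValueError("shard counts must be positive integers")
    if target_shards == source_shards:
        return "clone"
    if target_shards < source_shards and source_shards % target_shards == 0:
        return "shrink"
    if target_shards > source_shards and target_shards % source_shards == 0:
        return "split"
    # Neither a factor nor a multiple: no resize API fits, so reindex.
    return "reindex"


if __name__ == "__main__":
    for src, dst in [(6, 6), (6, 3), (6, 12), (6, 4)]:
        print(f"{src} -> {dst}: {choose_resize_api(src, dst)}")
```

A result of "split" still needs to be checked against the source index's number_of_routing_shards before running the split.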

Prerequisites

  • Source and target indices live in the same Elasticsearch (or OpenSearch) cluster.
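  • Source index is open and its health status is green, with every primary shard allocated (each shard is hard-linked from local files on the node that holds it, so the shards do not need to be gathered on one node as they do for shrink).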
  • Source index is set read-only with index.blocks.write: true before the clone call.
  • The user or API key has manage privilege on both the source pattern and the target name.
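  • Enough free disk space on the nodes that hold the source shards - hard links and metadata cost little at first, but on a file system without hard-link support the segments are copied in full, and any divergence after clone consumes real disk per copy.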
  • The cluster has enough available shards for the new index (cluster.max_shards_per_node budget).
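
A few of these prerequisites can be checked in a script before the clone call. The sketch below is hedged: preflight_problems is a hypothetical helper, it assumes the caller has already fetched the body of GET /_cluster/health/source-index and the source's settings as a flat dictionary (for example via ?flat_settings=true), and it covers only the health and write-block checks. The "target must not exist" check is not in the list above; it is added here as an assumption about how the API behaves.

```python
from __future__ import annotations


def preflight_problems(health: dict, flat_settings: dict, target_exists: bool) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed.

    health:        parsed body of GET /_cluster/health/<source> (assumed shape)
    flat_settings: the source index settings as a flat dict (assumed shape)
    target_exists: whether an index with the target name already exists
    """
    problems = []
    if health.get("status") != "green":
        problems.append(f"source health is {health.get('status')!r}, expected 'green'")
    # Settings usually come back as strings ("true"), so normalise before comparing.
    write_block = str(flat_settings.get("index.blocks.write", "false")).lower()
    if write_block != "true":
        problems.append("index.blocks.write is not true on the source")
    if target_exists:
        problems.append("target index already exists")
    return problems


if __name__ == "__main__":
    print(preflight_problems({"status": "yellow"}, {}, target_exists=False))
```

Privileges, disk headroom, and the shard budget are left out because checking them needs cluster-specific calls and thresholds.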

Step-by-Step: Clone an Index

  1. Set the source index read-only. The Clone API refuses to run otherwise.

    PUT /source-index/_settings
    {
      "settings": { "index.blocks.write": true }
    }
    
  2. Verify the source has all primaries allocated and the cluster is green for that index.

    GET /_cluster/health/source-index?wait_for_status=green&timeout=30s
    
  3. Run the clone.

    POST /source-index/_clone/target-index
    

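    The target inherits the source's shard count, mapping, and most settings. The replica settings (index.number_of_replicas and index.auto_expand_replicas) and the source's aliases are not copied, so set them explicitly if you need them.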

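  4. Optionally override settings or aliases on the target. To do so, send the clone request with a body instead of the bare call in step 3 (the call fails if the target already exists). The Clone API accepts the same body shape as Create Index for settings and aliases.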

    POST /source-index/_clone/target-index
    {
      "settings": {
        "index.number_of_replicas": 2,
        "index.refresh_interval": "30s"
      },
      "aliases": {
        "products-current": {}
      }
    }
    

    You cannot change number_of_shards during clone - it is preserved from the source. Use shrink or split for shard count changes.

  5. Wait for the target to go green.

    GET /_cluster/health/target-index?wait_for_status=green&timeout=60s
    
  6. Remove the write block from the source if you still need to write to it.

    PUT /source-index/_settings
    {
      "settings": { "index.blocks.write": null }
    }
    

    Leaving the block in place is fine if the source is being retired.
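
The six steps above can be sketched as a single ordered plan, which is handy for runbooks and dry runs. This is an illustrative sketch only: clone_plan is a hypothetical helper that builds the same requests shown in the steps and returns them as (method, path, body) tuples. It sends nothing; pass each tuple to whatever HTTP client you already use.

```python
import json


def clone_plan(source, target, settings=None, aliases=None, release_block=True):
    """Build the ordered requests for a clone as (method, path, body) tuples."""
    clone_body = {}
    if settings:
        clone_body["settings"] = settings
    if aliases:
        clone_body["aliases"] = aliases

    steps = [
        # Step 1: block writes on the source.
        ("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": True}}),
        # Step 2: wait for the source to be green.
        ("GET", f"/_cluster/health/{source}?wait_for_status=green&timeout=30s", None),
        # Steps 3 and 4: one clone call, with a body only if overrides were given.
        ("POST", f"/{source}/_clone/{target}", clone_body or None),
        # Step 5: wait for the target to be green.
        ("GET", f"/_cluster/health/{target}?wait_for_status=green&timeout=60s", None),
    ]
    if release_block:
        # Step 6: remove the write block (null resets the setting).
        steps.append(("PUT", f"/{source}/_settings", {"settings": {"index.blocks.write": None}}))
    return steps


if __name__ == "__main__":
    plan = clone_plan("source-index", "target-index",
                      settings={"index.number_of_replicas": 2})
    for method, path, body in plan:
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

Setting release_block=False matches the case where the source is being retired and the block should stay in place.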

How Clone Differs from Reindex

The single biggest reason to clone instead of reindex is speed and disk efficiency. Clone hard-links the underlying Lucene segment files into the new index directory; no document is re-parsed, no analyzer runs again, no IO is spent rewriting data. The target initially shares disk usage with the source, and divergence (real disk consumption) only happens as either index is updated.

That is also clone's limitation: because no documents are re-indexed, you cannot change anything that requires re-analysis - existing field types, analyzers, the way text is tokenized. If you need any of those, reindex is the only path.
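
The disk-sharing behavior described above comes from how hard links work at the file-system level, which a few lines of Python can demonstrate. This is an analogy under stated assumptions, not Elasticsearch code: the file names are made up, the file is a stand-in for a segment file, and the script needs a file system that supports hard links.

```python
import os
import tempfile


def hard_link_demo() -> dict:
    """Create a file, hard-link it, and show that both names share one inode."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source_segment.bin")  # hypothetical name
        clone = os.path.join(tmp, "clone_segment.bin")    # hypothetical name
        with open(source, "wb") as f:
            f.write(b"x" * 4096)

        # A hard link adds a second directory entry for the same data blocks.
        os.link(source, clone)
        src_stat, clone_stat = os.stat(source), os.stat(clone)
        result = {
            "same_inode": src_stat.st_ino == clone_stat.st_ino,
            "link_count": src_stat.st_nlink,
        }

        # Removing one name does not remove the data while another link exists.
        os.remove(source)
        with open(clone, "rb") as f:
            result["bytes_after_source_removed"] = len(f.read())
        return result


if __name__ == "__main__":
    print(hard_link_demo())
```

Both names point at the same bytes, so no extra space is used until new files are written on either side. This is also why deleting the source index later does not damage the clone.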

Clone in Production: What to Watch For

The most common production failure mode is forgetting to release the write block on the source after cloning. The source index then rejects every write from the application with a cluster_block_exception (HTTP 403), and depending on how the client handles that error, you may not notice it in a busy log until traffic patterns shift. Build a habit of pairing clone calls with an explicit unblock step in scripts and runbooks.
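
One way to make the unblock step hard to forget in scripts is to tie it to the block step with a context manager. This is a minimal sketch under assumptions: write_blocked is a hypothetical helper, and send is a placeholder for any callable of the form send(method, path, body) that issues the request with your own HTTP client.

```python
from contextlib import contextmanager


@contextmanager
def write_blocked(send, index):
    """Set index.blocks.write on entry and always clear it on exit.

    `send` is a hypothetical callable: send(method, path, body).
    """
    send("PUT", f"/{index}/_settings", {"settings": {"index.blocks.write": True}})
    try:
        # Caller runs the clone and waits for the target to go green here.
        yield
    finally:
        # Runs even if the clone or the health wait raised an exception.
        send("PUT", f"/{index}/_settings", {"settings": {"index.blocks.write": None}})


if __name__ == "__main__":
    def print_send(method, path, body):
        print(method, path, body)

    with write_blocked(print_send, "source-index"):
        print_send("POST", "/source-index/_clone/target-index", None)
```

The trade-off is that the block is released on failure too. If the source is being retired and should stay read-only, skip this wrapper and set the block directly.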

The second is cluster health and recovery: clone requires the source index to be green, and although the target's primaries are hard-linked locally, its replicas are built by ordinary recovery that copies data to other nodes. On a large index or a busy cluster, waiting for the target to go green can take far longer than the clone call itself.

Run Clone Index Safely with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch. Before and during a _clone operation, Pulse:

  • Verifies cluster capacity for the operation: disk headroom on the target node (clones diverge over time and consume real disk), cluster.max_shards_per_node budget, primary co-location of source shards
  • Surfaces concurrent operations that could collide - active rebalance moving source primaries away, snapshot in progress on the source, another clone or shrink on the same index
  • Tracks the operation's progress and impact in real time: allocation stage, hard-link completion, write-block status on the source, divergence between source and clone disk usage
  • Recommends releasing the write block once the clone is green, and flags lingering index.blocks.write: true settings that the runbook forgot to remove

Start a free trial before your next clone.

Common Mistakes

  1. Skipping the read-only block. The API rejects the request if index.blocks.write is not set before the call, typically with an illegal_state_exception saying the index must be read-only to resize.
  2. Leaving the source read-only. Application writes fail with cluster_block_exception until the block is removed.
  3. Expecting to change number_of_shards during clone. Shard count is preserved. Use shrink (decrease) or split (increase) for that.
  4. Cloning across clusters. Clone is intra-cluster only. Use snapshot/restore or reindex-from-remote.
  5. Cloning a closed index. The source must be open. Open it first.
  6. Not budgeting disk. Even with hard links, any subsequent write to either index makes the data diverge and consumes real disk.

Frequently Asked Questions

Q: Can I clone an Elasticsearch index to a different cluster?
A: No. The Clone API is intra-cluster only because it hard-links segment files on the same nodes. To move an index to a different cluster, use snapshot and restore or reindex from remote.

Q: Does cloning an Elasticsearch index duplicate the data on disk?
A: Not initially. Clone hard-links the source's segment files into the target, so both indices share the same underlying bytes. Disk usage grows only as the two diverge through writes, deletes, and merges on either side.

Q: How long does it take to clone a large Elasticsearch index?
A: The clone call itself is usually quick even for very large indices, because the operation copies metadata and creates hard links rather than re-indexing. The main costs are recovering the target's replicas, which copies data to other nodes, and a full segment copy on file systems that do not support hard links.

Q: Can I change settings or mappings while cloning?
A: You can change most index settings (replicas, refresh interval, allocation rules) and aliases in the clone request. You cannot change number_of_shards, existing field mappings, or analyzers, because clone preserves the underlying segments. Reindex is the right tool for mapping changes.

Q: Can I clone a closed Elasticsearch index?
A: No. The source must be open. Run POST /<index>/_open first, then set the write block and clone.

Q: Why does clone require setting the index to read-only first?
A: Clone hard-links the source's segment files. If writes were still happening, the segment list would change underneath the operation. The index.blocks.write: true setting freezes the source so the clone produces a consistent point-in-time copy.

Q: What happens to ILM policies during clone?
A: The target inherits the source's settings, including index.lifecycle.name. If you want the clone managed by a different policy (or no policy), override index.lifecycle.name in the clone request body or update it on the target afterward.

Q: What's the best tool to safely clone an Elasticsearch index in production?
A: Pulse is purpose-built for this. It is an AI DBA for Elasticsearch and OpenSearch that pre-checks disk, shard budget, and primary co-location for the clone, tracks the write-block status on the source, and flags forgotten index.blocks.write: true settings that would silently reject application writes after the clone completes.
