Elasticsearch .security Index Unavailable

The .security-7 index (aliased as .security) stores all native realm users, roles, role mappings, API keys, and service account tokens. When this index becomes unavailable, every authentication request that depends on the native or reserved realms fails. The cluster itself may be healthy - green shards, data flowing, indices accessible - but no user can log in. Kibana shows a login page that rejects every credential. API calls return security_exception with the message "security index is unavailable."

Why the .security Index Goes Down

The most common cause is a red cluster state where the primary shard for .security-7 cannot be allocated. This happens when the node holding that shard goes offline and no replica exists - the default configuration for a single-node cluster gives .security-7 zero replicas. Disk watermark breaches also cause shard allocation failures; if every eligible node exceeds the cluster.routing.allocation.disk.watermark.flood_stage threshold (95% by default), Elasticsearch blocks index writes and can leave shards unassigned after a restart.
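If disk pressure is the trigger, you can confirm it and temporarily raise the flood-stage threshold while you free space. A sketch using the cluster settings API (the 97% value is an example for buying time, not a recommendation):

```
GET /_cat/allocation?v&h=node,disk.percent,disk.avail

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}
```

On Elasticsearch 7.4 and later the flood-stage write block is released automatically once disk usage drops below the threshold; on older versions you must clear index.blocks.read_only_allow_delete on the affected indices yourself.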

Corruption is less frequent but harder to recover from. A hard crash or storage-layer failure can leave Lucene segments in an inconsistent state. Elasticsearch logs will show CorruptIndexException or IndexFormatTooOldException for the .security-7 index. In rare cases, accidental deletion through a wildcard delete pattern that matches hidden indices (the .security-7 index is a system index, but older versions did not always protect it from broad DELETE /.s* operations) wipes the index entirely.
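A quick way to confirm suspected corruption is to search the Elasticsearch logs for those exceptions. A minimal sketch, assuming the default package log location (adjust the path for your install):

```
grep -E "CorruptIndexException|IndexFormatTooOldException" \
  /var/log/elasticsearch/*.log
```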

Version upgrades can also trigger problems. The .security index was reindexed from version 6 format to version 7 format during the 7.x upgrade path. If that migration was interrupted or if an old index template (security-index-template-v6) lingered in the cluster state, the index can end up in a broken state that blocks further operations.
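You can check for a leftover template with the legacy template API and remove it before retrying the migration. A sketch (inspect what the GET returns before deleting anything):

```
GET /_template/security-index-template*

DELETE /_template/security-index-template-v6
```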

Immediate Impact

When .security-7 is unavailable, all native realm and reserved user authentication stops. The elastic, kibana_system, logstash_system, beats_system, and remote_monitoring_user accounts all live in this index. API keys stored there stop validating. Role definitions vanish from the perspective of the authorization layer, so even tokens issued by external realms (SAML, OIDC) that map to native roles lose their permissions.

Requests authenticated through the file realm or PKI certificates with role mappings defined in role_mapping.yml continue to work. This is the escape hatch - the file realm does not depend on any index. It reads from flat files on disk (users and users_roles in the Elasticsearch config directory) and is evaluated before the native realm in the default realm chain.
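To make the realm order explicit rather than relying on defaults, you can declare the chain in elasticsearch.yml. A sketch in 7.x realm syntax (the realm names file1 and native1 are arbitrary identifiers):

```
xpack.security.authc.realms.file.file1:
  order: 0
xpack.security.authc.realms.native.native1:
  order: 1
```

Lower order values are consulted first, so file-realm users keep working even when the native realm's backing index is gone.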

Emergency Access via the File Realm

The file realm is always available regardless of cluster state. If you have not pre-configured file realm users, you can add one on any node:

bin/elasticsearch-users useradd admin_recovery -p changeme_now -r superuser

This writes to ES_PATH_CONF/users and ES_PATH_CONF/users_roles on that specific node. The file realm is local to each node - it is not replicated across the cluster. You only need it on the node you are connecting to for recovery purposes. After creating the user, you can authenticate and run diagnostic commands (add --cacert or -k to curl if the cluster uses self-signed TLS certificates):

curl -u admin_recovery:changeme_now https://localhost:9200/_cluster/health?pretty
curl -u admin_recovery:changeme_now https://localhost:9200/_cat/shards/.security-7?v

The elasticsearch-reset-password tool is another recovery mechanism. It works by temporarily creating a file-realm superuser behind the scenes, using that user to call the change-password API, then cleaning up. Run it against the built-in elastic user:

bin/elasticsearch-reset-password -u elastic --auto

This only works if the .security-7 index is available enough for write operations. If the index is completely missing or its primary shard is unallocated, the tool will fail with an error indicating it cannot reach the security index. In that case, fall back to the file realm approach.

Recovering the Index

The recovery path depends on the failure mode. For an unassigned primary shard, first check allocation status:

GET /_cluster/allocation/explain
{
  "index": ".security-7",
  "shard": 0,
  "primary": true
}

If the shard is unassigned due to a node being temporarily offline, bringing that node back is the simplest fix. If the node is permanently gone and no replica exists, you have two options: accept data loss by allocating a stale or empty primary shard, or restore from a snapshot.

To force allocation of an empty primary (this destroys existing data in that shard):

POST /_cluster/reroute
{
  "commands": [{
    "allocate_empty_primary": {
      "index": ".security-7",
      "shard": 0,
      "node": "surviving-node-name",
      "accept_data_loss": true
    }
  }]
}

After allocating an empty primary, the index exists but is empty. All users, roles, and API keys are gone. You will need to recreate them or restore from a snapshot.
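If another node still holds an out-of-date copy of the shard, allocate_stale_primary is the less destructive option: it promotes that stale copy instead of starting empty, so whatever users and roles it contains survive. A sketch (the node name is a placeholder):

```
POST /_cluster/reroute
{
  "commands": [{
    "allocate_stale_primary": {
      "index": ".security-7",
      "shard": 0,
      "node": "node-holding-stale-copy",
      "accept_data_loss": true
    }
  }]
}
```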

Restoring from a snapshot is the preferred path when one is available. Close the existing index first (you need the file realm user for this since native auth is broken), then restore:

POST /.security-7/_close
POST /_snapshot/my_repo/my_snapshot/_restore
{
  "indices": ".security-7",
  "include_global_state": false
}

After the restore completes, verify authentication works - restored indices are opened automatically, so no explicit open call is needed. If the snapshot is from a significantly older state, some recently created users or roles will be missing.
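A quick verification pass might look like this, using the file-realm recovery user created earlier and then a restored native user (credentials here are placeholders):

```
curl -u admin_recovery:changeme_now \
  "https://localhost:9200/_cat/indices/.security-7?v"
curl -u elastic:restored_password \
  "https://localhost:9200/_security/_authenticate?pretty"
```

The _authenticate call succeeding for a native user confirms the restored index is serving authentication again.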

Bootstrapping a Fresh Security Index

When recovery is not possible - no snapshot, no surviving shard, corrupted beyond repair - the cleanest path is letting Elasticsearch recreate the .security-7 index from scratch. Delete the broken index (using your file-realm superuser):

curl -u admin_recovery:changeme_now -X DELETE \
  "https://localhost:9200/.security-7"

On the next security-related operation, Elasticsearch auto-creates a new .security-7 index with default mappings. The built-in users (elastic, kibana_system, etc.) get recreated with randomized passwords. Use elasticsearch-reset-password for each built-in account to set known passwords:

bin/elasticsearch-reset-password -u elastic --auto
bin/elasticsearch-reset-password -u kibana_system --auto

Then recreate your custom users, roles, and role mappings through the API or reload them from whatever configuration management you use. This is a full reset - treat it as rebuilding the security layer from scratch. Update the kibana_system password in kibana.yml and restart Kibana. Do the same for any Beats or Logstash instances using built-in accounts.
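Recreating custom definitions is ordinary security API usage. A sketch with hypothetical names (logs_reader, jsmith):

```
POST /_security/role/logs_reader
{
  "indices": [{
    "names": ["logs-*"],
    "privileges": ["read", "view_index_metadata"]
  }]
}

POST /_security/user/jsmith
{
  "password": "a_strong_password_here",
  "roles": ["logs_reader"],
  "full_name": "J. Smith"
}
```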

To prevent this situation in the future, set index.number_of_replicas to at least 1 on .security-7 for any cluster with more than one node. Include .security-7 in your snapshot lifecycle policy. Pre-configure at least one file-realm superuser on every node before you need one - the worst time to learn about the file realm is during an outage.
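Both precautions are one-time API calls. A sketch, reusing the my_repo repository name from the restore example (the schedule and policy name are illustrative):

```
PUT /.security-7/_settings
{
  "index": { "number_of_replicas": 1 }
}

PUT /_slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-{now/d}>",
  "repository": "my_repo",
  "config": {
    "indices": "*",
    "include_global_state": true
  }
}
```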
