Azure Blob Storage is a widely used backend for Elasticsearch snapshot repositories, but authentication, permission, and network configuration issues can prevent snapshots from working. This article walks through the common failure modes and how to fix them.
Plugin Installation and Basic Setup
The repository-azure plugin ships separately from Elasticsearch. Install it on every master-eligible and data node:
bin/elasticsearch-plugin install repository-azure
Restart each node after installation. If any node in the cluster lacks the plugin, repository registration fails with:
repository_verification_exception: [my_azure_repo] path is not accessible on master node
After the plugin is loaded, configure the Azure client. At minimum, the storage account name must be added to the Elasticsearch keystore:
bin/elasticsearch-keystore add azure.client.default.account
Then register the repository, specifying the container name:
PUT _snapshot/my_azure_repo
{
"type": "azure",
"settings": {
"container": "elasticsearch-snapshots",
"base_path": "snapshots"
}
}
The container must already exist in the storage account. Elasticsearch will not create it. A missing container produces a BlobStorageException with HTTP status 404 and ContainerNotFound in the error body.
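If the container is missing, it can be created ahead of time with the Azure CLI. The account name below is a placeholder; `--auth-mode login` uses your Azure AD credentials rather than the account key:

```shell
# Create the snapshot container before registering the repository.
# "mystorageaccount" is a placeholder account name.
az storage container create \
  --name elasticsearch-snapshots \
  --account-name mystorageaccount \
  --auth-mode login
```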
Authentication: Access Key vs SAS Token
The plugin supports two explicit credential types stored in the keystore.
Access Key - the full storage account key:
bin/elasticsearch-keystore add azure.client.default.key
SAS Token - a scoped shared access signature:
bin/elasticsearch-keystore add azure.client.default.sas_token
Do not set both key and sas_token for the same client. The plugin will reject the configuration. If neither is set, the plugin falls back to environment-based credentials (managed identity or workload identity).
Common authentication errors:
- AuthenticationException: The key or SAS token is malformed, expired, or does not match the storage account. Double-check the value with az storage account keys list or regenerate the SAS token. SAS tokens must start with sv= (the signed version parameter) when pasted into the keystore.
- BlobStorageException: Server failed to authenticate the request: the Authorization header signature does not match. This typically means the access key was rotated in Azure but not updated in the Elasticsearch keystore. Update the key and reload secure settings: POST _nodes/reload_secure_settings.
- Expired SAS token: SAS tokens have an explicit expiry. When the token expires mid-snapshot, partial failures occur. Set a long-lived token or automate rotation.
For SAS tokens, the token must include these permissions: read (r), write (w), list (l), and delete (d). Missing any of these results in AuthorizationFailure with HTTP 403 on the specific operation that lacks permission.
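The permission and expiry checks above can be automated before a token ever reaches the keystore. The sketch below parses the token's query-string form and flags the problems discussed in this section; the helper name and the required-permission set are taken from the text, not from any Azure or Elasticsearch API:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs

REQUIRED_PERMISSIONS = set("rwld")  # read, write, list, delete

def check_sas_token(token: str) -> list[str]:
    """Return a list of problems found in a SAS token string.

    Expects the raw query-string form stored in the keystore,
    e.g. "sv=2022-11-02&sp=rwld&se=2030-01-01T00:00:00Z&sig=...".
    """
    params = parse_qs(token.lstrip("?"))
    problems = []
    if "sv" not in params:
        problems.append("missing sv= (signed version); token may be truncated")
    granted = set(params.get("sp", [""])[0])
    missing = REQUIRED_PERMISSIONS - granted
    if missing:
        problems.append(f"missing permissions: {''.join(sorted(missing))}")
    expiry = params.get("se", [None])[0]
    if expiry:
        when = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if when < datetime.now(timezone.utc):
            problems.append(f"token expired at {expiry}")
    return problems

# A token scoped sp=rl lacks write and delete, so snapshot writes
# would fail with AuthorizationFailure (HTTP 403):
print(check_sas_token("sv=2022-11-02&sp=rl&se=2030-01-01T00:00:00Z&sig=abc"))
# → ['missing permissions: dw']
```

Running this against a token before storing it catches scope and expiry mistakes without waiting for a mid-snapshot 403.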
Azure Permissions and RBAC
When using managed identity or Azure AD-based authentication, the identity needs the Storage Blob Data Contributor role on the storage account or the specific container. This role grants read, write, and delete access to blob data.
Assigning only Reader or Storage Account Contributor is a common mistake - those roles manage the storage account resource itself but do not authorize data-plane operations like reading or writing blobs.
Role assignment propagation can take up to 30 minutes in Azure. If you just assigned the role, wait before testing. You can verify effective permissions using the Azure CLI:
az storage blob list --account-name mystorageaccount --container-name elasticsearch-snapshots --auth-mode login
If this command succeeds but Elasticsearch still fails, the issue is on the Elasticsearch side (wrong client configuration, missing plugin reload, etc.). If the CLI also fails, the Azure RBAC assignment is not yet effective or is scoped incorrectly.
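To inspect the role assignment itself, list assignments for the identity. The object ID below is a placeholder; expect to see Storage Blob Data Contributor scoped to the storage account or container:

```shell
# List role assignments for the identity (placeholder object ID).
az role assignment list \
  --assignee 00000000-0000-0000-0000-000000000000 \
  --output table
```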
Managed Identity and Workload Identity
When running on Azure VMs, managed identity is the preferred authentication method. If you do not set key or sas_token in the keystore, the plugin uses the Azure SDK's DefaultAzureCredential chain, which includes managed identity.
For this to work:
- Assign a system-assigned or user-assigned managed identity to the VM (or VM Scale Set).
- Grant the identity Storage Blob Data Contributor on the target storage account.
- Do not set azure.client.default.key or azure.client.default.sas_token in the keystore.
If the identity is not available, Elasticsearch logs a CredentialUnavailableException. This happens when managed identity is not enabled on the VM, when the Azure Instance Metadata Service (IMDS) endpoint at 169.254.169.254 is unreachable, or when the VM's network security group blocks outbound traffic to the metadata endpoint.
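On the VM itself, the IMDS endpoint can be probed directly to check whether a managed identity token is obtainable. This is the standard IMDS token request; a JSON response containing an access_token field indicates the identity is available, while a timeout points at the network problems described above:

```shell
# Request a storage-scoped token from the instance metadata service.
# Must be run on the VM; the Metadata header is required by IMDS.
curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://storage.azure.com/"
```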
For Azure Kubernetes Service (AKS), use workload identity. Mount the azure-identity-token volume as a subdirectory of the Elasticsearch config directory and set the AZURE_FEDERATED_TOKEN_FILE environment variable to point to the projected token file. Misconfiguring the volume mount path is a frequent source of CredentialUnavailableException in containerized deployments.
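A minimal sketch of the projected token volume for an Elasticsearch container, assuming workload identity is already configured on the cluster. All names and paths here are illustrative; the important points are that the mount lands under the Elasticsearch config directory and that AZURE_FEDERATED_TOKEN_FILE points at the projected file:

```yaml
# Illustrative fragment of an Elasticsearch pod spec (names are examples).
containers:
  - name: elasticsearch
    env:
      - name: AZURE_FEDERATED_TOKEN_FILE
        value: /usr/share/elasticsearch/config/azure/tokens/azure-identity-token
    volumeMounts:
      - name: azure-identity-token
        mountPath: /usr/share/elasticsearch/config/azure/tokens
        readOnly: true
volumes:
  - name: azure-identity-token
    projected:
      sources:
        - serviceAccountToken:
            path: azure-identity-token
            audience: api://AzureADTokenExchange
            expirationSeconds: 3600
```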
Proxy Configuration, Timeouts, and Large Snapshots
In networks that route traffic through a proxy, configure the Azure client proxy settings in elasticsearch.yml:
azure.client.default.proxy.type: http
azure.client.default.proxy.host: proxy.internal.example.com
azure.client.default.proxy.port: 3128
Supported proxy types are http, socks, and direct (the default). If proxy.type is set to http or socks but proxy.host or proxy.port is missing, the node fails to start with a settings validation error.

Proxy-related failures are often silent at the repository level: Elasticsearch may report a generic repository_verification_exception while the underlying cause is a connection timeout through the proxy. Check the Elasticsearch logs at DEBUG level for the repository_azure logger to see the actual HTTP-level error. Corporate proxies that perform TLS inspection can also cause certificate validation failures - you may need to import the proxy's CA certificate into the JVM truststore.
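The repository logger level can be raised dynamically through the cluster settings API. The logger name below assumes the standard Elasticsearch package naming for the Azure repository code path:

```
PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.repositories.azure": "DEBUG"
  }
}
```

Remember to set it back to null once the HTTP-level error has been captured, as DEBUG output from the repository is verbose.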
Large snapshots can exceed default HTTP timeouts. Symptoms include partial snapshot failures, BlobStorageException with connection reset or timeout messages, and snapshots that remain in IN_PROGRESS state indefinitely.
Tune these settings to handle large indices:
Client-level timeout in elasticsearch.yml (requires node restart):
azure.client.default.timeout: 300s
azure.client.default.max_retries: 5
Repository-level settings when registering the repo:
PUT _snapshot/my_azure_repo
{
"type": "azure",
"settings": {
"container": "elasticsearch-snapshots",
"chunk_size": "64mb",
"max_snapshot_bytes_per_sec": "200mb",
"max_restore_bytes_per_sec": "200mb"
}
}
The chunk_size parameter splits large files into smaller blobs, reducing the chance of a single request timing out. Azure Blob Storage allows at most 50,000 blocks per blob when using block uploads, which caps how large a single blob can be, so chunk_size cannot be raised arbitrarily for very large shards.
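A quick back-of-the-envelope check of that constraint. The 50,000 figure is Azure's documented block blob limit; the 1 MiB block size is an assumption for illustration, since the real block size depends on the Azure SDK client configuration:

```python
# Maximum blob size implied by the per-blob block limit, for an
# assumed upload block size. chunk_size must stay below this bound.
MAX_BLOCKS_PER_BLOB = 50_000  # Azure block blob limit
block_size_mib = 1            # assumed upload block size (illustrative)

max_blob_gib = MAX_BLOCKS_PER_BLOB * block_size_mib / 1024
print(f"max blob size at {block_size_mib} MiB blocks: {max_blob_gib:.1f} GiB")
# → max blob size at 1 MiB blocks: 48.8 GiB
```

Larger block sizes raise the ceiling proportionally, which is why the limit only matters when chunk_size is set high relative to the upload block size.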
The endpoint_suffix setting defaults to core.windows.net. For Azure Government (core.usgovcloudapi.net) or Azure China (core.chinacloudapi.cn), set this explicitly or uploads will fail with DNS resolution errors.
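For example, pointing the default client at Azure Government in elasticsearch.yml, assuming the same azure.client.default.* namespace used by the proxy and timeout settings above:

```
azure.client.default.endpoint_suffix: core.usgovcloudapi.net
```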
Network throttling on the Azure side can also cause timeouts. Azure storage accounts have per-account ingress and egress limits. If multiple clusters or applications share the same storage account, snapshot throughput competes with other traffic. Dedicated storage accounts for Elasticsearch snapshots avoid this contention.