Elasticsearch Snapshot GCS Repository Errors

Google Cloud Storage (GCS) is a common target for Elasticsearch snapshot repositories, but misconfiguration at any layer - plugin, credentials, IAM, or bucket - can block snapshot operations entirely. This article covers the errors you are most likely to hit and how to resolve them.

Plugin Installation

The repository-gcs plugin is not bundled with Elasticsearch by default. Every data and master-eligible node in the cluster must have the plugin installed, and each node must be restarted afterward. Install it with:

bin/elasticsearch-plugin install repository-gcs

If any node is missing the plugin, registering the repository will fail with an error like:

repository_verification_exception: [my_gcs_repo] path is not accessible on master node

On Kubernetes or Docker deployments, the plugin must be baked into the image or installed via an init container. A common mistake is installing the plugin on only the coordinating node and skipping data nodes. Elasticsearch verifies repository access from every node that may hold shard data, so a partial installation will still fail verification.
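
For containerized deployments, one common approach is to bake the plugin into a custom image. A minimal sketch (the base image tag is a placeholder; match it to your cluster's exact version):

```dockerfile
# Custom image with the repository-gcs plugin preinstalled.
# The tag below is illustrative -- it must match your cluster version exactly.
FROM docker.elastic.co/elasticsearch/elasticsearch:7.17.0

# --batch answers the plugin's permission prompts non-interactively.
RUN bin/elasticsearch-plugin install --batch repository-gcs
```

Using this image for every node role (not just coordinating nodes) avoids the partial-installation failure described above.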

After installation, confirm the plugin is loaded by checking GET _cat/plugins?v on each node. If the plugin version does not match the Elasticsearch version exactly, the node will refuse to start.

Service Account Credentials and Keystore Setup

GCS authentication requires a service account JSON key file stored in the Elasticsearch keystore. Add it with:

bin/elasticsearch-keystore add-file gcs.client.default.credentials_file /path/to/service-account.json

Replace default with your named client if you use a custom client configuration. After adding or updating the keystore entry, either restart the node or call POST _nodes/reload_secure_settings to pick up the change without downtime.
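
As a sketch, registering credentials under a hypothetical named client `my_client` and reloading without a restart might look like this (the endpoint `localhost:9200` is a placeholder):

```shell
# Store the key under the named client's setting (my_client is hypothetical).
bin/elasticsearch-keystore add-file gcs.client.my_client.credentials_file /path/to/service-account.json

# Pick up the new secure setting on all nodes without a restart.
curl -X POST "localhost:9200/_nodes/reload_secure_settings?pretty"
```

The repository settings must then reference the same name via "client": "my_client".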

Common credential errors include:

  • Missing keystore entry: Elasticsearch falls back to Application Default Credentials (ADC). In environments without ADC (most self-managed clusters), this results in a repository_verification_exception with com.google.cloud.storage.StorageException: 401 Unauthorized.
  • Wrong client name: If the repository references client: "my_client" but the keystore only contains gcs.client.default.credentials_file, Elasticsearch cannot locate the credentials and throws a verification failure.
  • Stale credentials: Rotating the service account key without updating the keystore on all nodes leaves some nodes unable to authenticate. The snapshot may start but fail mid-way when a shard is assigned to a node with outdated credentials.

The JSON key file must contain type, project_id, private_key_id, private_key, and client_email fields. A truncated or corrupted file produces IOException: Error reading credential file.
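
A quick local sanity check before loading the file into the keystore: grep for each required field. The key file below is a dummy for illustration; a real one comes from `gcloud iam service-accounts keys create`.

```shell
# Dummy key file for illustration only -- a real one is downloaded from GCP.
cat > /tmp/sa-key.json <<'EOF'
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "abc123",
  "private_key": "-----BEGIN PRIVATE KEY-----\nREDACTED\n-----END PRIVATE KEY-----\n",
  "client_email": "es-snapshots@my-project.iam.gserviceaccount.com"
}
EOF

# Report any required field that is absent; prints nothing when all are present.
for field in type project_id private_key_id private_key client_email; do
  grep -q "\"$field\"" /tmp/sa-key.json || echo "MISSING: $field"
done
```

Running this against a truncated or corrupted key file surfaces the missing field before Elasticsearch hits the IOException at snapshot time.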

IAM Permissions

The service account needs a specific set of GCS IAM permissions. At minimum, grant these:

  • storage.objects.create - write snapshot blobs
  • storage.objects.get - read snapshot data during restore
  • storage.objects.delete - delete old snapshots
  • storage.objects.list - enumerate repository contents
  • storage.buckets.get - verify the bucket exists and read its metadata

The predefined role roles/storage.objectAdmin covers the object-level permissions. You still need storage.buckets.get separately, which is included in roles/storage.legacyBucketReader. Alternatively, create a custom IAM role with exactly these five permissions to follow least-privilege principles.
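
Creating such a custom role with gcloud might look like the following sketch (the role ID, project, service account, and bucket names are all placeholders):

```shell
# Custom role carrying exactly the five permissions the repository needs.
gcloud iam roles create esGcsSnapshots \
  --project=my-project \
  --title="Elasticsearch GCS Snapshots" \
  --permissions=storage.objects.create,storage.objects.get,storage.objects.delete,storage.objects.list,storage.buckets.get

# Bind the custom role to the service account on the bucket itself.
gsutil iam ch \
  serviceAccount:es-snapshots@my-project.iam.gserviceaccount.com:projects/my-project/roles/esGcsSnapshots \
  gs://my-bucket
```

Granting the role at the bucket level rather than project-wide keeps the blast radius of the snapshot credentials small.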

When permissions are missing, you will typically see:

repository_verification_exception: [my_gcs_repo] path [...] is not accessible
  Caused by: com.google.cloud.storage.StorageException: <caller> does not have storage.objects.create access to the Google Cloud Storage object

The error message specifies which permission is missing. IAM policy changes can take up to 60 seconds to propagate, so wait before retesting after a permission update.

Bucket Configuration Issues

The GCS bucket must exist before you register the repository. Elasticsearch will not create it. If the bucket is missing:

repository_verification_exception: [my_gcs_repo] [...] bucket [my-bucket] does not exist

Check the bucket name for typos. GCS bucket names are globally unique and must be lowercase. Also verify the bucket is in the same GCP project that the service account belongs to, or that cross-project access is explicitly granted.

Bucket-level settings can also cause problems. If the bucket uses a retention policy with a locked retention period, Elasticsearch cannot delete old snapshots, causing 412 Precondition Failed errors. Object versioning does not interfere with snapshots but increases storage costs because deleted blobs remain as noncurrent versions. Uniform bucket-level access (the default for newer buckets) works fine with the repository-gcs plugin - just make sure IAM permissions are set at the bucket level rather than relying on legacy ACLs.
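
Before registering the repository, you can confirm the bucket is reachable and inspect its retention settings with the same credentials (my-bucket is a placeholder):

```shell
# Confirm the bucket exists and is visible to these credentials.
gsutil ls -b gs://my-bucket

# A locked retention policy here explains 412 Precondition Failed on deletes.
gsutil retention get gs://my-bucket
```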

Timeout Errors and Troubleshooting

Large indices can trigger timeout errors during snapshot creation or restore. The GCS client in Elasticsearch has default HTTP timeouts that may not be sufficient for multi-hundred-gigabyte shards.

Symptoms include SocketTimeoutException: Read timed out or partial snapshot failures where only some shards complete. To address this, adjust repository settings when registering or updating the repo:

PUT _snapshot/my_gcs_repo
{
  "type": "gcs",
  "settings": {
    "bucket": "my-bucket",
    "client": "default",
    "chunk_size": "100mb",
    "max_restore_bytes_per_sec": "200mb",
    "max_snapshot_bytes_per_sec": "200mb"
  }
}

The chunk_size setting breaks large blobs into smaller pieces, reducing the chance of a single HTTP request timing out. The default max_snapshot_bytes_per_sec is 40 MB/s. Raising it helps on high-bandwidth networks but can saturate disk I/O on the node if set too high. For client-level timeout tuning, set gcs.client.default.connect_timeout and gcs.client.default.read_timeout in elasticsearch.yml (these require a node restart). If snapshots fail intermittently on specific shards, check whether those shards reside on nodes with higher network latency to GCS.
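
For example, the client-level timeouts might be raised in elasticsearch.yml like this (the values are illustrative, not recommendations, and a node restart is required):

```yaml
# elasticsearch.yml -- per-client HTTP timeouts for the GCS client.
gcs.client.default.connect_timeout: 30s
gcs.client.default.read_timeout: 120s
```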

When a GCS snapshot repository fails, work through this sequence:

  1. Verify plugin installation on every master-eligible and data node: GET _cat/plugins?v.
  2. Check keystore entries with bin/elasticsearch-keystore list - confirm gcs.client.<name>.credentials_file is present.
  3. Test IAM permissions outside Elasticsearch using gsutil ls gs://my-bucket/ with the same service account to isolate whether the issue is GCS-side or Elasticsearch-side.
  4. Inspect the full error in Elasticsearch logs, not just the REST API response. The root cause is often nested several levels deep in the exception chain.
  5. Verify all nodes can reach GCS - firewall rules, VPC Service Controls, and Private Google Access settings can silently block traffic from some nodes.
  6. Reload secure settings if you recently changed the keystore: POST _nodes/reload_secure_settings.
  7. Remember that repository verification runs by default when you register the repo (PUT _snapshot/my_gcs_repo implies verify=true); it performs an immediate write/read test on all nodes and fails fast if any node cannot reach the bucket.
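
Verification can also be re-run on demand against an already-registered repository:

```
POST _snapshot/my_gcs_repo/_verify
```

The response lists the nodes that passed, which pinpoints exactly which node is missing the plugin, credentials, or network access.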