This article collects practical recipes for connecting ClickHouse to AWS S3 in production. It covers IAM role configuration on EC2, IAM Roles for Service Accounts (IRSA) on EKS with the ClickHouse Kubernetes operator, and the storage policy you need to expose an S3 bucket as a disk to MergeTree tables. Each recipe is tested end to end with the SQL commands used to validate it.
IAM Role on an EC2 Instance
When ClickHouse runs on EC2, the cleanest credential strategy is an instance profile that carries an IAM role. The role needs permissions on the specific bucket and prefix where ClickHouse will store data.
A minimal policy that allows reading and writing data:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "allow-put-and-get",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::BUCKET_NAME/test_s3_disk/*"
    }
  ]
}
For full functionality, including deletes and listings, extend the policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::BUCKET_NAME/test_s3_disk/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::BUCKET_NAME"
    }
  ]
}
s3:ListBucket is required for many operations and is the most common omission.
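The split between object-level and bucket-level resources in the extended policy can be sketched as a small Python helper. This is a minimal illustration rather than an official tool: the function name build_s3_disk_policy is hypothetical, and the bucket and prefix are the placeholders used in this article.

```python
import json


def build_s3_disk_policy(bucket: str, prefix: str) -> dict:
    """Build the extended policy: object actions on the prefix, list actions on the bucket."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                ],
                # Object-level actions apply to keys under the prefix.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                # Bucket-level actions apply to the bucket ARN itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_s3_disk_policy("BUCKET_NAME", "test_s3_disk"), indent=2))
```

Generating the document from one bucket name and one prefix keeps the two Resource values consistent, which is where hand-edited policies tend to drift.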
ClickHouse Disk Configuration
With the instance profile attached, configure the S3 disk to use environment credentials. ClickHouse picks up the IAM role from the EC2 metadata service.
<clickhouse>
    <storage_configuration>
        <disks>
            <disk_s3>
                <type>s3</type>
                <endpoint>https://s3.us-east-1.amazonaws.com/BUCKET_NAME/test_s3_disk/</endpoint>
                <use_environment_credentials>true</use_environment_credentials>
            </disk_s3>
        </disks>
        <policies>
            <policy_s3_only>
                <volumes>
                    <volume_s3>
                        <disk>disk_s3</disk>
                    </volume_s3>
                </volumes>
            </policy_s3_only>
        </policies>
    </storage_configuration>
</clickhouse>
Drop this file under /etc/clickhouse-server/config.d/ and restart the server, or reload the configuration.
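Before restarting, it can help to lint the file. The following is a rough sketch, assuming the layout shown in this recipe (disks and policies under storage_configuration); the function name check_storage_config is hypothetical, and the checks cover only what this article discusses, not everything ClickHouse validates on load.

```python
import xml.etree.ElementTree as ET


def check_storage_config(xml_text: str) -> list[str]:
    """Return a list of problems found in a <storage_configuration> block."""
    problems = []
    root = ET.fromstring(xml_text)
    storage = root.find("storage_configuration")
    if storage is None:
        return ["no <storage_configuration> element"]

    # Each child of <disks> is one disk; the tag name is the disk name.
    disks = {d.tag: d for d in storage.findall("disks/*")}
    for name, disk in disks.items():
        if (disk.findtext("type") or "").strip() != "s3":
            continue
        endpoint = (disk.findtext("endpoint") or "").strip()
        if not endpoint.endswith("/"):
            problems.append(f"{name}: endpoint has no trailing slash")
        env_creds = (disk.findtext("use_environment_credentials") or "").strip()
        if env_creds != "true":
            problems.append(f"{name}: use_environment_credentials is not true")

    # The built-in "default" disk exists without being declared.
    known = set(disks) | {"default"}
    for ref in storage.findall("policies/*/volumes/*/disk"):
        disk_name = (ref.text or "").strip()
        if disk_name not in known:
            problems.append(f"policy references undefined disk {disk_name!r}")
    return problems
```

An empty list means the file passed these particular checks; the server log remains the authority on whether the disk actually loaded.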
Validating the Configuration
Test the disk with a simple table that exercises writes and reads:
CREATE TABLE table_s3 (number Int64)
ENGINE = MergeTree()
ORDER BY tuple()
PARTITION BY tuple()
SETTINGS storage_policy = 'policy_s3_only';
INSERT INTO table_s3 SELECT * FROM system.numbers LIMIT 100000000;
SELECT count(), max(number) FROM table_s3;
DROP TABLE table_s3;
If the INSERT or SELECT fails, check three things in order:
- SELECT * FROM system.disks WHERE name = 'disk_s3' to confirm ClickHouse loaded the disk.
- The ClickHouse log for the actual AWS error code (403 AccessDenied, 404 NoSuchBucket, etc.).
- The IAM role attached to the instance and its trust policy.
IRSA on EKS with the Kubernetes Operator
When ClickHouse runs in Kubernetes on EKS, IRSA (IAM Roles for Service Accounts) is the right credential mechanism. It avoids node-wide credentials and scopes permissions per pod.
Create a service account annotated with the role ARN:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: clickhouse-s3
  namespace: clickhouse
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/ROLE_NAME
The IAM role's trust policy must allow the EKS OIDC provider to assume it:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/oidc.eks.REGION.amazonaws.com/id/EXAMPLE"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.REGION.amazonaws.com/id/EXAMPLE:sub":
          "system:serviceaccount:clickhouse:clickhouse-s3"
      }
    }
  }]
}
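The trust policy has three values that must agree with the cluster: the OIDC issuer, the namespace, and the service account name. A minimal sketch that derives the document from those inputs follows; the function name build_irsa_trust_policy is hypothetical, and the issuer URL in the usage note is the placeholder from this article, not a real cluster.

```python
def build_irsa_trust_policy(
    account_id: str, oidc_issuer: str, namespace: str, service_account: str
) -> dict:
    """Build a trust policy that lets one service account assume the role."""
    # The provider ARN and the condition key use the issuer without the scheme.
    issuer = oidc_issuer.removeprefix("https://").rstrip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Must match the pod's service account exactly.
                        f"{issuer}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        )
                    }
                },
            }
        ],
    }
```

For example, build_irsa_trust_policy("ACCOUNT_ID", "https://oidc.eks.REGION.amazonaws.com/id/EXAMPLE", "clickhouse", "clickhouse-s3") reproduces the policy above. Building the sub value from the same namespace and name used in the ServiceAccount manifest avoids the mismatch that makes the role unassumable.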
Reference the service account in the ClickHouseInstallation pod template:
spec:
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          serviceAccountName: clickhouse-s3
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:latest
The pod receives AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables, and the AWS SDK inside ClickHouse picks them up automatically. The disk configuration is the same as the EC2 recipe with use_environment_credentials=true.
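To confirm that the pod actually received these variables and that the projected token belongs to the expected service account, a small check can be run inside the container. This is a sketch under assumptions: the function names jwt_claims and check_irsa_env are hypothetical, a Python interpreter is assumed to be available in the container, and the token is decoded without signature verification, so it is a debugging aid only.

```python
import base64
import json
import os


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def check_irsa_env(namespace: str, service_account: str, env=os.environ) -> list[str]:
    """Return a list of problems with the IRSA environment of this process."""
    problems = []
    if not env.get("AWS_ROLE_ARN"):
        problems.append("AWS_ROLE_ARN is not set")
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if not token_file:
        problems.append("AWS_WEB_IDENTITY_TOKEN_FILE is not set")
        return problems
    try:
        with open(token_file) as f:
            claims = jwt_claims(f.read().strip())
    except (OSError, ValueError, IndexError) as exc:
        problems.append(f"cannot read token: {exc}")
        return problems
    # The sub claim is what the trust policy condition is compared against.
    expected = f"system:serviceaccount:{namespace}:{service_account}"
    if claims.get("sub") != expected:
        problems.append(f"token sub is {claims.get('sub')!r}, expected {expected!r}")
    return problems


if __name__ == "__main__":
    for problem in check_irsa_env("clickhouse", "clickhouse-s3"):
        print(problem)
```

If the sub claim differs from the value in the trust policy condition, the role cannot be assumed even though both environment variables are present.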
Static Access Keys
Static keys still work but are discouraged outside of development. If you must use them, store them in a secret and either pass them in through environment variables or include them in the disk definition. Never commit keys to source control.
Common Pitfalls
- The bucket region in the endpoint URL must match the bucket's real region. A mismatch produces a 301 redirect that ClickHouse may not follow cleanly.
- Forgetting the trailing slash on the endpoint path causes objects to be written under unexpected keys.
- IRSA tokens have a short lifetime and rotate. The SDK handles this transparently, but custom proxies in between can break the refresh flow.
- A pod-level service account without the right trust condition silently falls back to no credentials. Check aws sts get-caller-identity from inside the pod when debugging.
- s3:ListBucket must be granted on the bucket ARN itself, not on the prefix. Granting it only on the prefix causes confusing errors later.
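The first two pitfalls (region mismatch and missing trailing slash) can be caught before deployment with a simple endpoint check. The sketch below assumes standard regional hostnames of the form s3.REGION.amazonaws.com, optionally preceded by the bucket name; the function name check_endpoint is hypothetical, and other hostname styles (dual-stack, custom domains, VPC endpoint DNS names) are reported as unrecognized rather than parsed.

```python
import re
from urllib.parse import urlparse

# Matches s3.REGION.amazonaws.com and BUCKET.s3.REGION.amazonaws.com.
_REGIONAL_HOST = re.compile(r"(?:.+\.)?s3[.-]([a-z0-9-]+)\.amazonaws\.com")


def check_endpoint(endpoint: str, bucket_region: str) -> list[str]:
    """Return a list of problems with an S3 disk endpoint URL."""
    problems = []
    parsed = urlparse(endpoint)
    match = _REGIONAL_HOST.fullmatch(parsed.hostname or "")
    if match is None:
        problems.append("hostname is not a recognized regional S3 endpoint")
    elif match.group(1) != bucket_region:
        problems.append(
            f"endpoint region {match.group(1)} differs from bucket region {bucket_region}"
        )
    if not parsed.path.endswith("/"):
        problems.append("endpoint path has no trailing slash")
    return problems
```

The bucket's real region has to come from somewhere trustworthy, for example the output of aws s3api get-bucket-location, since the endpoint string alone cannot reveal a mismatch.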
Frequently Asked Questions
Q: Can the same role be used by multiple ClickHouse clusters?
A: Yes, but scope the policy to per-cluster prefixes within the bucket so that the blast radius is contained. For finer-grained control over listing, add a Condition block with the s3:prefix condition key to the s3:ListBucket statement.
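A per-cluster scope might look like the following sketch, which emits one pair of statements per cluster prefix. The function name build_cluster_statements and the prefixes cluster_a and cluster_b are hypothetical, and the exact prefixes that ClickHouse sends in list requests should be verified against the server log before relying on the condition.

```python
def build_cluster_statements(bucket: str, cluster_prefix: str) -> list[dict]:
    """Build policy statements that confine one cluster to its own prefix."""
    prefix = cluster_prefix.strip("/")
    return [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
            ],
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
        },
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            # ListBucket is granted on the bucket ARN; the condition narrows
            # which key prefixes may be listed.
            "Resource": f"arn:aws:s3:::{bucket}",
            "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
        },
    ]


def build_shared_role_policy(bucket: str, cluster_prefixes: list[str]) -> dict:
    """Combine the per-cluster statements into one policy document."""
    statements = []
    for cluster_prefix in cluster_prefixes:
        statements.extend(build_cluster_statements(bucket, cluster_prefix))
    return {"Version": "2012-10-17", "Statement": statements}
```

Calling build_shared_role_policy("BUCKET_NAME", ["cluster_a", "cluster_b"]) yields four statements. A single shared role still lets every cluster reach every listed prefix, so separate roles per cluster are the stricter option when isolation matters.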
Q: How do I rotate credentials safely?
A: With IAM roles you do not rotate credentials yourself; STS issues short-lived tokens and the SDK refreshes them. For static keys, deploy the new key alongside the old one, restart pods one at a time, and remove the old key after the rollout completes.
Q: Does ClickHouse work with S3 VPC endpoints?
A: Yes. Use the regional endpoint URL and ensure the VPC endpoint policy allows the bucket access. Traffic stays inside AWS and avoids NAT gateway costs.
Q: How do I confirm which credentials ClickHouse is actually using?
A: Enable AWS SDK debug logging via <s3><logging>1</logging></s3> in the disk configuration, or run aws sts get-caller-identity inside the container to confirm the identity AWS sees.