Learn how to deploy, scale, and manage OpenSearch clusters on Kubernetes using the official OpenSearch Kubernetes Operator. This comprehensive guide covers installation, configuration, security, monitoring, and production best practices for running OpenSearch workloads in Kubernetes environments.
What is the OpenSearch Kubernetes Operator?
The OpenSearch Kubernetes Operator is a powerful tool that automates the deployment, provisioning, management, and orchestration of OpenSearch clusters and OpenSearch Dashboards on Kubernetes. Built for cloud-native environments, it simplifies complex operations like scaling, version upgrades, security configuration, and cluster management.
Prerequisites
Before installing the OpenSearch Kubernetes Operator, ensure your environment meets these requirements:
Kubernetes Environment
- Kubernetes Version: v1.19 or higher
- Cluster Access: `kubectl` configured with admin privileges
- Node Resources: Minimum 4 CPU cores and 8GB RAM available across cluster nodes
- Storage: Dynamic persistent volume provisioner configured (recommended)
Required Tools
- Helm: Version 3.x for package management
- curl: For API testing and health checks
- jq: For JSON parsing (optional but recommended)
Network Requirements
- Pod Network: Cluster networking properly configured
- Service Access: LoadBalancer or NodePort support for external access
- DNS: CoreDNS or equivalent for service discovery
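A quick way to confirm the tooling and storage prerequisites before installing (this assumes kubectl, helm, and jq are already on your PATH):
```bash
# Client and cluster versions
kubectl version
helm version --short
jq --version

# A default storage class indicates a dynamic provisioner is available
kubectl get storageclass
```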
Compatibility Matrix
The OpenSearch Kubernetes Operator supports multiple OpenSearch versions:
| Operator Version | Min OpenSearch Version | Max OpenSearch Version | Kubernetes Version |
|---|---|---|---|
| 2.8.0 | 2.19.2 | latest 3.x | 1.19+ |
| 2.7.0 | 1.3.x | 2.19.2 | 1.19+ |
Installation Guide
Step 1: Add the Helm Repository
```bash
helm repo add opensearch-operator https://opensearch-project.github.io/opensearch-k8s-operator/
helm repo update
```
Step 2: Install the Operator
```bash
helm install opensearch-operator opensearch-operator/opensearch-operator
```
Verify the operator is running:
```bash
kubectl get pods -l app.kubernetes.io/name=opensearch-operator
```
Deploy Your First OpenSearch Cluster
Step 3: Create a Basic OpenSearch Cluster
Create a file named `my-opensearch-cluster.yaml`:
```yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  security:
    config:
    tls:
      http:
        generate: true
      transport:
        generate: true
        perNode: true
  general:
    httpPort: 9200
    serviceName: my-first-cluster
    version: 2.14.0
    pluginsList: ["repository-s3"]
    drainDataNodes: true
  dashboards:
    tls:
      enable: true
      generate: true
    version: 2.14.0
    enable: true
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
  nodePools:
    - component: masters
      replicas: 3
      resources:
        requests:
          memory: "4Gi"
          cpu: "1000m"
        limits:
          memory: "4Gi"
          cpu: "1000m"
      roles:
        - "data"
        - "cluster_manager"
      persistence:
        emptyDir: {}
```
Step 4: Deploy the Cluster
```bash
kubectl apply -f my-opensearch-cluster.yaml
```
Step 5: Monitor Deployment
Check the status of your cluster:
```bash
kubectl get opensearchclusters
kubectl get pods -l opster.io/opensearch-cluster=my-first-cluster
```
Wait for all pods to be in the Running state.
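If you prefer to block until the pods are ready rather than polling, kubectl wait works against the same label selector used above:
```bash
# Wait up to 10 minutes for all cluster pods to become Ready
kubectl wait --for=condition=Ready pod \
  -l opster.io/opensearch-cluster=my-first-cluster \
  --timeout=600s
```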
Access and Security
Authentication and Credentials
The OpenSearch Kubernetes Operator automatically configures security features including authentication, authorization, and TLS encryption.
Retrieve Admin Credentials
The operator generates secure admin credentials automatically:
```bash
# Get admin password
ADMIN_PASSWORD=$(kubectl get secret my-first-cluster-admin-password -o jsonpath='{.data.password}' | base64 -d)
echo "Admin password: $ADMIN_PASSWORD"

# Default admin username is 'admin'
ADMIN_USER="admin"
```
Create Additional Users
You can create custom users by configuring security settings:
```yaml
spec:
  security:
    config:
      securityConfigSecret:
        name: opensearch-security-config
```
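As a rough sketch, the referenced secret can be built from standard security plugin configuration files; the file names below follow the security plugin's usual layout (internal_users.yml and friends), so adjust them to whichever files you actually customize:
```bash
# Package security plugin config files into the secret referenced above
kubectl create secret generic opensearch-security-config \
  --from-file=internal_users.yml \
  --from-file=roles.yml \
  --from-file=roles_mapping.yml
```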
Network Access Options
Option 1: Port Forwarding (Development)
For local development and testing:
```bash
# Access OpenSearch API
kubectl port-forward svc/my-first-cluster 9200:9200

# Access OpenSearch Dashboards
kubectl port-forward svc/my-first-cluster-dashboards 5601:5601
```
Option 2: LoadBalancer Service (Production)
For production environments, expose services via LoadBalancer:
```yaml
spec:
  general:
    serviceType: LoadBalancer
  dashboards:
    service:
      type: LoadBalancer
```
Option 3: Ingress Controller
Configure ingress for domain-based access:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: opensearch-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  tls:
    - hosts:
        - opensearch.example.com
      secretName: opensearch-tls
  rules:
    - host: opensearch.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-first-cluster
                port:
                  number: 9200
```
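The opensearch-tls secret referenced above must exist before the Ingress can serve the host. With ssl-passthrough enabled, TLS is actually terminated by OpenSearch itself, but the secret still matters if you later switch to terminating TLS at the ingress. One way to create it from an existing certificate and key (file names are placeholders):
```bash
kubectl create secret tls opensearch-tls \
  --cert=opensearch.example.com.crt \
  --key=opensearch.example.com.key
```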
API Access and Testing
Basic Health Check
```bash
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cluster/health?pretty
```
Advanced API Operations
```bash
# Check cluster nodes
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cat/nodes?v

# List indices
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cat/indices?v

# Create a test index
curl -k -u admin:$ADMIN_PASSWORD -X PUT https://localhost:9200/test-index \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"number_of_shards": 1, "number_of_replicas": 1}}'
```
TLS Certificate Management
The operator automatically manages TLS certificates for secure communication:
```yaml
spec:
  security:
    tls:
      http:
        generate: true       # Auto-generate HTTP certificates
        secret:
          name: ""           # Optional: use an existing certificate secret
      transport:
        generate: true       # Auto-generate transport certificates
        perNode: true        # Generate per-node certificates
```
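If you bring your own certificates instead, the secret referenced by secret.name can be created from PEM files. The key names below (ca.crt, tls.crt, tls.key) follow the common Kubernetes TLS-secret convention; verify them against the operator version you run before relying on this sketch:
```bash
kubectl create secret generic my-first-cluster-http-cert \
  --from-file=ca.crt=./ca.pem \
  --from-file=tls.crt=./http.pem \
  --from-file=tls.key=./http-key.pem
```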
Advanced Configuration
Multi-Node Pool Architecture
Configure different node pools for optimal performance and resource utilization:
```yaml
spec:
  nodePools:
    # Master-eligible nodes
    - component: masters
      replicas: 3
      roles: ["cluster_manager"]
      resources:
        requests:
          memory: "2Gi"
          cpu: "1000m"
        limits:
          memory: "2Gi"
          cpu: "1000m"
      persistence:
        pvc:
          storageClass: "fast-ssd"
          size: "20Gi"
    # Dedicated data nodes
    - component: data-nodes
      replicas: 4
      roles: ["data"]
      resources:
        requests:
          memory: "16Gi"
          cpu: "4000m"
        limits:
          memory: "16Gi"
          cpu: "4000m"
      persistence:
        pvc:
          storageClass: "high-iops"
          size: "500Gi"
    # Coordinating nodes for query handling
    - component: coordinators
      replicas: 2
      roles: ["coordinating_only"]
      resources:
        requests:
          memory: "4Gi"
          cpu: "2000m"
        limits:
          memory: "4Gi"
          cpu: "2000m"
```
Hot-Warm-Cold Architecture
Implement tiered storage for cost optimization:
```yaml
spec:
  nodePools:
    # Hot nodes - latest data, high performance
    - component: hot-nodes
      replicas: 3
      roles: ["data", "data_hot"]
      resources:
        requests:
          memory: "32Gi"
          cpu: "8000m"
      persistence:
        pvc:
          storageClass: "nvme-ssd"
          size: "1Ti"
      jvm: "-Xms16g -Xmx16g"
    # Warm nodes - older, frequently accessed data
    - component: warm-nodes
      replicas: 4
      roles: ["data", "data_warm"]
      resources:
        requests:
          memory: "16Gi"
          cpu: "4000m"
      persistence:
        pvc:
          storageClass: "premium-ssd"
          size: "2Ti"
      jvm: "-Xms8g -Xmx8g"
    # Cold nodes - archive data, cost-optimized
    - component: cold-nodes
      replicas: 2
      roles: ["data", "data_cold"]
      resources:
        requests:
          memory: "8Gi"
          cpu: "2000m"
      persistence:
        pvc:
          storageClass: "standard"
          size: "5Ti"
      jvm: "-Xms4g -Xmx4g"
```
Custom OpenSearch Configuration
Configure OpenSearch settings for your use case:
```yaml
spec:
  general:
    version: "2.14.0"
    pluginsList:
      - "repository-s3"
      - "repository-azure"
      - "ingest-attachment"
      - "analysis-icu"
    additionalConfig:
      opensearch.yml: |
        cluster.routing.allocation.disk.watermark.low: 85%
        cluster.routing.allocation.disk.watermark.high: 90%
        cluster.routing.allocation.disk.watermark.flood_stage: 95%
        indices.recovery.max_bytes_per_sec: 100mb
        indices.memory.index_buffer_size: 20%
        thread_pool.search.queue_size: 10000
        thread_pool.write.queue_size: 10000
```
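After the nodes restart with the new configuration, you can confirm which values actually took effect; the grep pattern below simply narrows the output to the settings changed above:
```bash
curl -k -u admin:$ADMIN_PASSWORD \
  "https://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" \
  | grep -E "watermark|index_buffer_size|max_bytes_per_sec"
```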
Resource Management and Scaling
Horizontal Pod Autoscaling
Enable automatic scaling based on metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: opensearch-data-hpa
spec:
  scaleTargetRef:
    apiVersion: opensearch.opster.io/v1
    kind: OpenSearchCluster
    name: my-first-cluster
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
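A word of caution: an HPA can only act on a custom resource whose CRD exposes the scale subresource, which depends on the operator version you installed. You can inspect the CRD before relying on autoscaling:
```bash
# Print the subresources (if any) declared for each served CRD version
kubectl get crd opensearchclusters.opensearch.opster.io \
  -o jsonpath='{range .spec.versions[*]}{.name}{": "}{.subresources}{"\n"}{end}'
```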
Vertical Scaling
Update node resources:
```bash
# Scale up data-node resources. A JSON patch updates one pool in place;
# a plain merge patch would replace the entire nodePools list.
# Adjust the /spec/nodePools/<index> path to where your data pool sits.
kubectl patch opensearchcluster my-first-cluster --type=json -p='[
  {"op": "replace", "path": "/spec/nodePools/1/resources/requests/memory", "value": "32Gi"},
  {"op": "replace", "path": "/spec/nodePools/1/resources/requests/cpu", "value": "8000m"},
  {"op": "replace", "path": "/spec/nodePools/1/resources/limits/memory", "value": "32Gi"},
  {"op": "replace", "path": "/spec/nodePools/1/resources/limits/cpu", "value": "8000m"}
]'
```
Alternatively, edit the resources block in your cluster manifest and re-apply it with kubectl apply.
Storage Scaling
Increase persistent volume sizes:
```yaml
spec:
  nodePools:
    - component: data-nodes
      persistence:
        pvc:
          size: "1Ti"  # Increased from 500Gi
          # Note: requires a storage class that supports volume expansion
```
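Before bumping the size, you can check whether the storage class in use actually allows expansion:
```bash
kubectl get storageclass -o custom-columns=NAME:.metadata.name,EXPANDABLE:.allowVolumeExpansion
```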
Version Management and Rolling Updates
Controlled Rolling Updates
Configure update strategy:
```yaml
spec:
  general:
    version: "2.15.0"
    drainDataNodes: true
  updateStrategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
```
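While the operator rolls the cluster, you can watch pods being replaced one at a time and confirm the version each node reports once the rollout settles:
```bash
# Watch pods cycle through the rolling update
kubectl get pods -l opster.io/opensearch-cluster=my-first-cluster -w

# Confirm node versions after the upgrade
curl -k -u admin:$ADMIN_PASSWORD "https://localhost:9200/_cat/nodes?v&h=name,version"
```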
Blue-Green Deployment
For zero-downtime major updates:
```bash
# Create a new cluster with the updated version
kubectl apply -f opensearch-cluster-v2.yaml

# Migrate data with reindex-from-remote. The request runs on the NEW cluster,
# which pulls documents from the old one; the old cluster's endpoint must be
# allow-listed (reindex.remote.allowlist) on the new cluster's nodes.
curl -k -u admin:$ADMIN_PASSWORD -X POST "https://new-cluster:9200/_reindex" \
  -H 'Content-Type: application/json' \
  -d '{
    "source": {
      "remote": {
        "host": "https://old-cluster:9200",
        "username": "admin",
        "password": "<old-cluster-admin-password>"
      },
      "index": "source-index"
    },
    "dest": {
      "index": "dest-index"
    }
  }'

# Switch traffic to the new cluster
# Delete the old cluster after verification
```
Production Considerations
High Availability and Resilience
Multi-Zone Deployment
Distribute nodes across availability zones for fault tolerance:
```yaml
spec:
  nodePools:
    # One cluster-manager-eligible node per zone gives a three-node quorum
    - component: masters-zone-a
      replicas: 1
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2a
    - component: masters-zone-b
      replicas: 1
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2b
    - component: masters-zone-c
      replicas: 1
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2c
```
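To make OpenSearch itself spread shard copies across zones, allocation awareness can be enabled as well. The sketch below sets it dynamically and assumes each node advertises a zone attribute (for example node.attr.zone configured per node pool):
```bash
curl -k -u admin:$ADMIN_PASSWORD -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "persistent": {
      "cluster.routing.allocation.awareness.attributes": "zone",
      "cluster.routing.allocation.awareness.force.zone.values": "us-west-2a,us-west-2b,us-west-2c"
    }
  }'
```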
Pod Disruption Budgets
Ensure cluster availability during maintenance:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: opensearch-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      opensearch.role: master
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: opensearch-data-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      opensearch.role: data
```
Resource Quotas and Limits
Set cluster-wide resource constraints:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: opensearch-quota
spec:
  hard:
    requests.cpu: "50"
    requests.memory: 200Gi
    limits.cpu: "100"
    limits.memory: 400Gi
    persistentvolumeclaims: "20"
```
Security Hardening
Network Policies
Implement network segmentation:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: opensearch-network-policy
spec:
  podSelector:
    matchLabels:
      app: opensearch
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: opensearch
        - podSelector:
            matchLabels:
              app: opensearch-dashboards
      ports:
        - protocol: TCP
          port: 9200
        - protocol: TCP
          port: 9300
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: opensearch
      ports:
        - protocol: TCP
          port: 9300
```
RBAC Configuration
Configure Role-Based Access Control:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opensearch-operator
  namespace: opensearch-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: opensearch-operator
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: opensearch-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: opensearch-operator
subjects:
  - kind: ServiceAccount
    name: opensearch-operator
    namespace: opensearch-system
```
Monitoring and Observability
Prometheus Integration
Enable metrics collection:
```yaml
spec:
  monitoring:
    enable: true
    scrapeInterval: 30s
    labels:
      release: prometheus
  general:
    additionalConfig:
      opensearch.yml: |
        prometheus.metrics.enabled: true
        prometheus.indices: true
        prometheus.cluster.settings: true
```
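Assuming the prometheus-exporter plugin is installed (it is what the prometheus.* settings above configure), the metrics endpoint can be probed directly:
```bash
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_prometheus/metrics | head -n 20
```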
Grafana Dashboard
Import pre-built dashboards for visualization:
```bash
# Download official OpenSearch Grafana dashboard
curl -o opensearch-dashboard.json \
  https://raw.githubusercontent.com/opensearch-project/opensearch-k8s-operator/main/grafana/opensearch-cluster-dashboard.json

# Import to Grafana
kubectl create configmap opensearch-dashboard \
  --from-file=opensearch-dashboard.json \
  -n monitoring
```
Log Management
Configure centralized logging:
```yaml
spec:
  general:
    additionalConfig:
      log4j2.properties: |
        appender.console.type = Console
        appender.console.name = console
        appender.console.layout.type = PatternLayout
        appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n
        appender.json.type = Console
        appender.json.name = json
        appender.json.layout.type = ESJsonLayout
        appender.json.layout.type_name = server
        rootLogger.level = info
        rootLogger.appenderRef.console.ref = console
        rootLogger.appenderRef.json.ref = json
```
Health Checks and Alerts
Configure monitoring alerts:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: opensearch-alerts
spec:
  groups:
    - name: opensearch.rules
      rules:
        - alert: OpenSearchClusterRed
          expr: opensearch_cluster_status{color="red"} == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "OpenSearch cluster status is RED"
            description: "Cluster is in RED state"
        - alert: OpenSearchNodeDown
          expr: up{job="opensearch"} == 0
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "OpenSearch node is down"
            description: "Node has been down for more than 2 minutes"
        - alert: OpenSearchDiskSpaceHigh
          expr: opensearch_filesystem_data_used_percent > 85
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "OpenSearch disk space usage high"
            description: "Disk usage is above 85%"
```
Best Practices
Performance Optimization
JVM Tuning
Optimize Java Virtual Machine settings:
```yaml
spec:
  nodePools:
    - component: data-nodes
      jvm: |
        -Xms16g
        -Xmx16g
        -XX:+UseG1GC
        -XX:G1HeapRegionSize=32m
        -XX:MaxGCPauseMillis=200
        -XX:+UnlockExperimentalVMOptions
        -XX:+UseTransparentHugePages
        -XX:+AlwaysPreTouch
```
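As a rule of thumb, keep the heap at no more than half of the pod's memory limit and below roughly 32GB so compressed object pointers stay enabled. You can verify what each node actually allocated:
```bash
curl -k -u admin:$ADMIN_PASSWORD "https://localhost:9200/_cat/nodes?v&h=name,heap.max,ram.max"
```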
Index Templates and Policies
Configure index lifecycle management:
```bash
# Create index template
curl -k -u admin:$ADMIN_PASSWORD -X PUT "https://localhost:9200/_index_template/logs_template" \
  -H 'Content-Type: application/json' \
  -d '{
    "index_patterns": ["logs-*"],
    "template": {
      "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1,
        "index.refresh_interval": "30s",
        "index.codec": "best_compression"
      },
      "mappings": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "message": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
  }'

# Create ISM policy for log rotation
curl -k -u admin:$ADMIN_PASSWORD -X PUT "https://localhost:9200/_plugins/_ism/policies/logs_policy" \
  -H 'Content-Type: application/json' \
  -d '{
    "policy": {
      "description": "Log rotation policy",
      "default_state": "hot",
      "states": [
        {
          "name": "hot",
          "actions": [],
          "transitions": [
            {
              "state_name": "warm",
              "conditions": {
                "min_index_age": "7d"
              }
            }
          ]
        },
        {
          "name": "warm",
          "actions": [
            {
              "replica_count": {
                "number_of_replicas": 0
              }
            }
          ],
          "transitions": [
            {
              "state_name": "delete",
              "conditions": {
                "min_index_age": "30d"
              }
            }
          ]
        },
        {
          "name": "delete",
          "actions": [
            {
              "delete": {}
            }
          ]
        }
      ]
    }
  }'
```
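The policy still has to be attached to indices. For existing log indices you can attach it explicitly (new indices can instead pick it up through an ism_template block in the policy):
```bash
curl -k -u admin:$ADMIN_PASSWORD -X POST "https://localhost:9200/_plugins/_ism/add/logs-*" \
  -H 'Content-Type: application/json' \
  -d '{"policy_id": "logs_policy"}'
```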
Backup and Disaster Recovery
Snapshot Configuration
Configure automated backups:
```yaml
spec:
  general:
    additionalConfig:
      opensearch.yml: |
        path.repo: ["/usr/share/opensearch/snapshots"]
        repositories.s3.bucket: "opensearch-backups"
        repositories.s3.region: "us-west-2"
```
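The S3 repository also needs AWS credentials in the OpenSearch keystore; the operator can load them from a Kubernetes secret (check your operator version for the exact keystore field under spec.general). A sketch of creating such a secret, with keys matching the standard s3.client.default.* keystore settings:
```bash
kubectl create secret generic s3-snapshot-credentials \
  --from-literal=s3.client.default.access_key='<AWS_ACCESS_KEY_ID>' \
  --from-literal=s3.client.default.secret_key='<AWS_SECRET_ACCESS_KEY>'
```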
Backup Script
```bash
#!/bin/bash
# Automated backup script
CLUSTER_URL="https://localhost:9200"
ADMIN_USER="admin"
ADMIN_PASS="$OPENSEARCH_ADMIN_PASSWORD"
REPO_NAME="s3_repository"
SNAPSHOT_NAME="snapshot_$(date +%Y%m%d_%H%M%S)"

# Create repository if it does not exist
curl -k -u ${ADMIN_USER}:${ADMIN_PASS} -X PUT "${CLUSTER_URL}/_snapshot/${REPO_NAME}" \
  -H 'Content-Type: application/json' \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "opensearch-backups",
      "region": "us-west-2",
      "base_path": "snapshots"
    }
  }'

# Create snapshot
curl -k -u ${ADMIN_USER}:${ADMIN_PASS} -X PUT "${CLUSTER_URL}/_snapshot/${REPO_NAME}/${SNAPSHOT_NAME}" \
  -H 'Content-Type: application/json' \
  -d '{
    "indices": "*",
    "ignore_unavailable": true,
    "include_global_state": false
  }'

echo "Snapshot ${SNAPSHOT_NAME} created successfully"
```
FAQ
General Questions
Q: What's the difference between OpenSearch and Elasticsearch? A: OpenSearch is an open-source fork of Elasticsearch, created after Elastic changed its license. It maintains API compatibility while being fully open-source under Apache 2.0 license.
Q: Can I migrate from Elasticsearch to OpenSearch? A: Yes, OpenSearch maintains API compatibility with Elasticsearch versions up to 7.10. Migration typically involves updating client configurations and reindexing data.
Q: What Kubernetes versions are supported? A: The operator supports Kubernetes 1.19 and later versions. It's tested on major cloud platforms including AWS EKS, Google GKE, and Azure AKS.
Operational Questions
Q: How do I handle node failures? A: The operator automatically detects and replaces failed nodes. Ensure you have proper replica settings and the cluster will redistribute data automatically.
Q: Can I run multiple OpenSearch clusters in the same namespace? A: Yes, you can run multiple clusters in the same namespace by giving them unique names. Each cluster is isolated and managed independently.
Q: How do I update the operator itself?
A: Update the operator using Helm: helm upgrade opensearch-operator opensearch-operator/opensearch-operator
Performance Questions
Q: What are the recommended resource requirements? A: For production:
- Master nodes: 2-4 CPU cores, 4-8GB RAM
- Data nodes: 4-8 CPU cores, 16-64GB RAM
- Storage: High-IOPS SSDs recommended
Q: How do I optimize for search performance? A: Use dedicated coordinating nodes, optimize index mappings, implement proper sharding strategy, and consider hot-warm-cold architecture for time-series data.
Q: What's the maximum cluster size supported? A: OpenSearch clusters can scale to hundreds of nodes. The operator has been tested with clusters up to 100 nodes, but larger deployments are possible with proper planning.
Troubleshooting
Common Issues and Solutions
Pod Startup Issues
Problem: Pods stuck in Pending state
```bash
# Check node resources
kubectl describe nodes
kubectl top nodes

# Check pod events
kubectl describe pod <pod-name>

# Check resource quotas
kubectl describe resourcequota
```
Solution: Ensure sufficient node resources or adjust resource requests.
Cluster Formation Issues
Problem: Nodes not joining cluster
```bash
# Check cluster logs
kubectl logs -l opensearch.role=master

# Verify network connectivity
kubectl exec -it <pod-name> -- curl -k https://<other-pod-ip>:9200
```
Solution: Verify network policies and service discovery configuration.
Storage Issues
Problem: Persistent volume claim failures
```bash
# Check PVC status
kubectl get pvc
kubectl describe pvc <pvc-name>

# Check storage class
kubectl get storageclass
```
Solution: Ensure storage class supports dynamic provisioning and has sufficient capacity.
Debug Commands
```bash
# Get all OpenSearch resources
kubectl get opensearchclusters,pods,services,pvc -l app=opensearch

# Check operator logs
kubectl logs -l app.kubernetes.io/name=opensearch-operator -f

# Exec into an OpenSearch pod
kubectl exec -it <pod-name> -- /bin/bash

# Port forward for direct access
kubectl port-forward svc/<service-name> 9200:9200

# Get cluster configuration
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cluster/settings?pretty

# Check allocation explain
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cluster/allocation/explain?pretty
```
Cleanup and Uninstallation
Remove OpenSearch Cluster
```bash
# Delete cluster (data will be preserved in PVCs)
kubectl delete opensearchcluster my-first-cluster

# Delete PVCs to remove data permanently
kubectl delete pvc -l opensearch.cluster=my-first-cluster
```
Uninstall Operator
```bash
# Remove operator
helm uninstall opensearch-operator

# Clean up CRDs (optional)
kubectl delete crd opensearchclusters.opensearch.opster.io

# Remove operator namespace
kubectl delete namespace opensearch-operator-system
```