Learn how to deploy, scale, and manage OpenSearch clusters on Kubernetes using the official OpenSearch Kubernetes Operator. This comprehensive guide covers installation, configuration, security, monitoring, and production best practices for running OpenSearch workloads in Kubernetes environments.
What is the OpenSearch Kubernetes Operator?
The OpenSearch Kubernetes Operator is a powerful tool that automates the deployment, provisioning, management, and orchestration of OpenSearch clusters and OpenSearch Dashboards on Kubernetes. Built for cloud-native environments, it simplifies complex operations like scaling, version upgrades, security configuration, and cluster management.
Prerequisites
Before installing the OpenSearch Kubernetes Operator, ensure your environment meets these requirements:
Kubernetes Environment
- Kubernetes Version: v1.19 or higher
- Cluster Access: `kubectl` configured with admin privileges
- Node Resources: Minimum 4 CPU cores and 8GB RAM available across cluster nodes
- Storage: Dynamic persistent volume provisioner configured (recommended)
Required Tools
- Helm: Version 3.x for package management
- curl: For API testing and health checks
- jq: For JSON parsing (optional but recommended)
Network Requirements
- Pod Network: Cluster networking properly configured
- Service Access: LoadBalancer or NodePort support for external access
- DNS: CoreDNS or equivalent for service discovery
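A quick way to confirm the tooling and storage prerequisites before installing (this assumes kubectl, helm, and jq are already on your PATH):
```bash
# Client and cluster versions
kubectl version
helm version --short
jq --version

# A default storage class indicates a dynamic provisioner is available
kubectl get storageclass
```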
Compatibility Matrix
The OpenSearch Kubernetes Operator supports multiple OpenSearch versions:
| Operator Version | Min OpenSearch Version | Max OpenSearch Version | Kubernetes Version |
|---|---|---|---|
| 2.8.0 | 2.19.2 | latest 3.x | 1.19+ |
| 2.7.0 | 1.3.x | 2.19.2 | 1.19+ |
Installation Guide
Step 1: Add the Helm Repository
```bash
helm repo add opensearch-operator https://opensearch-project.github.io/opensearch-k8s-operator/
helm repo update
```
Step 2: Install the Operator
```bash
helm install opensearch-operator opensearch-operator/opensearch-operator
```
Verify the operator is running:
```bash
kubectl get pods -l app.kubernetes.io/name=opensearch-operator
```
Deploy Your First OpenSearch Cluster
Step 3: Create a Basic OpenSearch Cluster
Create a file named `my-opensearch-cluster.yaml`:
```yaml
apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-first-cluster
  namespace: default
spec:
  security:
    config:
    tls:
      http:
        generate: true
      transport:
        generate: true
        perNode: true
  general:
    httpPort: 9200
    serviceName: my-first-cluster
    version: 2.14.0
    pluginsList: ["repository-s3"]
    drainDataNodes: true
  dashboards:
    tls:
      enable: true
      generate: true
    version: 2.14.0
    enable: true
    replicas: 1
    resources:
      requests:
        memory: "512Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "200m"
  nodePools:
    - component: masters
      replicas: 3
      resources:
        requests:
          memory: "4Gi"
          cpu: "1000m"
        limits:
          memory: "4Gi"
          cpu: "1000m"
      roles:
        - "data"
        - "cluster_manager"
      persistence:
        emptyDir: {}
```
Step 4: Deploy the Cluster
```bash
kubectl apply -f my-opensearch-cluster.yaml
```
Step 5: Monitor Deployment
Check the status of your cluster:
```bash
kubectl get opensearchclusters
kubectl get pods -l opster.io/opensearch-cluster=my-first-cluster
```
Wait for all pods to be in the Running state.
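If you prefer to block until the pods are ready rather than polling, kubectl wait works against the same label selector used above:
```bash
# Wait up to 10 minutes for all cluster pods to become Ready
kubectl wait --for=condition=Ready pod \
  -l opster.io/opensearch-cluster=my-first-cluster \
  --timeout=600s
```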
Access and Security
Authentication and Credentials
The OpenSearch Kubernetes Operator automatically configures security features including authentication, authorization, and TLS encryption.
Retrieve Admin Credentials
The operator generates secure admin credentials automatically:
```bash
# Get admin password
ADMIN_PASSWORD=$(kubectl get secret my-first-cluster-admin-password -o jsonpath='{.data.password}' | base64 -d)
echo "Admin password: $ADMIN_PASSWORD"

# Default admin username is 'admin'
ADMIN_USER="admin"
```
Create Additional Users
You can create custom users by configuring security settings:
```yaml
spec:
  security:
    config:
      securityConfigSecret:
        name: opensearch-security-config
```
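As a rough sketch, the referenced secret can be built from standard security plugin configuration files; the file names below follow the security plugin's usual layout (internal_users.yml and friends), so adjust them to whichever files you actually customize:
```bash
# Package security plugin config files into the secret referenced above
kubectl create secret generic opensearch-security-config \
  --from-file=internal_users.yml \
  --from-file=roles.yml \
  --from-file=roles_mapping.yml
```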
Network Access Options
Option 1: Port Forwarding (Development)
For local development and testing:
```bash
# Access OpenSearch API
kubectl port-forward svc/my-first-cluster 9200:9200

# Access OpenSearch Dashboards
kubectl port-forward svc/my-first-cluster-dashboards 5601:5601
```
Option 2: LoadBalancer Service (Production)
For production environments, expose services via LoadBalancer:
```yaml
spec:
  general:
    serviceType: LoadBalancer
  dashboards:
    service:
      type: LoadBalancer
```
Option 3: Ingress Controller
Configure ingress for domain-based access:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: opensearch-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  tls:
    - hosts:
        - opensearch.example.com
      secretName: opensearch-tls
  rules:
    - host: opensearch.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-first-cluster
                port:
                  number: 9200
```
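The opensearch-tls secret referenced above must exist before the Ingress can serve the host. With ssl-passthrough enabled, TLS is actually terminated by OpenSearch itself, but the secret still matters if you later switch to terminating TLS at the ingress. One way to create it from an existing certificate and key (file names are placeholders):
```bash
kubectl create secret tls opensearch-tls \
  --cert=opensearch.example.com.crt \
  --key=opensearch.example.com.key
```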
API Access and Testing
Basic Health Check
```bash
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cluster/health?pretty
```
Advanced API Operations
```bash
# Check cluster nodes
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cat/nodes?v

# List indices
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cat/indices?v

# Create a test index
curl -k -u admin:$ADMIN_PASSWORD -X PUT https://localhost:9200/test-index \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"number_of_shards": 1, "number_of_replicas": 1}}'
```
TLS Certificate Management
The operator automatically manages TLS certificates for secure communication:
```yaml
spec:
  security:
    tls:
      http:
        generate: true       # Auto-generate HTTP certificates
        secret:
          name: ""           # Optional: use an existing certificate secret
      transport:
        generate: true       # Auto-generate transport certificates
        perNode: true        # Generate per-node certificates
```
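If you bring your own certificates instead, the secret referenced by secret.name can be created from PEM files. The key names below (ca.crt, tls.crt, tls.key) follow the common Kubernetes TLS-secret convention; verify them against the operator version you run before relying on this sketch:
```bash
kubectl create secret generic my-first-cluster-http-cert \
  --from-file=ca.crt=./ca.pem \
  --from-file=tls.crt=./http.pem \
  --from-file=tls.key=./http-key.pem
```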
Advanced Configuration
Multi-Node Pool Architecture
Configure different node pools for optimal performance and resource utilization:
```yaml
spec:
  nodePools:
    # Master-eligible nodes
    - component: masters
      replicas: 3
      roles: ["cluster_manager"]
      resources:
        requests:
          memory: "2Gi"
          cpu: "1000m"
        limits:
          memory: "2Gi"
          cpu: "1000m"
      persistence:
        pvc:
          storageClass: "fast-ssd"
          size: "20Gi"
    # Dedicated data nodes
    - component: data-nodes
      replicas: 4
      roles: ["data"]
      resources:
        requests:
          memory: "16Gi"
          cpu: "4000m"
        limits:
          memory: "16Gi"
          cpu: "4000m"
      persistence:
        pvc:
          storageClass: "high-iops"
          size: "500Gi"
    # Coordinating nodes for query handling
    - component: coordinators
      replicas: 2
      roles: ["coordinating_only"]
      resources:
        requests:
          memory: "4Gi"
          cpu: "2000m"
        limits:
          memory: "4Gi"
          cpu: "2000m"
```
Hot-Warm-Cold Architecture
Implement tiered storage for cost optimization:
```yaml
spec:
  nodePools:
    # Hot nodes - latest data, high performance
    - component: hot-nodes
      replicas: 3
      roles: ["data", "data_hot"]
      resources:
        requests:
          memory: "32Gi"
          cpu: "8000m"
      persistence:
        pvc:
          storageClass: "nvme-ssd"
          size: "1Ti"
      jvm: "-Xms16g -Xmx16g"
    # Warm nodes - older, frequently accessed data
    - component: warm-nodes
      replicas: 4
      roles: ["data", "data_warm"]
      resources:
        requests:
          memory: "16Gi"
          cpu: "4000m"
      persistence:
        pvc:
          storageClass: "premium-ssd"
          size: "2Ti"
      jvm: "-Xms8g -Xmx8g"
    # Cold nodes - archive data, cost-optimized
    - component: cold-nodes
      replicas: 2
      roles: ["data", "data_cold"]
      resources:
        requests:
          memory: "8Gi"
          cpu: "2000m"
      persistence:
        pvc:
          storageClass: "standard"
          size: "5Ti"
      jvm: "-Xms4g -Xmx4g"
```
Custom OpenSearch Configuration
Configure OpenSearch settings for your use case:
```yaml
spec:
  general:
    version: "2.14.0"
    pluginsList:
      - "repository-s3"
      - "repository-azure"
      - "ingest-attachment"
      - "analysis-icu"
    additionalConfig:
      opensearch.yml: |
        cluster.routing.allocation.disk.watermark.low: 85%
        cluster.routing.allocation.disk.watermark.high: 90%
        cluster.routing.allocation.disk.watermark.flood_stage: 95%
        indices.recovery.max_bytes_per_sec: 100mb
        indices.memory.index_buffer_size: 20%
        thread_pool.search.queue_size: 10000
        thread_pool.write.queue_size: 10000
```
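After the nodes restart with the new configuration, you can confirm which values actually took effect; the grep pattern below simply narrows the output to the settings changed above:
```bash
curl -k -u admin:$ADMIN_PASSWORD \
  "https://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" \
  | grep -E "watermark|index_buffer_size|max_bytes_per_sec"
```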
Resource Management and Scaling
Horizontal Pod Autoscaling
Enable automatic scaling based on metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: opensearch-data-hpa
spec:
  scaleTargetRef:
    apiVersion: opensearch.opster.io/v1
    kind: OpenSearchCluster
    name: my-first-cluster
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
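A word of caution: an HPA can only act on a custom resource whose CRD exposes the scale subresource, which depends on the operator version you installed. You can inspect the CRD before relying on autoscaling:
```bash
# Print the subresources (if any) declared for each served CRD version
kubectl get crd opensearchclusters.opensearch.opster.io \
  -o jsonpath='{range .spec.versions[*]}{.name}{": "}{.subresources}{"\n"}{end}'
```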
Vertical Scaling
Update node resources:
```bash
# Scale up data-node resources. A JSON patch updates one pool in place;
# a plain merge patch would replace the entire nodePools list.
# Adjust the /spec/nodePools/<index> path to where your data pool sits.
kubectl patch opensearchcluster my-first-cluster --type=json -p='[
  {"op": "replace", "path": "/spec/nodePools/1/resources/requests/memory", "value": "32Gi"},
  {"op": "replace", "path": "/spec/nodePools/1/resources/requests/cpu", "value": "8000m"},
  {"op": "replace", "path": "/spec/nodePools/1/resources/limits/memory", "value": "32Gi"},
  {"op": "replace", "path": "/spec/nodePools/1/resources/limits/cpu", "value": "8000m"}
]'
```
Alternatively, edit the resources block in your cluster manifest and re-apply it with kubectl apply.
Storage Scaling
Increase persistent volume sizes:
```yaml
spec:
  nodePools:
    - component: data-nodes
      persistence:
        pvc:
          size: "1Ti"  # Increased from 500Gi
          # Note: requires a storage class that supports volume expansion
```
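Before bumping the size, you can check whether the storage class in use actually allows expansion:
```bash
kubectl get storageclass -o custom-columns=NAME:.metadata.name,EXPANDABLE:.allowVolumeExpansion
```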
Version Management and Rolling Updates
Controlled Rolling Updates
Configure update strategy:
```yaml
spec:
  general:
    version: "2.15.0"
    drainDataNodes: true
  updateStrategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
```
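While the operator rolls the cluster, you can watch pods being replaced one at a time and confirm the version each node reports once the rollout settles:
```bash
# Watch pods cycle through the rolling update
kubectl get pods -l opster.io/opensearch-cluster=my-first-cluster -w

# Confirm node versions after the upgrade
curl -k -u admin:$ADMIN_PASSWORD "https://localhost:9200/_cat/nodes?v&h=name,version"
```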
Blue-Green Deployment
For zero-downtime major updates:
```bash
# Create a new cluster with the updated version
kubectl apply -f opensearch-cluster-v2.yaml

# Migrate data with reindex-from-remote. The request runs on the NEW cluster,
# which pulls documents from the old one; the old cluster's endpoint must be
# allow-listed (reindex.remote.allowlist) on the new cluster's nodes.
curl -k -u admin:$ADMIN_PASSWORD -X POST "https://new-cluster:9200/_reindex" \
  -H 'Content-Type: application/json' \
  -d '{
    "source": {
      "remote": {
        "host": "https://old-cluster:9200",
        "username": "admin",
        "password": "<old-cluster-admin-password>"
      },
      "index": "source-index"
    },
    "dest": {
      "index": "dest-index"
    }
  }'

# Switch traffic to the new cluster
# Delete the old cluster after verification
```
Production Considerations
High Availability and Resilience
Multi-Zone Deployment
Distribute nodes across availability zones for fault tolerance:
```yaml
spec:
  nodePools:
    # One cluster-manager-eligible node per zone gives a three-node quorum
    - component: masters-zone-a
      replicas: 1
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2a
    - component: masters-zone-b
      replicas: 1
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2b
    - component: masters-zone-c
      replicas: 1
      nodeSelector:
        topology.kubernetes.io/zone: us-west-2c
```
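To make OpenSearch itself spread shard copies across zones, allocation awareness can be enabled as well. The sketch below sets it dynamically and assumes each node advertises a zone attribute (for example node.attr.zone configured per node pool):
```bash
curl -k -u admin:$ADMIN_PASSWORD -X PUT "https://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "persistent": {
      "cluster.routing.allocation.awareness.attributes": "zone",
      "cluster.routing.allocation.awareness.force.zone.values": "us-west-2a,us-west-2b,us-west-2c"
    }
  }'
```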
Pod Disruption Budgets
Ensure cluster availability during maintenance:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: opensearch-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      opensearch.role: master
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: opensearch-data-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      opensearch.role: data
```
Resource Quotas and Limits
Set cluster-wide resource constraints:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: opensearch-quota
spec:
  hard:
    requests.cpu: "50"
    requests.memory: 200Gi
    limits.cpu: "100"
    limits.memory: 400Gi
    persistentvolumeclaims: "20"
```
Security Hardening
Network Policies
Implement network segmentation:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: opensearch-network-policy
spec:
  podSelector:
    matchLabels:
      app: opensearch
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: opensearch
        - podSelector:
            matchLabels:
              app: opensearch-dashboards
      ports:
        - protocol: TCP
          port: 9200
        - protocol: TCP
          port: 9300
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: opensearch
      ports:
        - protocol: TCP
          port: 9300
```
RBAC Configuration
Configure Role-Based Access Control:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: opensearch-operator
  namespace: opensearch-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: opensearch-operator
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: opensearch-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: opensearch-operator
subjects:
  - kind: ServiceAccount
    name: opensearch-operator
    namespace: opensearch-system
```
Monitoring and Observability
Prometheus Integration
Enable metrics collection:
```yaml
spec:
  monitoring:
    enable: true
    scrapeInterval: 30s
    labels:
      release: prometheus
  general:
    additionalConfig:
      opensearch.yml: |
        prometheus.metrics.enabled: true
        prometheus.indices: true
        prometheus.cluster.settings: true
```
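Assuming the prometheus-exporter plugin is installed (it is what the prometheus.* settings above configure), the metrics endpoint can be probed directly:
```bash
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_prometheus/metrics | head -n 20
```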
Grafana Dashboard
Import pre-built dashboards for visualization:
```bash
# Download official OpenSearch Grafana dashboard
curl -o opensearch-dashboard.json \
  https://raw.githubusercontent.com/opensearch-project/opensearch-k8s-operator/main/grafana/opensearch-cluster-dashboard.json

# Import to Grafana
kubectl create configmap opensearch-dashboard \
  --from-file=opensearch-dashboard.json \
  -n monitoring
```
Log Management
Configure centralized logging:
```yaml
spec:
  general:
    additionalConfig:
      log4j2.properties: |
        appender.console.type = Console
        appender.console.name = console
        appender.console.layout.type = PatternLayout
        appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n
        appender.json.type = Console
        appender.json.name = json
        appender.json.layout.type = ESJsonLayout
        appender.json.layout.type_name = server
        rootLogger.level = info
        rootLogger.appenderRef.console.ref = console
        rootLogger.appenderRef.json.ref = json
```
Health Checks and Alerts
Configure monitoring alerts:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: opensearch-alerts
spec:
  groups:
    - name: opensearch.rules
      rules:
        - alert: OpenSearchClusterRed
          expr: opensearch_cluster_status{color="red"} == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "OpenSearch cluster status is RED"
            description: "Cluster is in RED state"
        - alert: OpenSearchNodeDown
          expr: up{job="opensearch"} == 0
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "OpenSearch node is down"
            description: "Node has been down for more than 2 minutes"
        - alert: OpenSearchDiskSpaceHigh
          expr: opensearch_filesystem_data_used_percent > 85
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "OpenSearch disk space usage high"
            description: "Disk usage is above 85%"
```
Best Practices
Performance Optimization
JVM Tuning
Optimize Java Virtual Machine settings:
```yaml
spec:
  nodePools:
    - component: data-nodes
      jvm: |
        -Xms16g
        -Xmx16g
        -XX:+UseG1GC
        -XX:G1HeapRegionSize=32m
        -XX:MaxGCPauseMillis=200
        -XX:+UnlockExperimentalVMOptions
        -XX:+UseTransparentHugePages
        -XX:+AlwaysPreTouch
```
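As a rule of thumb, keep the heap at no more than half of the pod's memory limit and below roughly 32GB so compressed object pointers stay enabled. You can verify what each node actually allocated:
```bash
curl -k -u admin:$ADMIN_PASSWORD "https://localhost:9200/_cat/nodes?v&h=name,heap.max,ram.max"
```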
Index Templates and Policies
Configure index lifecycle management:
```bash
# Create index template
curl -k -u admin:$ADMIN_PASSWORD -X PUT "https://localhost:9200/_index_template/logs_template" \
  -H 'Content-Type: application/json' \
  -d '{
    "index_patterns": ["logs-*"],
    "template": {
      "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1,
        "index.refresh_interval": "30s",
        "index.codec": "best_compression"
      },
      "mappings": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "message": {
            "type": "text",
            "analyzer": "standard"
          }
        }
      }
    }
  }'

# Create ISM policy for log rotation
curl -k -u admin:$ADMIN_PASSWORD -X PUT "https://localhost:9200/_plugins/_ism/policies/logs_policy" \
  -H 'Content-Type: application/json' \
  -d '{
    "policy": {
      "description": "Log rotation policy",
      "default_state": "hot",
      "states": [
        {
          "name": "hot",
          "actions": [],
          "transitions": [
            {
              "state_name": "warm",
              "conditions": {
                "min_index_age": "7d"
              }
            }
          ]
        },
        {
          "name": "warm",
          "actions": [
            {
              "replica_count": {
                "number_of_replicas": 0
              }
            }
          ],
          "transitions": [
            {
              "state_name": "delete",
              "conditions": {
                "min_index_age": "30d"
              }
            }
          ]
        },
        {
          "name": "delete",
          "actions": [
            {
              "delete": {}
            }
          ]
        }
      ]
    }
  }'
```
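The policy still has to be attached to indices. For existing log indices you can attach it explicitly (new indices can instead pick it up through an ism_template block in the policy):
```bash
curl -k -u admin:$ADMIN_PASSWORD -X POST "https://localhost:9200/_plugins/_ism/add/logs-*" \
  -H 'Content-Type: application/json' \
  -d '{"policy_id": "logs_policy"}'
```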
Backup and Disaster Recovery
Snapshot Configuration
Configure automated backups:
```yaml
spec:
  general:
    additionalConfig:
      opensearch.yml: |
        path.repo: ["/usr/share/opensearch/snapshots"]
        repositories.s3.bucket: "opensearch-backups"
        repositories.s3.region: "us-west-2"
```
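The S3 repository also needs AWS credentials in the OpenSearch keystore; the operator can load them from a Kubernetes secret (check your operator version for the exact keystore field under spec.general). A sketch of creating such a secret, with keys matching the standard s3.client.default.* keystore settings:
```bash
kubectl create secret generic s3-snapshot-credentials \
  --from-literal=s3.client.default.access_key='<AWS_ACCESS_KEY_ID>' \
  --from-literal=s3.client.default.secret_key='<AWS_SECRET_ACCESS_KEY>'
```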
Backup Script
```bash
#!/bin/bash
# Automated backup script
CLUSTER_URL="https://localhost:9200"
ADMIN_USER="admin"
ADMIN_PASS="$OPENSEARCH_ADMIN_PASSWORD"
REPO_NAME="s3_repository"
SNAPSHOT_NAME="snapshot_$(date +%Y%m%d_%H%M%S)"

# Create repository if it does not exist
curl -k -u ${ADMIN_USER}:${ADMIN_PASS} -X PUT "${CLUSTER_URL}/_snapshot/${REPO_NAME}" \
  -H 'Content-Type: application/json' \
  -d '{
    "type": "s3",
    "settings": {
      "bucket": "opensearch-backups",
      "region": "us-west-2",
      "base_path": "snapshots"
    }
  }'

# Create snapshot
curl -k -u ${ADMIN_USER}:${ADMIN_PASS} -X PUT "${CLUSTER_URL}/_snapshot/${REPO_NAME}/${SNAPSHOT_NAME}" \
  -H 'Content-Type: application/json' \
  -d '{
    "indices": "*",
    "ignore_unavailable": true,
    "include_global_state": false
  }'

echo "Snapshot ${SNAPSHOT_NAME} created successfully"
```
FAQ
General Questions
Q: What's the difference between OpenSearch and Elasticsearch? A: OpenSearch is an open-source fork of Elasticsearch, created after Elastic changed its license. It maintains API compatibility while being fully open-source under Apache 2.0 license.
Q: Can I migrate from Elasticsearch to OpenSearch? A: Yes, OpenSearch maintains API compatibility with Elasticsearch versions up to 7.10. Migration typically involves updating client configurations and reindexing data.
Q: What Kubernetes versions are supported? A: The operator supports Kubernetes 1.19 and later versions. It's tested on major cloud platforms including AWS EKS, Google GKE, and Azure AKS.
Operational Questions
Q: How do I handle node failures? A: The operator automatically detects and replaces failed nodes. Ensure you have proper replica settings and the cluster will redistribute data automatically.
Q: Can I run multiple OpenSearch clusters in the same namespace? A: Yes, you can run multiple clusters in the same namespace by giving them unique names. Each cluster is isolated and managed independently.
Q: How do I update the operator itself?
A: Update the operator using Helm: helm upgrade opensearch-operator opensearch-operator/opensearch-operator
Performance Questions
Q: What are the recommended resource requirements? A: For production:
- Master nodes: 2-4 CPU cores, 4-8GB RAM
- Data nodes: 4-8 CPU cores, 16-64GB RAM
- Storage: High-IOPS SSDs recommended
Q: How do I optimize for search performance? A: Use dedicated coordinating nodes, optimize index mappings, implement proper sharding strategy, and consider hot-warm-cold architecture for time-series data.
Q: What's the maximum cluster size supported? A: OpenSearch clusters can scale to hundreds of nodes. The operator has been tested with clusters up to 100 nodes, but larger deployments are possible with proper planning.
Troubleshooting
Common Issues and Solutions
Pod Startup Issues
Problem: Pods stuck in Pending state
```bash
# Check node resources
kubectl describe nodes
kubectl top nodes

# Check pod events
kubectl describe pod <pod-name>

# Check resource quotas
kubectl describe resourcequota
```
Solution: Ensure sufficient node resources or adjust resource requests.
Cluster Formation Issues
Problem: Nodes not joining cluster
```bash
# Check cluster logs
kubectl logs -l opensearch.role=master

# Verify network connectivity
kubectl exec -it <pod-name> -- curl -k https://<other-pod-ip>:9200
```
Solution: Verify network policies and service discovery configuration.
Storage Issues
Problem: Persistent volume claim failures
```bash
# Check PVC status
kubectl get pvc
kubectl describe pvc <pvc-name>

# Check storage class
kubectl get storageclass
```
Solution: Ensure storage class supports dynamic provisioning and has sufficient capacity.
Debug Commands
```bash
# Get all OpenSearch resources
kubectl get opensearchclusters,pods,services,pvc -l app=opensearch

# Check operator logs
kubectl logs -l app.kubernetes.io/name=opensearch-operator -f

# Exec into an OpenSearch pod
kubectl exec -it <pod-name> -- /bin/bash

# Port forward for direct access
kubectl port-forward svc/<service-name> 9200:9200

# Get cluster configuration
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cluster/settings?pretty

# Check allocation explain
curl -k -u admin:$ADMIN_PASSWORD https://localhost:9200/_cluster/allocation/explain?pretty
```
Cleanup and Uninstallation
Remove OpenSearch Cluster
```bash
# Delete cluster (data will be preserved in PVCs)
kubectl delete opensearchcluster my-first-cluster

# Delete PVCs to remove data permanently
kubectl delete pvc -l opensearch.cluster=my-first-cluster
```
Uninstall Operator
```bash
# Remove operator
helm uninstall opensearch-operator

# Clean up CRDs (optional)
kubectl delete crd opensearchclusters.opensearch.opster.io

# Remove operator namespace
kubectl delete namespace opensearch-operator-system
```