ELK Stack Tutorial: Guide to Elasticsearch, Logstash, and Kibana

What is the ELK Stack?

The ELK Stack is a powerful combination of three open-source tools designed for log management, monitoring, and data analysis:

  • Elasticsearch: A distributed search and analytics engine that stores and indexes data
  • Logstash: A data processing pipeline that ingests, transforms, and sends data to Elasticsearch
  • Kibana: A web-based visualization and management interface for Elasticsearch

Together, these tools provide a complete solution for collecting, processing, storing, analyzing, and visualizing log data and other time-series data.

Prerequisites

Before setting up the ELK stack, ensure you have:

  1. Java 11 or later installed on your system (recent Elasticsearch and Logstash releases bundle their own JDK, so a system Java is mainly needed for older versions; see the quick checks after this list)
  2. At least 4GB RAM available for development (8GB+ recommended for production)
  3. Basic knowledge of command line operations
  4. Network access for downloading components
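
You can quickly verify the Java and memory requirements from the command line:

# Check the system Java version (only relevant if you rely on a system JDK)
java -version

# Check available memory
free -h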

Step 1: Installing Elasticsearch

1.1 Download and Install Elasticsearch

# Download Elasticsearch (replace X.X.X with the latest version)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-X.X.X-linux-x86_64.tar.gz

# Extract the archive
tar -xzf elasticsearch-X.X.X-linux-x86_64.tar.gz

# Move to a convenient location
sudo mv elasticsearch-X.X.X /opt/elasticsearch

1.2 Configure Elasticsearch

Edit the configuration file:

sudo nano /opt/elasticsearch/config/elasticsearch.yml

Add these basic configurations:

# Cluster and node settings
cluster.name: my-elk-cluster
node.name: node-1

# Network settings
# (0.0.0.0 binds to all interfaces so the node is reachable remotely; restrict this in production)
network.host: 0.0.0.0
http.port: 9200

# Discovery settings
discovery.type: single-node

# Security settings (for development)
xpack.security.enabled: false
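
Because network.host is set to a non-loopback address, Elasticsearch applies its production bootstrap checks, one of which requires a higher memory-map limit. On Linux, raise it as follows:

# Raise the mmap count limit required by Elasticsearch's bootstrap checks
sudo sysctl -w vm.max_map_count=262144

# Persist the setting across reboots
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf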

1.3 Start Elasticsearch

# Start Elasticsearch (note: it refuses to run as the root user)
cd /opt/elasticsearch
./bin/elasticsearch

# Or run in the background as a daemon
./bin/elasticsearch -d

1.4 Verify Installation

Test if Elasticsearch is running:

curl -X GET "localhost:9200/?pretty"

You should see a response like:

{
  "name" : "node-1",
  "cluster_name" : "my-elk-cluster",
  "cluster_uuid" : "...",
  "version" : {
    "number" : "8.x.x",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "...",
    "build_date" : "...",
    "build_snapshot" : false,
    "lucene_version" : "...",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

Step 2: Installing Logstash

2.1 Download and Install Logstash

# Download Logstash
wget https://artifacts.elastic.co/downloads/logstash/logstash-X.X.X-linux-x86_64.tar.gz

# Extract the archive
tar -xzf logstash-X.X.X-linux-x86_64.tar.gz

# Move to a convenient location
sudo mv logstash-X.X.X /opt/logstash

2.2 Create a Basic Logstash Configuration

Create a configuration file:

sudo nano /opt/logstash/config/logstash.conf

Add this basic configuration:

input {
  file {
    path => "/var/log/*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Note: on Logstash 8.x with ECS compatibility enabled (the default),
  # the file input stores the path in [log][file][path] rather than [path]
  if [path] =~ "access" {
    mutate { replace => { "type" => "apache" } }
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  } else if [path] =~ "error" {
    mutate { replace => { "type" => "apache-error" } }
    grok {
      match => { "message" => "%{COMMONAPACHELOG}" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
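
Before pointing Logstash at real log files, you can smoke-test the installation with an inline stdin-to-stdout pipeline; type a line and the parsed event is printed back:

# One-off pipeline defined on the command line with -e
cd /opt/logstash
./bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'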

2.3 Start Logstash

# Start Logstash with the configuration
cd /opt/logstash
./bin/logstash -f config/logstash.conf

# Or enable automatic config reloading, so edits are picked up without a restart
./bin/logstash -f config/logstash.conf --config.reload.automatic

Step 3: Installing Kibana

3.1 Download and Install Kibana

# Download Kibana
wget https://artifacts.elastic.co/downloads/kibana/kibana-X.X.X-linux-x86_64.tar.gz

# Extract the archive
tar -xzf kibana-X.X.X-linux-x86_64.tar.gz

# Move to a convenient location
sudo mv kibana-X.X.X /opt/kibana

3.2 Configure Kibana

Edit the configuration file:

sudo nano /opt/kibana/config/kibana.yml

Add these configurations:

# Server settings
server.port: 5601
server.host: "0.0.0.0"

# Elasticsearch connection
elasticsearch.hosts: ["http://localhost:9200"]

# Security settings (for development): with security disabled on the
# Elasticsearch side, no credentials are needed here. When security is
# enabled, uncomment these and set a real password for the built-in
# kibana_system user:
#elasticsearch.username: "kibana_system"
#elasticsearch.password: "changeme"

3.3 Start Kibana

# Start Kibana
cd /opt/kibana
./bin/kibana

# Or run in background
./bin/kibana &

3.4 Access Kibana

Open your web browser and navigate to http://localhost:5601
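
You can also confirm Kibana is up from the command line via its status API; look for an overall status of "available" in the JSON response:

# Query Kibana's status endpoint
curl -s http://localhost:5601/api/status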

Step 4: Creating Your First Dashboard

4.1 Create an Index Pattern

  1. Go to Stack Management → Index Patterns (called Data Views in Kibana 8.x)
  2. Click Create index pattern
  3. Enter logstash-* as the pattern (the check after this list confirms matching indices exist)
  4. Select @timestamp as the time field
  5. Click Create index pattern
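
If the pattern matches no indices, first confirm that Logstash has actually written data into Elasticsearch:

# List Logstash-created indices; at least one logstash-YYYY.MM.dd entry should appear
curl -X GET "localhost:9200/_cat/indices/logstash-*?v"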

4.2 Create a Simple Visualization

  1. Go to Visualize Library
  2. Click Create visualization
  3. Select Line chart
  4. Choose your index pattern
  5. Configure the visualization:
    • Y-axis: Aggregation: Count
    • X-axis: Aggregation: Date Histogram, Field: @timestamp
  6. Click Save and name your visualization

4.3 Create a Dashboard

  1. Go to Dashboard
  2. Click Create dashboard
  3. Click Add and select your visualization
  4. Arrange and resize as needed
  5. Click Save and name your dashboard

Step 5: Advanced Logstash Configuration

5.1 Multiple Input Sources

input {
  # File input
  file {
    path => "/var/log/application.log"
    type => "application"
  }
  
  # Beats input (for Filebeat)
  beats {
    port => 5044
    # Note: a type already set by the shipper is not overridden here
    type => "beats"
  }
  
  # TCP input
  tcp {
    port => 5000
    type => "tcp"
  }
}
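
With this pipeline running, you can exercise the TCP input by hand, for example with netcat (assuming it is installed):

# Send a test line to the TCP input on port 5000
echo 'hello from tcp' | nc localhost 5000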

5.2 Data Transformation with Filters

filter {
  # Parse JSON logs
  if [type] == "json" {
    json {
      source => "message"
    }
  }
  
  # Parse CSV data
  if [type] == "csv" {
    csv {
      columns => ["timestamp", "level", "message", "user"]
      separator => ","
    }
  }
  
  # Add custom fields
  mutate {
    add_field => { "environment" => "production" }
    # Note: on Logstash 8.x with ECS compatibility, host is an object;
    # use "%{[host][name]}" instead of "%{host}"
    add_field => { "hostname" => "%{host}" }
  }
  
  # Remove sensitive data
  mutate {
    remove_field => ["password", "credit_card"]
  }
}
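
A quick way to see the json filter in action is an inline pipeline that feeds a single JSON line through it (a sketch; the type value matches the conditional above):

# Pipe one JSON document through the json filter and print the parsed event
cd /opt/logstash
echo '{"level":"INFO","user":"alice"}' | ./bin/logstash -e '
  input { stdin { type => "json" } }
  filter { if [type] == "json" { json { source => "message" } } }
  output { stdout { codec => rubydebug } }'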

5.3 Multiple Output Destinations

output {
  # Primary output to Elasticsearch
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    template_name => "logs"
    template_overwrite => true
  }
  
  # Backup to file
  file {
    path => "/var/log/logstash/backup.log"
    codec => json
  }
  
  # Send alerts to email (requires email output plugin)
  if [level] == "ERROR" {
    email {
      to => "admin@example.com"
      subject => "Error Alert: %{message}"
      body => "Error occurred: %{message}"
    }
  }
}
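
As noted above, the email output is not bundled with Logstash by default; install it with the plugin manager:

# Install the email output plugin
cd /opt/logstash
./bin/logstash-plugin install logstash-output-email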

Step 6: Monitoring and Maintenance

6.1 Monitor Cluster Health

# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Check node stats
curl -X GET "localhost:9200/_nodes/stats?pretty"

# Check index stats
curl -X GET "localhost:9200/_stats?pretty"

6.2 Index Management

# List all indices
curl -X GET "localhost:9200/_cat/indices?v"

# Delete old indices (note: Elasticsearch 8.x blocks wildcard deletes unless
# action.destructive_requires_name is set to false)
curl -X DELETE "localhost:9200/logstash-2023.01.*"

# Optimize indices by merging segments (best run only on indices that are
# no longer being written to)
curl -X POST "localhost:9200/logstash-*/_forcemerge"

6.3 Backup and Restore

# Create a snapshot repository (the location must first be whitelisted via
# the path.repo setting in elasticsearch.yml, e.g. path.repo: ["/path/to/backup/directory"])
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/path/to/backup/directory"
  }
}'

# Create snapshot
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

# Restore snapshot
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore"

Best Practices

  1. Resource Planning: Allocate sufficient RAM and CPU for each component
  2. Security: Enable security features in production environments
  3. Monitoring: Set up monitoring for the ELK stack itself
  4. Backup: Implement regular backup strategies for your data
  5. Index Management: Use index lifecycle management (ILM) for automatic index rollover and deletion (see the sketch after this list)
  6. Performance Tuning: Optimize JVM heap sizes and other performance parameters
  7. Log Rotation: Implement proper log rotation to prevent disk space issues

Common Issues and Solutions

Elasticsearch Won't Start

  • Check Java version compatibility
  • Verify available memory
  • Check port availability
  • Review error logs in /opt/elasticsearch/logs/

Logstash Configuration Errors

  • Validate configuration syntax: ./bin/logstash -f config/logstash.conf --config.test_and_exit
  • Check input/output plugin compatibility
  • Verify file permissions for log files

Kibana Connection Issues

  • Ensure Elasticsearch is running and accessible
  • Check network connectivity
  • Verify configuration settings
  • Clear browser cache

Frequently Asked Questions

Q: What is the difference between ELK and EFK stack?
A: The EFK stack replaces Logstash with Fluentd as the log collector. Fluentd is often preferred for its lower resource usage and better performance in containerized environments.

Q: How much storage do I need for the ELK stack?
A: Storage requirements depend on your log volume and retention period. As a general rule, plan for roughly 2-3 times your raw daily log volume (to cover indexing overhead and replicas), multiplied by the number of days you retain data.

Q: Can I use the ELK stack for real-time monitoring?
A: Yes, the ELK stack can provide near real-time monitoring. Logstash can process logs in real-time, and Kibana dashboards can refresh automatically to show current data.

Q: How do I scale the ELK stack for high volume?
A: Scale horizontally by adding more Elasticsearch nodes, use Logstash workers for parallel processing, and consider using message queues like Redis or Kafka for buffering.
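
For the buffering approach, a minimal Logstash Kafka input might look like this (the broker address, topic, and group ID are placeholders for your own setup):

input {
  kafka {
    bootstrap_servers => "kafka-broker:9092"
    topics => ["logs"]
    group_id => "logstash"
  }
}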

Q: Is the ELK stack suitable for small deployments?
A: Yes, the ELK stack can be deployed on a single server for small environments. However, consider resource requirements and plan for future growth.

Q: How do I secure the ELK stack?
A: Enable X-Pack security features, use SSL/TLS encryption, implement proper authentication and authorization, and restrict network access to the components.
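
As a rough sketch, enabling security and TLS on the HTTP layer in elasticsearch.yml involves settings along these lines (the keystore path is a placeholder; certificate generation and password handling vary by setup):

# elasticsearch.yml (sketch)
xpack.security.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12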

Q: What alternatives exist to the ELK stack?
A: Popular alternatives include Graylog, Splunk, Fluentd + Elasticsearch + Kibana (EFK), and cloud-based solutions like AWS CloudWatch, Google Cloud Logging, and Azure Monitor.
