ELK Stack Tutorial: Guide to Elasticsearch, Logstash, and Kibana

What is the ELK Stack?

The ELK Stack is a powerful combination of three open-source tools designed for log management, monitoring, and data analysis:

  • Elasticsearch: A distributed search and analytics engine that stores and indexes data
  • Logstash: A data processing pipeline that ingests, transforms, and sends data to Elasticsearch
  • Kibana: A web-based visualization and management interface for Elasticsearch

Together, these tools provide a complete solution for collecting, processing, storing, analyzing, and visualizing log data and other time-series data.

Prerequisites

Before setting up the ELK stack, ensure you have:

  1. Java 11 or later installed on your system (recent Elasticsearch and Logstash releases bundle their own JDK, so a system Java is mainly needed for older versions; see the quick checks after this list)
  2. At least 4GB RAM available for development (8GB+ recommended for production)
  3. Basic knowledge of command line operations
  4. Network access for downloading components
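
You can quickly verify the Java and memory requirements from the command line:

# Check the system Java version (only relevant if you rely on a system JDK)
java -version

# Check available memory
free -h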

Step 1: Installing Elasticsearch

1.1 Download and Install Elasticsearch

# Download Elasticsearch (replace X.X.X with the latest version)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-X.X.X-linux-x86_64.tar.gz

# Extract the archive
tar -xzf elasticsearch-X.X.X-linux-x86_64.tar.gz

# Move to a convenient location
sudo mv elasticsearch-X.X.X /opt/elasticsearch

1.2 Configure Elasticsearch

Edit the configuration file:

sudo nano /opt/elasticsearch/config/elasticsearch.yml

Add these basic configurations:

# Cluster and node settings
cluster.name: my-elk-cluster
node.name: node-1

# Network settings
# (0.0.0.0 binds to all interfaces so the node is reachable remotely; restrict this in production)
network.host: 0.0.0.0
http.port: 9200

# Discovery settings
discovery.type: single-node

# Security settings (for development)
xpack.security.enabled: false
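
Because network.host is set to a non-loopback address, Elasticsearch applies its production bootstrap checks, one of which requires a higher memory-map limit. On Linux, raise it as follows:

# Raise the mmap count limit required by Elasticsearch's bootstrap checks
sudo sysctl -w vm.max_map_count=262144

# Persist the setting across reboots
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf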

1.3 Start Elasticsearch

# Start Elasticsearch (note: it refuses to run as the root user)
cd /opt/elasticsearch
./bin/elasticsearch

# Or run in the background as a daemon
./bin/elasticsearch -d

1.4 Verify Installation

Test if Elasticsearch is running:

curl -X GET "localhost:9200/?pretty"

You should see a response like:

{
  "name" : "node-1",
  "cluster_name" : "my-elk-cluster",
  "cluster_uuid" : "...",
  "version" : {
    "number" : "8.x.x",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "...",
    "build_date" : "...",
    "build_snapshot" : false,
    "lucene_version" : "...",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

Step 2: Installing Logstash

2.1 Download and Install Logstash

# Download Logstash
wget https://artifacts.elastic.co/downloads/logstash/logstash-X.X.X-linux-x86_64.tar.gz

# Extract the archive
tar -xzf logstash-X.X.X-linux-x86_64.tar.gz

# Move to a convenient location
sudo mv logstash-X.X.X /opt/logstash

2.2 Create a Basic Logstash Configuration

Create a configuration file:

sudo nano /opt/logstash/config/logstash.conf

Add this basic configuration:

input {
  file {
    path => "/var/log/*.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  # Note: on Logstash 8.x with ECS compatibility enabled (the default),
  # the file input stores the path in [log][file][path] rather than [path]
  if [path] =~ "access" {
    mutate { replace => { "type" => "apache" } }
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  } else if [path] =~ "error" {
    mutate { replace => { "type" => "apache-error" } }
    grok {
      match => { "message" => "%{COMMONAPACHELOG}" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
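
Before pointing Logstash at real log files, you can smoke-test the installation with an inline stdin-to-stdout pipeline; type a line and the parsed event is printed back:

# One-off pipeline defined on the command line with -e
cd /opt/logstash
./bin/logstash -e 'input { stdin {} } output { stdout { codec => rubydebug } }'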

2.3 Start Logstash

# Start Logstash with the configuration
cd /opt/logstash
./bin/logstash -f config/logstash.conf

# Or enable automatic config reloading, so edits are picked up without a restart
./bin/logstash -f config/logstash.conf --config.reload.automatic

Step 3: Installing Kibana

3.1 Download and Install Kibana

# Download Kibana
wget https://artifacts.elastic.co/downloads/kibana/kibana-X.X.X-linux-x86_64.tar.gz

# Extract the archive
tar -xzf kibana-X.X.X-linux-x86_64.tar.gz

# Move to a convenient location
sudo mv kibana-X.X.X /opt/kibana

3.2 Configure Kibana

Edit the configuration file:

sudo nano /opt/kibana/config/kibana.yml

Add these configurations:

# Server settings
server.port: 5601
server.host: "0.0.0.0"

# Elasticsearch connection
elasticsearch.hosts: ["http://localhost:9200"]

# Security settings (for development): with security disabled on the
# Elasticsearch side, no credentials are needed here. When security is
# enabled, uncomment these and set a real password for the built-in
# kibana_system user:
#elasticsearch.username: "kibana_system"
#elasticsearch.password: "changeme"

3.3 Start Kibana

# Start Kibana
cd /opt/kibana
./bin/kibana

# Or run in background
./bin/kibana &

3.4 Access Kibana

Open your web browser and navigate to http://localhost:5601
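
You can also confirm Kibana is up from the command line via its status API; look for an overall status of "available" in the JSON response:

# Query Kibana's status endpoint
curl -s http://localhost:5601/api/status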

Step 4: Creating Your First Dashboard

4.1 Create an Index Pattern

  1. Go to Stack Management → Index Patterns (called Data Views in Kibana 8.x)
  2. Click Create index pattern
  3. Enter logstash-* as the pattern (the check after this list confirms matching indices exist)
  4. Select @timestamp as the time field
  5. Click Create index pattern
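
If the pattern matches no indices, first confirm that Logstash has actually written data into Elasticsearch:

# List Logstash-created indices; at least one logstash-YYYY.MM.dd entry should appear
curl -X GET "localhost:9200/_cat/indices/logstash-*?v"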

4.2 Create a Simple Visualization

  1. Go to Visualize Library
  2. Click Create visualization
  3. Select Line chart
  4. Choose your index pattern
  5. Configure the visualization:
    • Y-axis: Aggregation: Count
    • X-axis: Aggregation: Date Histogram, Field: @timestamp
  6. Click Save and name your visualization

4.3 Create a Dashboard

  1. Go to Dashboard
  2. Click Create dashboard
  3. Click Add and select your visualization
  4. Arrange and resize as needed
  5. Click Save and name your dashboard

Step 5: Advanced Logstash Configuration

5.1 Multiple Input Sources

input {
  # File input
  file {
    path => "/var/log/application.log"
    type => "application"
  }
  
  # Beats input (for Filebeat)
  beats {
    port => 5044
    # Note: a type already set by the shipper is not overridden here
    type => "beats"
  }
  
  # TCP input
  tcp {
    port => 5000
    type => "tcp"
  }
}
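
With this pipeline running, you can exercise the TCP input by hand, for example with netcat (assuming it is installed):

# Send a test line to the TCP input on port 5000
echo 'hello from tcp' | nc localhost 5000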

5.2 Data Transformation with Filters

filter {
  # Parse JSON logs
  if [type] == "json" {
    json {
      source => "message"
    }
  }
  
  # Parse CSV data
  if [type] == "csv" {
    csv {
      columns => ["timestamp", "level", "message", "user"]
      separator => ","
    }
  }
  
  # Add custom fields
  mutate {
    add_field => { "environment" => "production" }
    # Note: on Logstash 8.x with ECS compatibility, host is an object;
    # use "%{[host][name]}" instead of "%{host}"
    add_field => { "hostname" => "%{host}" }
  }
  
  # Remove sensitive data
  mutate {
    remove_field => ["password", "credit_card"]
  }
}
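
A quick way to see the json filter in action is an inline pipeline that feeds a single JSON line through it (a sketch; the type value matches the conditional above):

# Pipe one JSON document through the json filter and print the parsed event
cd /opt/logstash
echo '{"level":"INFO","user":"alice"}' | ./bin/logstash -e '
  input { stdin { type => "json" } }
  filter { if [type] == "json" { json { source => "message" } } }
  output { stdout { codec => rubydebug } }'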

5.3 Multiple Output Destinations

output {
  # Primary output to Elasticsearch
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
    template_name => "logs"
    template_overwrite => true
  }
  
  # Backup to file
  file {
    path => "/var/log/logstash/backup.log"
    codec => json
  }
  
  # Send alerts to email (requires email output plugin)
  if [level] == "ERROR" {
    email {
      to => "admin@example.com"
      subject => "Error Alert: %{message}"
      body => "Error occurred: %{message}"
    }
  }
}
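
As noted above, the email output is not bundled with Logstash by default; install it with the plugin manager:

# Install the email output plugin
cd /opt/logstash
./bin/logstash-plugin install logstash-output-email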

Step 6: Monitoring and Maintenance

6.1 Monitor Cluster Health

# Check cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Check node stats
curl -X GET "localhost:9200/_nodes/stats?pretty"

# Check index stats
curl -X GET "localhost:9200/_stats?pretty"

6.2 Index Management

# List all indices
curl -X GET "localhost:9200/_cat/indices?v"

# Delete old indices (note: Elasticsearch 8.x blocks wildcard deletes unless
# action.destructive_requires_name is set to false)
curl -X DELETE "localhost:9200/logstash-2023.01.*"

# Optimize indices by merging segments (best run only on indices that are
# no longer being written to)
curl -X POST "localhost:9200/logstash-*/_forcemerge"

6.3 Backup and Restore

# Create a snapshot repository (the location must first be whitelisted via
# the path.repo setting in elasticsearch.yml, e.g. path.repo: ["/path/to/backup/directory"])
curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/path/to/backup/directory"
  }
}'

# Create snapshot
curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

# Restore snapshot
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore"

Best Practices

  1. Resource Planning: Allocate sufficient RAM and CPU for each component
  2. Security: Enable security features in production environments
  3. Monitoring: Set up monitoring for the ELK stack itself
  4. Backup: Implement regular backup strategies for your data
  5. Index Management: Use index lifecycle management (ILM) for automatic index rollover and deletion (see the sketch after this list)
  6. Performance Tuning: Optimize JVM heap sizes and other performance parameters
  7. Log Rotation: Implement proper log rotation to prevent disk space issues

Common Issues and Solutions

Elasticsearch Won't Start

  • Check Java version compatibility
  • Verify available memory
  • Check port availability
  • Review error logs in /opt/elasticsearch/logs/

Logstash Configuration Errors

  • Validate configuration syntax: ./bin/logstash -f config/logstash.conf --config.test_and_exit
  • Check input/output plugin compatibility
  • Verify file permissions for log files

Kibana Connection Issues

  • Ensure Elasticsearch is running and accessible
  • Check network connectivity
  • Verify configuration settings
  • Clear browser cache

Frequently Asked Questions

Q: What is the difference between ELK and EFK stack?
A: The EFK stack replaces Logstash with Fluentd as the log collector. Fluentd is often preferred for its lower resource usage and better performance in containerized environments.

Q: How much storage do I need for the ELK stack?
A: Storage requirements depend on your log volume and retention period. As a general rule, plan for roughly 2-3 times your raw daily log volume (to cover indexing overhead and replicas), multiplied by the number of days you retain data.

Q: Can I use the ELK stack for real-time monitoring?
A: Yes, the ELK stack can provide near real-time monitoring. Logstash can process logs in real-time, and Kibana dashboards can refresh automatically to show current data.

Q: How do I scale the ELK stack for high volume?
A: Scale horizontally by adding more Elasticsearch nodes, use Logstash workers for parallel processing, and consider using message queues like Redis or Kafka for buffering.
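
For the buffering approach, a minimal Logstash Kafka input might look like this (the broker address, topic, and group ID are placeholders for your own setup):

input {
  kafka {
    bootstrap_servers => "kafka-broker:9092"
    topics => ["logs"]
    group_id => "logstash"
  }
}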

Q: Is the ELK stack suitable for small deployments?
A: Yes, the ELK stack can be deployed on a single server for small environments. However, consider resource requirements and plan for future growth.

Q: How do I secure the ELK stack?
A: Enable X-Pack security features, use SSL/TLS encryption, implement proper authentication and authorization, and restrict network access to the components.
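
As a rough sketch, enabling security and TLS on the HTTP layer in elasticsearch.yml involves settings along these lines (the keystore path is a placeholder; certificate generation and password handling vary by setup):

# elasticsearch.yml (sketch)
xpack.security.enabled: true
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.keystore.path: certs/http.p12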

Q: What alternatives exist to the ELK stack?
A: Popular alternatives include Graylog, Splunk, Fluentd + Elasticsearch + Kibana (EFK), and cloud-based solutions like AWS CloudWatch, Google Cloud Logging, and Azure Monitor.
