Guide to Elastic APM: Setup, Agents, and Production Operations

Elastic APM (Application Performance Monitoring) collects distributed traces, performance metrics, and errors from instrumented applications and stores them in Elasticsearch for visualization in Kibana. Since 7.13, the recommended ingest path is the Elastic Agent with the APM integration, replacing the standalone APM Server binary for most new deployments. Standalone APM agents remain officially supported for every major language, and OpenTelemetry data can be sent directly to APM Server via the OTLP endpoint. This guide covers the architecture, agent options, setup steps, and production operational considerations.

Architecture

Elastic APM has four components:

| Component | Role |
| --- | --- |
| APM agents (or OTel SDKs) | Instrument application code, collect traces/metrics/errors |
| APM Server (standalone or via Elastic Agent) | Receives agent data, transforms it, writes to Elasticsearch |
| Elasticsearch | Stores and indexes APM data |
| Kibana APM UI | Visualizes traces, errors, services, dependencies |

On Elastic Cloud, APM Server is auto-deployed when APM is enabled on the deployment. Self-managed installations choose between the standalone APM Server binary and the Elastic Agent with the APM integration. Both are supported; the latter is the recommended path going forward.

Capabilities

| Capability | What it provides |
| --- | --- |
| Distributed tracing | End-to-end request traces across services with span timings |
| Performance metrics | Response times, throughput, error rates, CPU/memory |
| Error tracking | Automatic exception capture, stack traces, grouping |
| Service maps | Visual topology of services and dependencies |
| Real User Monitoring (RUM) | Browser-side timing via the JavaScript agent |
| Profiling | CPU profiling via Universal Profiling (Enterprise) |
| Anomaly detection | ML-driven anomaly detection on latency, throughput, errors (Platinum+) |

Supported Agents

Elastic maintains official agents for these languages. Status as of 2026:

| Agent | Latest major version | Notes |
| --- | --- | --- |
| Java | 1.x | Auto-instruments Servlet, Spring, JAX-RS, Vert.x, gRPC |
| Node.js | 4.x | Express, Koa, Fastify, Next.js, NestJS |
| Python | 6.x | Django, Flask, FastAPI, Tornado, Starlette |
| .NET | 1.x | ASP.NET, ASP.NET Core, EF, Dapper |
| Go | 2.x | net/http, gin, echo, chi, gRPC |
| Ruby | 4.x | Rails, Sinatra, Grape |
| PHP | 1.x | Laravel, Symfony, WordPress, Drupal |
| Android | 1.x | OkHttp, GA traces |
| iOS / Swift | 1.x | NSURLSession |
| JavaScript RUM | 5.x | Browser-side timing |

The supported list shifts - check the latest at elastic.co/guide/en/apm/agent/. Several agents (notably Java and Node.js) auto-instrument popular libraries without code changes.

Beyond Elastic's own agents, Elastic APM accepts OpenTelemetry traces, metrics, and logs via the OTLP endpoint exposed by APM Server. For new instrumentation, OpenTelemetry is often the better long-term choice - it avoids vendor lock-in while still working with Elastic APM.
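
As a rough sketch of what pointing an OpenTelemetry SDK at APM Server involves, the helper below assembles the standard OTLP exporter environment variables. The function name `otlp_env`, the example URL, and the token are placeholders for illustration, and the `Authorization=Bearer` header assumes secret-token authentication is enabled on APM Server. Some SDKs expect the space in the header value to be URL-encoded, so check the exporter documentation for your language.

```python
def otlp_env(server_url: str, secret_token: str, service_name: str,
             environment: str = "production") -> dict[str, str]:
    """Build standard OpenTelemetry exporter variables for an OTLP endpoint."""
    return {
        # Base URL of the OTLP receiver (here: APM Server), without a trailing slash
        "OTEL_EXPORTER_OTLP_ENDPOINT": server_url.rstrip("/"),
        # Assumption: APM Server is configured with a secret token
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Bearer {secret_token}",
        # Resource attributes identify the service and environment in the APM UI
        "OTEL_RESOURCE_ATTRIBUTES": (
            f"service.name={service_name},deployment.environment={environment}"
        ),
    }


if __name__ == "__main__":
    # Placeholder values; print in KEY=value form for an env file
    for key, value in otlp_env("https://apm-server:8200", "example-token", "my-service").items():
        print(f"{key}={value}")
```

Keeping the endpoint and credentials in environment variables means the application code itself stays vendor-neutral.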

Setup: Elastic Cloud

The fast path:

  1. Create or open an Elastic Cloud deployment
  2. Enable APM and Fleet from the deployment edit page
  3. Note the APM Server URL and the secret token (Integrations > APM Integration)
  4. Instrument the application with the appropriate agent

No APM Server installation required.

Setup: Self-Managed (Elastic Agent + APM Integration)

The recommended path for self-managed deployments:

  1. Install Elastic Agent and enroll it in Fleet:

    sudo ./elastic-agent install --url=https://fleet-server:8220 --enrollment-token=<token>
    
  2. In Kibana > Fleet, add the APM integration to a policy.

  3. Configure the APM endpoint and secret token in the integration settings.

  4. Apply the policy to the agent(s).

The Elastic Agent runs APM Server, ships data to Elasticsearch, and is updatable from Fleet without touching the host.

Setup: Standalone APM Server

Still supported for environments where Elastic Agent isn't a fit:

VERSION=8.15.3
curl -L -O "https://artifacts.elastic.co/downloads/apm-server/apm-server-${VERSION}-linux-x86_64.tar.gz"
tar xzf "apm-server-${VERSION}-linux-x86_64.tar.gz"
cd "apm-server-${VERSION}-linux-x86_64/"

Edit apm-server.yml:

apm-server:
  host: "0.0.0.0:8200"
  auth:
    secret_token: "your-secret-token"

output.elasticsearch:
  hosts: ["https://elasticsearch:9200"]
  username: "apm_writer"
  password: "${APM_WRITER_PWD}"
  ssl:
    verification_mode: "certificate"

setup.kibana:
  host: "https://kibana:5601"

Start with ./apm-server -e. Production deployments should run as a systemd unit.

Agent Examples

Java

java -javaagent:/path/to/elastic-apm-agent-1.x.jar \
     -Delastic.apm.service_name=my-service \
     -Delastic.apm.server_url=https://apm-server:8200 \
     -Delastic.apm.secret_token="$APM_TOKEN" \
     -Delastic.apm.environment=production \
     -jar my-application.jar

Node.js

// Must be the very first require in the application
const apm = require('elastic-apm-node').start({
  serviceName: 'my-service',
  serverUrl: 'https://apm-server:8200',
  secretToken: process.env.APM_TOKEN,
  environment: 'production',
  transactionSampleRate: 0.5
})

Python (Django)

import os

INSTALLED_APPS += ('elasticapm.contrib.django',)

# Prepend the tracing middleware; unpacking works whether MIDDLEWARE is a list or a tuple
MIDDLEWARE = ['elasticapm.contrib.django.middleware.TracingMiddleware', *MIDDLEWARE]

ELASTIC_APM = {
    'SERVICE_NAME': 'my-service',
    'SERVER_URL': 'https://apm-server:8200',
    'SECRET_TOKEN': os.environ['APM_TOKEN'],
    'ENVIRONMENT': 'production',
    'TRANSACTION_SAMPLE_RATE': 0.5,
}

Go

package main

import (
    "log"
    "net/http"

    "go.elastic.co/apm/module/apmhttp/v2"
)

func main() {
    // Wrap the HTTP handler so each incoming request is traced
    handler := apmhttp.Wrap(http.DefaultServeMux)
    log.Fatal(http.ListenAndServe(":8080", handler))
}

Environment variables (ELASTIC_APM_SERVER_URL, ELASTIC_APM_SECRET_TOKEN, ELASTIC_APM_SERVICE_NAME) are read by the server-side agents and are the recommended way to inject config in containerized environments.
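
A container entrypoint can check that these variables are present before the application starts. The following is a minimal sketch using only the standard library; the helper name `missing_apm_vars` is hypothetical, and treating all three variables as required is an assumption for illustration rather than something the agents enforce.

```python
import os

# The three variables named above; treating all as required is an assumption
REQUIRED_VARS = (
    "ELASTIC_APM_SERVER_URL",
    "ELASTIC_APM_SECRET_TOKEN",
    "ELASTIC_APM_SERVICE_NAME",
)


def missing_apm_vars(environ=None):
    """Return the required APM variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_apm_vars()
    print("missing:", ", ".join(missing) if missing else "none")
```

In a real entrypoint, a non-empty result would typically abort startup or log a warning, depending on whether tracing is mandatory for the service.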

Sampling and Performance Overhead

Agent overhead is typically 1-3% at the default sample rate (1.0). To reduce it:

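| Knob | Effect |
| --- | --- |
| transaction_sample_rate | Lower (e.g. 0.1) to sample fewer traces; errors are still captured |
| capture_body | Controls request body capture; leave at off (the default) unless bodies are needed |
| stack_trace_limit | Cap stack trace depth (default 50) |
| span_min_duration | Drop spans shorter than threshold (option name and availability vary by agent) |
| capture_headers | Set to false to stop capturing HTTP headers, e.g. on services handling sensitive data |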
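
For the Python agent, these knobs map onto keys of the `ELASTIC_APM` settings dict. The sketch below applies a set of lower-overhead overrides to an existing dict; the specific values and the helper name `with_low_overhead` are illustrative assumptions, not recommendations. The span duration threshold is left out because its option name differs between agents.

```python
# Illustrative overrides; tune the values for your own service
LOW_OVERHEAD = {
    "TRANSACTION_SAMPLE_RATE": 0.1,  # sample roughly 10% of transactions
    "CAPTURE_BODY": "off",           # do not capture request bodies
    "CAPTURE_HEADERS": False,        # skip HTTP header capture
    "STACK_TRACE_LIMIT": 20,         # assumed cap, below the default of 50
}


def with_low_overhead(settings: dict) -> dict:
    """Return a copy of an ELASTIC_APM settings dict with the overrides applied."""
    return {**settings, **LOW_OVERHEAD}


if __name__ == "__main__":
    base = {"SERVICE_NAME": "my-service", "TRANSACTION_SAMPLE_RATE": 1.0}
    print(with_low_overhead(base))
```

Returning a copy keeps the base settings untouched, which makes it easy to apply the overrides only in selected environments.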

Server-side, APM Server scales horizontally - put multiple instances behind a load balancer. Elastic Agent scales similarly when running APM integration on multiple hosts.

Production Considerations

| Concern | Recommendation |
| --- | --- |
| Data volume | APM data fills indices fast - apply ILM with hot/warm/cold tiers |
| Index sizing | Trace indices are heavy on field count; review mapping limits |
| Retention | 30 days is common; cap by ILM policy |
| Secret tokens | Rotate periodically; use API keys with limited indices privileges |
| TLS | Always enable TLS on APM Server, especially for agents on untrusted networks |
| Cluster sizing | APM ingest is write-heavy; size the Elasticsearch indexing tier accordingly |
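
To make the ILM recommendation concrete, here is a sketch that builds a policy body with hot, warm, cold, and delete phases, suitable for sending to the Elasticsearch `PUT _ilm/policy/<name>` API. The phase timings, rollover thresholds, and the function name `apm_ilm_policy` are assumptions chosen for illustration; size them against your own ingest volume.

```python
import json


def apm_ilm_policy(retention_days: int = 30, warm_after_days: int = 2,
                   cold_after_days: int = 7) -> dict:
    """Build an ILM policy body with hot, warm, cold, and delete phases."""
    if not warm_after_days < cold_after_days < retention_days:
        raise ValueError("expected warm_after_days < cold_after_days < retention_days")
    return {
        "policy": {
            "phases": {
                # Roll over daily or when a primary shard reaches 50 GB
                "hot": {"actions": {"rollover": {
                    "max_age": "1d", "max_primary_shard_size": "50gb"}}},
                # Merge segments once the index is no longer written to
                "warm": {"min_age": f"{warm_after_days}d",
                         "actions": {"forcemerge": {"max_num_segments": 1}}},
                # Lower recovery priority for older data
                "cold": {"min_age": f"{cold_after_days}d",
                         "actions": {"set_priority": {"priority": 0}}},
                # Delete after the retention window
                "delete": {"min_age": f"{retention_days}d",
                           "actions": {"delete": {}}},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(apm_ilm_policy(), indent=2))
```

The printed JSON can be used as the request body of the ILM API call; the policy then needs to be attached to the APM data streams for it to take effect.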

Common Pitfalls

  1. Initializing the Node.js or Python agent after the framework is loaded. Auto-instrumentation hooks need to register before the instrumented library is required.
  2. Leaving transaction_sample_rate at 1.0 on a high-throughput service. APM costs scale linearly with sampled traces.
  3. Indexing APM data into the same cluster that hosts the application's own logs without segregating tiers. APM volume crowds out everything else.
  4. Forgetting to apply ILM. APM indices grow without bound until disk fills.
  5. Trusting agent auto-instrumentation to cover custom frameworks. Manually instrument anything the agent doesn't recognize.
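
The sample-rate pitfall is easy to quantify. The following back-of-the-envelope sketch estimates sampled traces per day from request throughput; it assumes one transaction per request and a steady rate, and it ignores metrics and error documents, so treat the result as an order-of-magnitude figure. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def sampled_traces_per_day(requests_per_second: float, sample_rate: float) -> int:
    """Estimate sampled traces per day, assuming one transaction per request."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    return round(requests_per_second * SECONDS_PER_DAY * sample_rate)


if __name__ == "__main__":
    print(sampled_traces_per_day(500, 1.0))  # 43200000
    print(sampled_traces_per_day(500, 0.1))  # 4320000
```

At 500 requests per second, dropping the sample rate from 1.0 to 0.1 cuts the estimate from about 43 million to about 4.3 million sampled traces per day.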

Keeping the APM Backend Healthy

Elastic APM only works as well as the Elasticsearch cluster behind it. Ingest spikes, mapping explosions, hot shards, and merge backpressure all degrade APM reliability. Pulse monitors the Elasticsearch cluster that powers APM and surfaces issues - indexing rejections on APM indices, shard imbalance, GC pressure on the coordinating tier - before they show up as missing traces in Kibana. For teams running APM at production scale, the backend Elasticsearch is the part that needs the most ongoing operational attention.
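
One of these signals, indexing rejections, can be spot-checked by hand with the Elasticsearch `_cat/thread_pool/write?format=json` API. The sketch below parses a response of that shape and flags nodes whose write pool has rejected requests. The sample payload and node names are made up for illustration, and the helper name is hypothetical. Note that the `rejected` counter is cumulative since node start, so a real check would compare successive readings.

```python
import json


def nodes_with_write_rejections(cat_rows):
    """Return {node_name: rejected_count} for write pools with rejections."""
    flagged = {}
    for row in cat_rows:
        if row.get("name") != "write":
            continue
        # The _cat JSON output encodes numbers as strings
        rejected = int(row.get("rejected", 0))
        if rejected > 0:
            flagged[row.get("node_name", "unknown")] = rejected
    return flagged


# Made-up sample in the shape returned by _cat/thread_pool/write?format=json
SAMPLE = json.loads("""
[
  {"node_name": "es-hot-1", "name": "write", "active": "8", "queue": "120", "rejected": "342"},
  {"node_name": "es-hot-2", "name": "write", "active": "2", "queue": "0", "rejected": "0"}
]
""")

if __name__ == "__main__":
    print(nodes_with_write_rejections(SAMPLE))  # {'es-hot-1': 342}
```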

Frequently Asked Questions

Q: What is Elastic APM and how does it work?
A: Elastic APM collects distributed traces, performance metrics, and errors from instrumented applications via language-specific APM agents (or OpenTelemetry SDKs), sends them to APM Server, and stores them in Elasticsearch. Kibana visualizes services, traces, and errors. The recommended deployment path is Elastic Agent with the APM integration.

Q: What's the difference between Elastic Agent APM and standalone APM Server?
A: Elastic Agent with the APM integration is the recommended ingest path going forward - managed via Fleet, updatable without touching the host. Standalone APM Server is still supported and runs as a binary or systemd service. Both produce the same APM data in Elasticsearch.

Q: Can I use OpenTelemetry with Elastic APM?
A: Yes. APM Server exposes an OTLP endpoint that accepts OpenTelemetry traces, metrics, and logs. For new instrumentation, OpenTelemetry is often the better choice - vendor-neutral and well-maintained.

Q: How much overhead do Elastic APM agents add?
A: Typically 1-3% at default sample rates. Lower transaction_sample_rate, keep request body capture off, and cap stack trace depth if overhead matters. The biggest factor is sampled trace volume, not the agent's instrumentation itself.

Q: Which languages does Elastic APM support?
A: Official Elastic-maintained agents exist for Java, Node.js, Python, .NET, Go, Ruby, PHP, Android, iOS/Swift, and JavaScript RUM (browser). OpenTelemetry SDKs cover additional languages and frameworks via the OTLP endpoint.

Q: How long is Elastic APM data retained?
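A: Retention is controlled by Index Lifecycle Management policies on the APM data streams. Defaults vary by data stream and version: the built-in policies in recent 8.x releases delete traces and errors after roughly 10 days and metrics after roughly 90 days, so check the policies in your own deployment. All of it is fully configurable. Errors typically warrant longer retention than traces.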

Q: Can I run Elastic APM in production at scale?
A: Yes. APM Server scales horizontally; the Elasticsearch backend is the typical scaling concern. Apply ILM, use hot/warm/cold tiers, and size the indexing tier for sustained APM volume. Many production teams ingest billions of spans per day.
