Neon is a serverless Postgres platform built around a fundamental architectural departure: storage and compute run as fully independent layers. Instead of a single Postgres process owning both its WAL and its data files on local disk, Neon separates them so compute nodes can start, stop, and scale without any stateful disk attached. The result is a Postgres-compatible database that can scale to zero, branch like a Git repository, and resize compute mid-flight - but these properties come with real trade-offs worth understanding before you commit to it for production workloads.
Note: Neon was acquired by Databricks in May 2025 for approximately $1 billion.
How the Architecture Works
A Neon deployment consists of three distinct subsystems. The compute layer runs a modified Postgres process: it lacks true superuser access (a neon_superuser role is provided instead), does not support tablespaces, and truncates unlogged tables on compute restart or scale-to-zero. The query engine, planner, and transaction model otherwise behave as expected - the compute parses SQL, enforces MVCC, and manages locks in the standard way. What changes is where pages come from and where WAL goes.
WAL is streamed from the compute node to a cluster of Safekeepers - Neon's redundant WAL service, written in Rust. Safekeepers replicate WAL across nodes using a Paxos-based consensus protocol. A transaction is committed once a quorum of Safekeepers acknowledges the record, meaning durability depends on network round trips rather than a local fsync. This is an important latency characteristic: commit latency is bounded by the RTT to the Safekeeper quorum, not by disk speed.
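To make the quorum rule concrete, here is a minimal illustrative sketch of the acknowledgment logic - not Neon's actual Rust implementation, and sendToSafekeeper is a hypothetical transport function. A commit becomes durable once a majority of Safekeepers has acknowledged the record, so observed commit latency tracks the fastest majority rather than the slowest node.

```typescript
// Illustrative only: resolves once a majority of Safekeepers have acked a WAL record.
async function commitWalRecord(
  record: Uint8Array,
  safekeepers: string[],
  sendToSafekeeper: (addr: string, rec: Uint8Array) => Promise<void>,
): Promise<void> {
  const quorum = Math.floor(safekeepers.length / 2) + 1; // majority, e.g. 2 of 3
  let acks = 0;
  let failures = 0;

  await new Promise<void>((resolve, reject) => {
    for (const addr of safekeepers) {
      sendToSafekeeper(addr, record).then(
        () => {
          acks += 1;
          if (acks === quorum) resolve(); // durable: latency = RTT of the fastest majority
        },
        () => {
          failures += 1;
          // If too many nodes fail, a quorum can never be reached.
          if (failures > safekeepers.length - quorum) reject(new Error("quorum unreachable"));
        },
      );
    }
  });
}
```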
The Pageserver sits downstream of the Safekeepers. It consumes WAL continuously, slices it per relation and per page, and organizes it, together with periodic base images, into layer files backed by cloud object storage (S3). When a compute node needs a page that isn't in its local cache, it requests it from the Pageserver over the network; the Pageserver reconstructs the page version for the requested LSN by combining the most recent base image with the delta WAL records that follow it. The compute node itself has a Local File Cache (LFC) - a resizable NVMe-backed layer that sits below the in-RAM shared_buffers cache and holds recently accessed pages locally. Hit ratio against the LFC is the primary driver of read latency: a working set that fits in the LFC behaves close to local disk performance, while a cache miss requires a round trip to the Pageserver.
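The read path can be sketched as a small function over two kinds of layers. This is a simplified illustration of the idea, not the Pageserver's actual data structures: take the newest base image at or before the requested LSN, then replay the WAL deltas between that image and the target LSN.

```typescript
// Illustrative reconstruction of a page version at a target LSN.
// ImageLayer and DeltaRecord are simplified stand-ins for the Pageserver's layer files.
interface ImageLayer { lsn: bigint; page: Uint8Array }
interface DeltaRecord { lsn: bigint; apply: (page: Uint8Array) => Uint8Array }

function getPageAtLsn(
  targetLsn: bigint,
  images: ImageLayer[],   // base images for this page, sorted by LSN ascending
  deltas: DeltaRecord[],  // WAL deltas for this page, sorted by LSN ascending
): Uint8Array {
  // 1. Newest base image at or before the target LSN.
  const base = [...images].reverse().find((img) => img.lsn <= targetLsn);
  if (!base) throw new Error("no base image available for this LSN");

  // 2. Replay the deltas that fall between the image and the target LSN.
  let page = base.page;
  for (const d of deltas) {
    if (d.lsn > base.lsn && d.lsn <= targetLsn) page = d.apply(page);
  }
  return page;
}
```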
Database Branching
Branching is Neon's most distinctive feature for development workflows. A branch is a copy-on-write clone of the database at a specific LSN. Creating one takes roughly one second regardless of database size, because no data is actually copied - the branch simply references the same underlying layer files as the parent, and new writes diverge from that point forward. You're billed only for the data that's unique to a branch, not for shared pages.
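A rough mental model, again illustrative rather than Neon's real metadata format: a branch records its parent and a starting LSN, reads fall through to the parent's history for anything older than that point, and only pages written after the branch point are stored (and billed) on the branch itself.

```typescript
// Toy copy-on-write model of a branch: shared history, private divergence.
interface Branch {
  parent?: Branch;
  branchLsn: bigint;                  // point at which this branch diverged from its parent
  ownPages: Map<string, Uint8Array>;  // only pages written after the branch point
}

function readPage(branch: Branch, pageKey: string): Uint8Array | undefined {
  // Pages written on this branch win; everything else falls through to the parent chain.
  // (A real implementation would resolve parent reads as of branchLsn, omitted here.)
  return branch.ownPages.get(pageKey) ?? (branch.parent ? readPage(branch.parent, pageKey) : undefined);
}

function writePage(branch: Branch, pageKey: string, data: Uint8Array): void {
  branch.ownPages.set(pageKey, data); // this divergent data is all the branch is billed for
}
```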
This makes branching practical for a workflow that's otherwise painful with traditional managed databases: giving every pull request its own isolated Postgres instance with production-like data. With Neon's Vercel integration, opening a PR automatically creates a branch named like preview-pr-142, injects DATABASE_URL into the preview environment, and tears it down when the PR closes. Running migrations against a branch instead of a shared staging database means schema changes are isolated, reversible by deleting the branch, and testable against realistic data volumes without duplicating your entire dataset.
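Under the hood this is just an API call. A sketch of what a CI step might do, assuming Neon's public v2 branch-creation endpoint and payload shape (verify the exact fields against current docs; NEON_API_KEY and NEON_PROJECT_ID are placeholder environment variables):

```typescript
// Sketch: create a per-PR branch with a read-write compute attached.
const apiKey = process.env.NEON_API_KEY!;
const projectId = process.env.NEON_PROJECT_ID!;
const prNumber = 142;

const res = await fetch(`https://console.neon.tech/api/v2/projects/${projectId}/branches`, {
  method: "POST",
  headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
  body: JSON.stringify({
    branch: { name: `preview-pr-${prNumber}` },  // branches from the parent's head by default
    endpoints: [{ type: "read_write" }],         // attach a compute so the branch is queryable
  }),
});
if (!res.ok) throw new Error(`branch creation failed: ${res.status}`);
const body = await res.json();
// The response includes connection details; the exact field path may differ by API version.
console.log("created branch", body.branch?.id);
```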
Branches also serve as a lightweight point-in-time recovery mechanism. You can branch from any point within the retention window — 6 hours on the free plan, up to 7 days on the Launch plan, and up to 30 days on the Scale plan — which gives you a read-write copy of the database at that moment without touching the parent. This is faster than restoring a backup and produces a live database you can query immediately.
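Point-in-time branches use the same call with a restore point added. Assuming the branch object accepts a parent_timestamp (or parent_lsn) field, as described in Neon's API docs, the only change from the previous sketch is the request body:

```typescript
// Sketch: branch the database as it was 2 hours ago (must fall inside the plan's retention window).
const restorePoint = new Date(Date.now() - 2 * 60 * 60 * 1000).toISOString();

const body = JSON.stringify({
  branch: {
    name: "recover-dropped-table",
    parent_timestamp: restorePoint,  // alternatively parent_lsn for an exact LSN
  },
  endpoints: [{ type: "read_write" }],
});
```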
Autoscaling and Scale to Zero
Neon runs compute inside NeonVM - a custom Kubernetes CRD that manages lightweight VMs and supports live CPU and memory resizing without a restart. An autoscaler-agent daemon runs on each Kubernetes node, monitors compute metrics, and adjusts vCPU and memory allocations in-place. Scaling up happens within seconds. Scaling down is more conservative to avoid thrashing.
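The asymmetry between fast scale-up and conservative scale-down can be sketched as a simple control loop. This illustrates the shape of the policy only, not the autoscaler-agent's actual algorithm or thresholds:

```typescript
// Illustrative scaling policy: react to pressure immediately, shrink only after sustained idleness.
function nextComputeUnits(currentCu: number, cpuUtilization: number, lowLoadSeconds: number): number {
  if (cpuUtilization > 0.9) {
    return Math.min(currentCu * 2, 8);     // scale up within seconds when under pressure
  }
  if (cpuUtilization < 0.3 && lowLoadSeconds > 120) {
    return Math.max(currentCu / 2, 0.25);  // scale down only after sustained low load, to avoid thrashing
  }
  return currentCu;                        // otherwise hold steady
}
```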
Scale to zero suspends the compute entirely after an inactivity period (fixed at 5 minutes on the free tier; on the Launch plan you can disable scale-to-zero entirely but cannot configure the threshold; on the Scale plan the threshold is fully configurable). Cold start - the time from first connection to serving a query - takes a few hundred milliseconds in most cases. That's acceptable for dev environments and low-frequency workloads, but it's a real problem for latency-sensitive applications. A background job that hits a suspended database at 3 AM will incur that startup cost. Long-polling connections, healthcheck probes, and connection pools that ping on idle can all keep the compute alive unintentionally, which adds to your compute bill.
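If you want scale-to-zero to actually engage, the application side has to go quiet too. A minimal node-postgres pool configuration that closes idle clients instead of keeping the compute warm (the values are illustrative, not recommendations):

```typescript
import { Pool } from "pg";

// Let connections close when idle so an inactive app doesn't hold the compute awake.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 10,                    // cap concurrent connections
  idleTimeoutMillis: 10_000,  // release idle clients quickly
  allowExitOnIdle: true,      // let the process exit instead of sitting on an empty pool
});
```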
Autoscaling is measured in Compute Units (CU): 1 CU = 1 vCPU + 4 GB RAM. The free tier includes 100 CU-hours per month - enough for roughly 400 hours at 0.25 CU or 100 hours at 1 CU, but not enough for continuous 24/7 operation at 0.25 CU, which would require ~182 CU-hours. Paid plans bill per CU-hour consumed above included amounts: compute is priced at $0.106/CU-hour on the Launch plan and $0.222/CU-hour on the Scale plan.
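The arithmetic is simple enough to sanity-check in a few lines, using the plan figures quoted above and assuming a 730-hour month:

```typescript
// CU-hours consumed = compute size (CU) * hours running.
const cuHours = (cu: number, hours: number) => cu * hours;

cuHours(0.25, 400);  // = 100 CU-hours: the free tier covers ~400 hours/month at 0.25 CU
cuHours(1, 100);     // = 100 CU-hours: or a 1 CU compute for 100 hours
cuHours(0.25, 730);  // = 182.5 CU-hours: always-on at 0.25 CU exceeds the free allowance
```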
Pricing: When It Works and When It Doesn't
Neon's pricing model is usage-based. Paid plans (Launch and Scale) have no minimum monthly fee — you pay only for what you use. Storage is billed at a flat $0.35/GB-month.
The economics work well for workloads with uneven traffic: hobby projects, development environments, per-tenant databases in multi-tenant SaaS, and preview environments that are idle most of the time. A database that runs 4 hours a day costs a fraction of a provisioned RDS instance that runs 24/7 for the same size.
The math inverts for continuously running, CPU-intensive workloads. Running a single 8 CU compute continuously consumes 5,760 CU-hours per month. At $0.106/CU-hour (Launch plan), that's roughly $611/month in compute alone before storage - and you lose the predictability of a flat instance price. A db.m6g.2xlarge RDS instance (8 vCPU, 32 GB) runs around $0.636/hour on-demand in us-east-1, or significantly less with a Reserved Instance commitment. For high-utilization production databases, that comparison can favor RDS substantially.
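Roughing out the comparison with the figures above (30-day month, storage excluded on both sides):

```typescript
// Always-on 8 CU Neon compute vs an on-demand db.m6g.2xlarge, using the prices quoted above.
const neonCuHours = 8 * 24 * 30;              // 5,760 CU-hours in a 30-day month
const neonMonthly = neonCuHours * 0.106;      // ≈ $610.56 on the Launch plan
const rdsMonthly = 0.636 * 24 * 30;           // ≈ $457.92 on-demand in us-east-1, less with an RI
console.log({ neonMonthly, rdsMonthly });
```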
Connection pooling is built in via PgBouncer, configured in transaction mode and supporting up to 10,000 concurrent client connections. Direct (unpooled) connections are available but capped based on compute size, ranging from roughly 100 to 4,000 connections.
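In practice this means keeping two connection strings around. A sketch with node-postgres, assuming the usual Neon convention that the pooled endpoint's hostname carries a -pooler suffix; DIRECT_DATABASE_URL is a placeholder name, and the hostnames shown are examples rather than real endpoints:

```typescript
import { Pool, Client } from "pg";

// Pooled string (PgBouncer, transaction mode): use for normal application traffic.
const appPool = new Pool({
  connectionString: process.env.DATABASE_URL, // e.g. postgres://user:pass@ep-xxx-pooler.region.aws.neon.tech/db
});

// Direct (unpooled) string: use for session-dependent work such as migrations or pg_dump-style tooling.
const adminClient = new Client({
  connectionString: process.env.DIRECT_DATABASE_URL, // e.g. postgres://user:pass@ep-xxx.region.aws.neon.tech/db
});
```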
Limitations vs Managed Postgres
The PgBouncer transaction-mode pooling is a real constraint. Transaction mode does not support PostgreSQL session-level features: SQL-level prepared statements (protocol-level prepared statements work up to the max_prepared_statements=1000 limit Neon sets, though driver support varies), LISTEN/NOTIFY, advisory locks that span transactions, and SET commands that need to persist across transactions. Tools like pg_dump rely on session-level SET statements and can fail or behave unexpectedly through the pooled connection; you need the direct (unpooled) connection string for those operations.
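The failure mode is easy to reproduce: session state set in one statement may simply not be there for the next, because each statement can be handed to a different server backend. A sketch of the pitfall and the workarounds with node-postgres against the pooled connection string (big_table is a hypothetical table used only for illustration):

```typescript
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL }); // pooled endpoint

// Pitfall: the SET and the SELECT may land on different backends, so the
// setting silently fails to apply to the second statement.
await pool.query("SET statement_timeout = '5s'");
await pool.query("SELECT * FROM big_table"); // may run with the default timeout

// Workaround 1: scope the setting to a single transaction with SET LOCAL,
// so both statements are guaranteed to share one backend.
const client = await pool.connect();
try {
  await client.query("BEGIN");
  await client.query("SET LOCAL statement_timeout = '5s'");
  await client.query("SELECT * FROM big_table");
  await client.query("COMMIT");
} finally {
  client.release();
}

// Workaround 2: for pg_dump, LISTEN/NOTIFY, or long-lived advisory locks,
// connect with the direct (unpooled) connection string instead.
```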
High availability is more limited than with Aurora or Cloud SQL. Neon's storage layer is inherently distributed and replicated through its Safekeeper quorum, so durability is solid. But compute-level HA - automatic failover to a standby in a different availability zone - is a newer feature and not as mature as Aurora's multi-AZ deployments, which have been production-hardened for years. Logical replication across regions is available today, so you can replicate data to a Neon database in another region. What is not available is automated infrastructure-level cross-region failover, which matters for disaster recovery requirements that need automatic promotion of a standby.
Regional coverage is narrower than RDS or Cloud SQL overall, but Neon does operate in APAC — including Singapore (aws-ap-southeast-1) and Sydney (aws-ap-southeast-2) — in addition to US and EU AWS and Azure regions. If you have data residency requirements outside the supported regions, you may still need to accept cross-region latency to the database.
For greenfield projects, internal tools, and development pipelines, Neon's branching workflow and scale-to-zero economics are genuinely useful. For a production Postgres database running consistent high-CPU transactional workloads with strict latency SLAs, the trade-offs - cold starts, transaction-mode pooling constraints, limited HA maturity, narrower regional footprint - need to be weighed carefully against the provisioned-instance alternatives.