Amazon Aurora PostgreSQL: Architecture, Features, and Trade-offs

Aurora PostgreSQL sits in an interesting position: it presents a standard PostgreSQL interface but replaces the entire storage layer underneath. For most teams coming from RDS PostgreSQL, the surface looks familiar. The differences that matter are architectural, and they have downstream effects on performance, operational behavior, and cost.

How Aurora's Storage Engine Actually Works

Standard RDS PostgreSQL writes data pages to EBS volumes attached to the instance. Aurora discards that model entirely. Instead, it separates compute from storage and routes all writes to a distributed storage fleet spanning six storage nodes across three Availability Zones. Each write quorum requires four of six nodes to acknowledge before the database considers the write durable.

The key difference from traditional replication is what gets sent across the network. Aurora sends only redo log records to the storage layer, not full data pages. The storage nodes themselves apply the log and materialize the pages. This dramatically reduces write amplification - a problem that haunts conventional PostgreSQL on EBS, where full-page writes and checkpoint flushes push whole 8 KiB pages to disk. The result is that the writer instance produces far fewer I/O operations for the same transaction throughput.

Read replicas in Aurora share the same underlying storage volume as the writer. There is no replication of data between instances - readers simply receive log pointers and apply them to their local buffer cache state. This means replication lag between writer and readers is typically under 100ms and adding a reader does not impose additional write overhead on the primary. Aurora supports up to 15 in-region Aurora Replicas per cluster. RDS for PostgreSQL also supports up to 15 direct read replicas, and with cascaded replicas (PostgreSQL 14 and later) can fan out further beyond that limit.
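In practice, adding a reader is just a matter of creating another instance that points at the existing cluster. A minimal boto3 sketch; the cluster identifier, instance name, and instance class are placeholders:

```python
# Add an Aurora Replica to an existing cluster; it attaches to the shared
# storage volume rather than replicating data from the writer.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-reader-1",   # placeholder
    DBClusterIdentifier="my-aurora-cluster",     # placeholder
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.large",
    PromotionTier=1,  # lower tiers are preferred failover targets
)
```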

Storage scales automatically in 10 GiB increments up to 256 TiB (supported on Aurora PostgreSQL 17.5, 16.9, 15.13, 14.18, 13.21 and higher), with no pre-provisioning required. You pay for what you use, measured in GB-months. One important subtlety: deleted rows don't reclaim storage as quickly as you might expect. Aurora frees space incrementally in the background, so a large bulk delete won't immediately reduce your bill.
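You can watch the billed volume size directly via the cluster-level CloudWatch metric VolumeBytesUsed. A short sketch, with the cluster identifier as a placeholder:

```python
# Check the current size of the Aurora cluster volume (the basis for storage billing).
from datetime import datetime, timedelta, timezone
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)
resp = cw.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="VolumeBytesUsed",
    Dimensions=[{"Name": "DBClusterIdentifier", "Value": "my-aurora-cluster"}],  # placeholder
    StartTime=now - timedelta(hours=6),
    EndTime=now,
    Period=3600,
    Statistics=["Average"],
)
latest = max(resp["Datapoints"], key=lambda p: p["Timestamp"])
print(f'{latest["Average"] / 1024**3:.1f} GiB used')
```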

Aurora-Specific Features Worth Knowing

Global Database

Aurora Global Database replicates at the storage layer across AWS regions, typically achieving under 1 second of replication lag to secondary regions. Unlike logical replication, the mechanism is physical - dedicated replication servers in Aurora's storage layer handle the transfer independently of the compute instances. Secondary regions are read-only by default, but write forwarding lets applications send writes to a secondary that Aurora proxies back to the primary region, with the obvious latency penalty.

For disaster recovery, a planned switchover achieves RPO of zero since it fully synchronizes the secondary before promoting it. An unplanned failover (region failure) gives you RPO in seconds, not zero, because of the asynchronous storage lag at the moment of failure. RTO for promotion is typically under a minute. If you're running multi-region active/passive and DR is your primary motivation, this architecture is solid. If you want true active-active writes across regions, Aurora Global Database is not the right tool - that requires a different product category entirely.
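Both paths are exposed through the RDS API. The sketch below assumes a global cluster and a secondary cluster ARN with placeholder names; you would run one call or the other depending on whether the primary region is still healthy:

```python
# Planned switchover (RPO zero) vs unplanned failover (RPO in seconds) for
# Aurora Global Database. Identifiers and ARNs are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Planned switchover: fully synchronizes the secondary before promoting it.
rds.switchover_global_cluster(
    GlobalClusterIdentifier="my-global-db",
    TargetDbClusterIdentifier="arn:aws:rds:eu-west-1:123456789012:cluster:my-secondary",
)

# Unplanned failover during a region outage: accepts whatever replication lag
# existed at failure time as data loss, hence AllowDataLoss=True.
rds.failover_global_cluster(
    GlobalClusterIdentifier="my-global-db",
    TargetDbClusterIdentifier="arn:aws:rds:eu-west-1:123456789012:cluster:my-secondary",
    AllowDataLoss=True,
)
```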

Aurora Serverless v2

Serverless v2 replaces the original Serverless v1 and is a substantially different product. It scales in increments of 0.5 ACU (Aurora Capacity Units), where 1 ACU corresponds to approximately 2 GiB of RAM plus proportional CPU. Minimum capacity can be set as low as 0 ACU (scale to zero); the lowest non-zero setting is 0.5 ACU. Scale to zero is useful for development environments, but resuming from idle takes up to 15 seconds, which is not acceptable for latency-sensitive production paths.
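The capacity range is a cluster-level setting. A minimal sketch with a placeholder cluster identifier:

```python
# Set the Serverless v2 capacity range on a cluster. MinCapacity of 0 enables
# scale to zero on versions that support it; the range applies to all
# Serverless v2 instances in the cluster.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.modify_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",  # placeholder
    ServerlessV2ScalingConfiguration={"MinCapacity": 0, "MaxCapacity": 16},
    ApplyImmediately=True,
)
```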

One operational issue to plan for: if you have Serverless v2 readers behind a writer, the readers don't automatically scale in lockstep with the writer unless their promotion tier is 0 or 1. A misconfigured reader with a low minimum ACU while the writer is handling heavy traffic will accumulate replication lag. AWS recommends matching reader minimum capacity to a value that reflects the writer's expected memory footprint.
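To keep a reader scaling in lockstep with the writer, place it in promotion tier 0 or 1. A small sketch, with the instance identifier as a placeholder:

```python
# Move a Serverless v2 reader into promotion tier 1 so it scales with the
# writer instead of scaling only on its own read load.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.modify_db_instance(
    DBInstanceIdentifier="my-aurora-reader-1",  # placeholder
    PromotionTier=1,
    ApplyImmediately=True,
)
```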

The billing model for Serverless v2 is per-ACU-second. In us-east-1, Serverless v2 capacity runs around $0.12 per ACU-hour. If your workload has highly variable traffic with long idle periods, Serverless v2 can meaningfully reduce costs versus a fixed provisioned instance. For steady high-throughput workloads, provisioned instances with predictable pricing are typically cheaper.
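A rough back-of-the-envelope comparison makes the trade-off concrete. The numbers below are illustrative placeholders (the provisioned instance price is assumed, not quoted from AWS); substitute current pricing and your own traffic profile:

```python
# Rough Serverless v2 vs provisioned cost comparison. Prices are illustrative
# placeholders for us-east-1; check current AWS pricing before deciding.
ACU_HOUR = 0.12          # Serverless v2, per ACU-hour (approximate)
PROVISIONED_HOUR = 0.26  # assumed on-demand price for a fixed instance

hours_per_month = 730
avg_acus_busy = 6        # average ACUs during busy hours
busy_hours = 160         # hours per month with real traffic
idle_acus = 0.5          # floor the cluster sits at the rest of the time

serverless = ACU_HOUR * (avg_acus_busy * busy_hours + idle_acus * (hours_per_month - busy_hours))
provisioned = PROVISIONED_HOUR * hours_per_month
print(f"Serverless v2: ${serverless:,.0f}/month, provisioned: ${provisioned:,.0f}/month")
```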

Fast Clone

Aurora fast clone creates a new cluster that initially shares the underlying storage pages with the source cluster using copy-on-write semantics. The clone operation itself is near-instant regardless of database size - you're not copying data, just creating a new cluster that references the same pages. Pages diverge only as writes happen on either side. This makes cloning practical for creating production replicas for staging, testing schema migrations, or debugging production issues against a live copy of data, all without a multi-hour snapshot restore.
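Cloning goes through the point-in-time restore API with RestoreType set to copy-on-write. A sketch with placeholder identifiers and instance class:

```python
# Create a fast clone of a production cluster, then attach a compute instance.
# The clone shares storage pages with the source until either side writes.
import boto3

rds = boto3.client("rds", region_name="us-east-1")
rds.restore_db_cluster_to_point_in_time(
    DBClusterIdentifier="staging-clone",            # placeholder
    SourceDBClusterIdentifier="production-cluster", # placeholder
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)
# A cluster has no compute until an instance is added; wait for the clone
# cluster to report 'available' before adding one.
rds.create_db_instance(
    DBInstanceIdentifier="staging-clone-instance-1",
    DBClusterIdentifier="staging-clone",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.large",
)
```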

Performance vs RDS PostgreSQL

The throughput gap between Aurora and RDS PostgreSQL narrows considerably at smaller instance sizes and for simple workloads. AWS's "3x throughput" claim applies to write-heavy workloads where Aurora's log-only write path gives it a structural advantage. For read-heavy workloads with good buffer cache hit rates, both systems perform comparably because neither is I/O bound.

Where Aurora shows measurable gains is under sustained write load on larger instances, particularly workloads with high checkpoint pressure on RDS. Aurora doesn't have traditional PostgreSQL checkpoints in the same way - the storage layer handles durability, so the compute layer avoids the I/O spikes that come from checkpoint flushes on EBS. At equivalent instance sizes, RDS PostgreSQL with io1 or gp3 storage can reach around 40,000-50,000 writes per second under optimal conditions. Aurora can sustain higher rates without the same latency variance.
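If you want to see the difference on your own workload rather than trust headline numbers, a crude write probe is enough to surface latency variance under sustained commits. This is a minimal sketch, not a substitute for pgbench; the endpoint, credentials, and table name are placeholders:

```python
# Minimal write-throughput probe: batched inserts with a commit per batch,
# so each batch forces a durable write through the storage layer.
import time
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",  # placeholder
    dbname="postgres", user="app", password="...",
)
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS write_probe (id bigserial PRIMARY KEY, payload text)")
conn.commit()

batches, rows_per_batch = 200, 500
start = time.monotonic()
for _ in range(batches):
    cur.executemany(
        "INSERT INTO write_probe (payload) VALUES (%s)",
        [("x" * 100,)] * rows_per_batch,
    )
    conn.commit()
elapsed = time.monotonic() - start
print(f"{batches * rows_per_batch / elapsed:,.0f} rows/s inserted")
```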

Read replica lag is another area where Aurora has a structural advantage. Since replicas share storage with the writer, replica lag is not a function of replication throughput - it's a function of how quickly the replica's buffer cache catches up with log apply. Under write-heavy load, RDS replicas can fall seconds or minutes behind. Aurora replicas stay within milliseconds in most cases.
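Aurora reports per-replica lag through the AuroraReplicaLag CloudWatch metric (in milliseconds). A short sketch, with the reader identifier as a placeholder:

```python
# Check recent replica lag for a specific Aurora Replica.
from datetime import datetime, timedelta, timezone
import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)
resp = cw.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="AuroraReplicaLag",  # reported in milliseconds
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-aurora-reader-1"}],  # placeholder
    StartTime=now - timedelta(minutes=15),
    EndTime=now,
    Period=60,
    Statistics=["Average", "Maximum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f'{point["Average"]:.1f} ms avg', f'{point["Maximum"]:.1f} ms max')
```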

Compatibility and Limitations

Aurora PostgreSQL tracks community PostgreSQL releases but with a delay. Minor version releases follow a quarterly cadence. Major versions typically arrive 3-5 months after the community release. For teams that depend on bleeding-edge PostgreSQL features, this lag matters.

Extension support is more constrained than a self-managed PostgreSQL installation. Aurora runs on a managed OS where you cannot load arbitrary shared libraries. Extensions requiring C-level hooks or superuser access are either unavailable, restricted, or provided in modified form. pg_repack is well-supported on Aurora PostgreSQL; AWS publishes a dedicated guide ("Remove bloat from Amazon Aurora and RDS for PostgreSQL with pg_repack") covering its use on the platform. Extensions requiring OS-level access like file_fdw don't work. You also lose the ability to configure kernel parameters or tune shared_preload_libraries beyond what AWS exposes via parameter groups.
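A quick way to see what a given Aurora version actually offers is to ask the server itself. Connection details below are placeholders:

```python
# List the extensions Aurora makes available and the libraries it preloads.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",  # placeholder
    dbname="postgres", user="app", password="...",
)
cur = conn.cursor()
cur.execute("SELECT name, default_version FROM pg_available_extensions ORDER BY name")
for name, version in cur.fetchall():
    print(name, version)
cur.execute("SHOW shared_preload_libraries")
print("shared_preload_libraries:", cur.fetchone()[0])
```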

Logical replication has a significant footgun: before any major version upgrade, you must drop all replication slots on the cluster, including inactive ones. Aurora does not preserve replication slots across major upgrades. If you're running Aurora as a publisher for downstream consumers (Debezium, pglogical, etc.), a major version upgrade requires coordinating slot teardown and re-establishment on the consumer side - a non-trivial operational event for change data capture pipelines.
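The pre-upgrade slot teardown itself is simple; the coordination with consumers is the hard part. A minimal sketch with placeholder connection details (dropping a slot discards its position, so downstream CDC must be re-established afterwards):

```python
# List and drop replication slots before a major version upgrade.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",  # placeholder
    dbname="postgres", user="admin", password="...",
)
conn.autocommit = True
cur = conn.cursor()
cur.execute("SELECT slot_name, slot_type, active FROM pg_replication_slots")
for slot_name, slot_type, active in cur.fetchall():
    print(f"dropping {slot_type} slot {slot_name} (active={active})")
    # An active slot cannot be dropped; stop its consumer first.
    cur.execute("SELECT pg_drop_replication_slot(%s)", (slot_name,))
```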

Cost Considerations

Aurora's pricing model has two main storage configurations: Standard and I/O-Optimized. Standard charges separately for storage ($0.10/GB-month in us-east-1) and I/O operations ($0.20 per million requests). I/O-Optimized charges more for storage (~$0.225/GB-month) and bumps instance costs by approximately 30%, but eliminates per-I/O charges entirely. AWS's guidance is to switch to I/O-Optimized when I/O costs exceed 25% of your total Aurora bill. For write-heavy production workloads that use Aurora's throughput advantages, this breakeven often hits at moderate-to-high utilization.
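The 25% rule of thumb is easy to check from a month's bill. The figures below simply plug the prices quoted above into the comparison; the instance cost and usage numbers are illustrative placeholders, not a pricing reference:

```python
# Standard vs I/O-Optimized comparison using the us-east-1 prices quoted above.
storage_gb = 500
io_requests_millions = 2_000   # per month
instance_cost = 400.0          # assumed monthly compute cost on Standard

standard = instance_cost + storage_gb * 0.10 + io_requests_millions * 0.20
io_optimized = instance_cost * 1.30 + storage_gb * 0.225  # no per-I/O charge

io_share = (io_requests_millions * 0.20) / standard
print(f"I/O share of Standard bill: {io_share:.0%}")
print(f"Standard: ${standard:,.0f}/month, I/O-Optimized: ${io_optimized:,.0f}/month")
```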

Compared to RDS PostgreSQL, Aurora's compute costs at equivalent instance sizes are similar. Storage costs tend to run higher on Aurora for small databases (you pay for the distributed redundancy) but lower for very large datasets where RDS requires provisioned IOPS volumes to match the I/O performance Aurora provides by default. Global Database adds per-region replication costs and cross-region data transfer charges on top of per-region storage and compute.

The practical cost decision: Aurora makes economic sense when you need read replicas, want to avoid managing EBS volume sizing and IOPS provisioning, or have workloads where write throughput or replica lag consistency matters. For a single-instance PostgreSQL database with a modest workload and no read replicas, RDS PostgreSQL on gp3 is cheaper and simpler to operate.
