ClickHouse DB::Exception: PostgreSQL replication internal error

The "DB::Exception: PostgreSQL replication internal error" indicates a failure in the MaterializedPostgreSQL replication engine within ClickHouse. The POSTGRESQL_REPLICATION_INTERNAL_ERROR code is raised when the logical replication stream from PostgreSQL encounters an unrecoverable problem — such as a lost replication slot, WAL segment removal, or an unexpected change in the source table schema.

Impact

When this error occurs, the MaterializedPostgreSQL table or database stops ingesting changes from PostgreSQL. Data in ClickHouse becomes stale and progressively diverges from the source. The replication slot on PostgreSQL may continue to hold WAL segments, eventually consuming disk space on the PostgreSQL server if the issue is not resolved.

Common Causes

The PostgreSQL replication slot has been dropped or invalidated.
WAL segments required by ClickHouse have been removed (e.g., due to wal_keep_size being too small or a manual pg_archivecleanup).
Schema changes on the PostgreSQL source table that are incompatible with the ClickHouse replica.
PostgreSQL wal_level is not set to logical.
The replication user lacks the REPLICATION privilege or pg_hba.conf does not allow replication connections.
Network interruptions that cause the replication connection to drop and the recovery process to fail.
ClickHouse restart during an active replication snapshot, corrupting the internal replication state.

Troubleshooting and Resolution Steps

Check the ClickHouse server log for the specific replication error details:

grep -i "POSTGRESQL_REPLICATION_INTERNAL_ERROR\|MaterializedPostgreSQL\|replication" /var/log/clickhouse-server/clickhouse-server.log | tail -40

Verify the PostgreSQL replication slot exists and is active:

-- Run on PostgreSQL
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name LIKE '%clickhouse%';

Confirm wal_level is set to logical on PostgreSQL:
```
SHOW wal_level;
```
If it is not logical, change it in postgresql.conf and restart PostgreSQL:
```
wal_level = logical
```

Check that the replication user has the required privileges:

-- Run on PostgreSQL
SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'clickhouse_user';

If the replication slot was lost or WAL segments are missing, you need to re-sync the table. Drop and recreate the MaterializedPostgreSQL table or database in ClickHouse:

DROP DATABASE IF EXISTS materialized_pg;
CREATE DATABASE materialized_pg
ENGINE = MaterializedPostgreSQL('postgres-host:5432', 'source_db', 'clickhouse_user', 'password')
SETTINGS materialized_postgresql_tables_list = 'schema.table1,schema.table2';

Verify that pg_hba.conf allows replication connections:

host  replication  clickhouse_user  10.0.1.50/32  scram-sha-256

After restoring replication, monitor the replication lag:

SELECT database, table, is_readonly, absolute_delay
FROM system.replicas
WHERE database = 'materialized_pg';

Best Practices

Set wal_keep_size (or wal_keep_segments on older PostgreSQL versions) high enough to retain WAL data during maintenance windows or ClickHouse restarts.
Avoid running DDL changes on replicated PostgreSQL tables without first pausing and reconfiguring replication in ClickHouse.
Monitor replication slot lag on the PostgreSQL side to detect when ClickHouse falls behind.
Use a dedicated PostgreSQL replication user with minimal privileges beyond REPLICATION.
Test schema changes in a staging environment before applying them to production replicated tables.
Set up alerting on the ClickHouse side for replication errors to catch problems before the data drift becomes significant.

Frequently Asked Questions

Q: Can I recover MaterializedPostgreSQL replication without a full re-sync?
A: It depends on the cause. If the replication slot is intact and WAL segments are still available, ClickHouse may resume replication after a restart. However, if the slot is lost or WAL is missing, a full re-sync (drop and recreate) is required.

Q: How do I avoid WAL accumulation on PostgreSQL when ClickHouse replication is paused?
A: If you expect ClickHouse to be down for an extended period, consider dropping the replication slot on PostgreSQL to prevent unbounded WAL growth. You will need to perform a full re-sync when ClickHouse comes back.

Q: Does MaterializedPostgreSQL support DDL replication?
A: No. Schema changes on the PostgreSQL source are not automatically propagated. Adding or removing columns on the source table will likely cause a POSTGRESQL_REPLICATION_INTERNAL_ERROR. You must recreate the MaterializedPostgreSQL table after DDL changes.

Q: What PostgreSQL versions are supported for MaterializedPostgreSQL?
A: PostgreSQL 11 and later support logical replication, which is required for MaterializedPostgreSQL. Using PostgreSQL 14 or later is recommended for the best compatibility and logical replication features.