The "DB::Exception: PostgreSQL replication internal error" indicates a failure in the MaterializedPostgreSQL replication engine within ClickHouse. The POSTGRESQL_REPLICATION_INTERNAL_ERROR code is raised when the logical replication stream from PostgreSQL encounters an unrecoverable problem — such as a lost replication slot, WAL segment removal, or an unexpected change in the source table schema.
Impact
When this error occurs, the MaterializedPostgreSQL table or database stops ingesting changes from PostgreSQL. Data in ClickHouse becomes stale and progressively diverges from the source. The replication slot on PostgreSQL may continue to hold WAL segments, eventually consuming disk space on the PostgreSQL server if the issue is not resolved.
Common Causes
- The PostgreSQL replication slot has been dropped or invalidated.
- WAL segments required by ClickHouse have been removed (e.g., due to
wal_keep_sizebeing too small or a manualpg_archivecleanup). - Schema changes on the PostgreSQL source table that are incompatible with the ClickHouse replica.
- PostgreSQL
wal_levelis not set tological. - The replication user lacks the
REPLICATIONprivilege orpg_hba.confdoes not allow replication connections. - Network interruptions that cause the replication connection to drop and the recovery process to fail.
- ClickHouse restart during an active replication snapshot, corrupting the internal replication state.
Troubleshooting and Resolution Steps
Check the ClickHouse server log for the specific replication error details:
grep -i "POSTGRESQL_REPLICATION_INTERNAL_ERROR\|MaterializedPostgreSQL\|replication" /var/log/clickhouse-server/clickhouse-server.log | tail -40Verify the PostgreSQL replication slot exists and is active:
-- Run on PostgreSQL SELECT slot_name, active, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name LIKE '%clickhouse%';Confirm
wal_levelis set tologicalon PostgreSQL:SHOW wal_level;If it is not
logical, change it inpostgresql.confand restart PostgreSQL:wal_level = logicalCheck that the replication user has the required privileges:
-- Run on PostgreSQL SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'clickhouse_user';If the replication slot was lost or WAL segments are missing, you need to re-sync the table. Drop and recreate the MaterializedPostgreSQL table or database in ClickHouse:
DROP DATABASE IF EXISTS materialized_pg; CREATE DATABASE materialized_pg ENGINE = MaterializedPostgreSQL('postgres-host:5432', 'source_db', 'clickhouse_user', 'password') SETTINGS materialized_postgresql_tables_list = 'schema.table1,schema.table2';Verify that
pg_hba.confallows replication connections:host replication clickhouse_user 10.0.1.50/32 scram-sha-256After restoring replication, monitor the replication lag:
SELECT database, table, is_readonly, absolute_delay FROM system.replicas WHERE database = 'materialized_pg';
Best Practices
- Set
wal_keep_size(orwal_keep_segmentson older PostgreSQL versions) high enough to retain WAL data during maintenance windows or ClickHouse restarts. - Avoid running DDL changes on replicated PostgreSQL tables without first pausing and reconfiguring replication in ClickHouse.
- Monitor replication slot lag on the PostgreSQL side to detect when ClickHouse falls behind.
- Use a dedicated PostgreSQL replication user with minimal privileges beyond
REPLICATION. - Test schema changes in a staging environment before applying them to production replicated tables.
- Set up alerting on the ClickHouse side for replication errors to catch problems before the data drift becomes significant.
Frequently Asked Questions
Q: Can I recover MaterializedPostgreSQL replication without a full re-sync?
A: It depends on the cause. If the replication slot is intact and WAL segments are still available, ClickHouse may resume replication after a restart. However, if the slot is lost or WAL is missing, a full re-sync (drop and recreate) is required.
Q: How do I avoid WAL accumulation on PostgreSQL when ClickHouse replication is paused?
A: If you expect ClickHouse to be down for an extended period, consider dropping the replication slot on PostgreSQL to prevent unbounded WAL growth. You will need to perform a full re-sync when ClickHouse comes back.
Q: Does MaterializedPostgreSQL support DDL replication?
A: No. Schema changes on the PostgreSQL source are not automatically propagated. Adding or removing columns on the source table will likely cause a POSTGRESQL_REPLICATION_INTERNAL_ERROR. You must recreate the MaterializedPostgreSQL table after DDL changes.
Q: What PostgreSQL versions are supported for MaterializedPostgreSQL?
A: PostgreSQL 11 and later support logical replication, which is required for MaterializedPostgreSQL. Using PostgreSQL 14 or later is recommended for the best compatibility and logical replication features.