ClickHouse dictionaries let you keep reference data — user names, product catalogs, geo lookups — in an external system and query it with fast in-memory dictGet calls instead of JOINs. PostgreSQL is one of the most common sources: you keep the authoritative table in Postgres and ClickHouse periodically pulls it into a dictionary.
This guide covers the practical end-to-end setup: defining a dictionary with a PostgreSQL source, choosing a layout, controlling refresh, handling authentication and replicas, and the issues that come up most often. For conceptual background, see What is a ClickHouse dictionary and ClickHouse external dictionaries.
Prerequisites
The PostgreSQL dictionary source ships in the standard ClickHouse server build — no extra package is required (unlike ODBC sources). You need:
- A reachable PostgreSQL instance and a role with
SELECTon the source table. - Network connectivity from every ClickHouse node that will load the dictionary. In a cluster, each replica loads the dictionary independently, so every node must be able to reach Postgres.
- The dictionary attributes you want must exist as columns in the Postgres table (or be produced by a custom
query).
Creating a Dictionary with a PostgreSQL Source
The DDL form (CREATE DICTIONARY) is the modern, recommended approach. A dictionary has four parts: the attribute schema with a PRIMARY KEY, a SOURCE, a LAYOUT, and a LIFETIME.
CREATE DICTIONARY customers_dict
(
customer_id UInt64,
name String,
country String,
tier String DEFAULT 'standard'
)
PRIMARY KEY customer_id
SOURCE(POSTGRESQL(
host 'postgres.internal'
port 5432
user 'clickhouse_ro'
password 'secret'
db 'crm'
table 'customers'
))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);
Once created, look up values with dictGet:
SELECT
order_id,
dictGet('customers_dict', 'name', customer_id) AS customer_name,
dictGet('customers_dict', 'country', customer_id) AS country
FROM orders;
dictGet('dict_name', 'attr', id_expr) returns the attribute for the given key. The key expression must match the key type — for a UInt64 key, wrap raw integers with toUInt64(...) when needed. Related helpers:
dictGetOrDefault('customers_dict', 'tier', toUInt64(42), 'unknown')— return a fallback when the key is missing instead of the attribute's declared default.dictHas('customers_dict', toUInt64(42))— returns1if the key exists,0otherwise.
PostgreSQL Source Parameters
These parameters go inside SOURCE(POSTGRESQL(...)):
| Parameter | Purpose |
|---|---|
host / port |
PostgreSQL server address (default port 5432). |
user / password |
Credentials for a role with SELECT on the source. |
db |
Database name. |
table |
Source table name. Mutually exclusive with query. |
where |
Optional WHERE clause to filter source rows (cannot be combined with query). |
query |
A custom SELECT instead of table/where. Either table or query is required. |
invalidate_query |
A SELECT returning a single value; the dictionary reloads only when that value changes. |
replica(...) |
One or more replica blocks with their own host, port, and priority. |
background_reconnect |
Optional; reconnect to a replica automatically after a failure. |
A custom query is useful when you want to join or transform in PostgreSQL, or rename columns to match your dictionary attributes:
SOURCE(POSTGRESQL(
host 'postgres.internal' port 5432
user 'clickhouse_ro' password 'secret'
db 'crm'
query 'SELECT id AS customer_id, full_name AS name, country_code AS country FROM customers WHERE active = true'
))
Note: table and where cannot be combined with query — pick one approach.
Choosing a Layout
The LAYOUT controls how the dictionary is held in memory and which key types are allowed. This choice matters more for performance than anything in the source definition.
| Layout | Key | Use when |
|---|---|---|
FLAT |
Single UInt64 |
Keys are dense, small integers. Fastest lookups, backed by a flat array. Key values must not exceed max_array_size (default 500,000); ClickHouse throws an exception and refuses to load if any key exceeds this limit. |
HASHED |
Single UInt64 |
Sparse or large integer keys. The general-purpose default. |
SPARSE_HASHED |
Single UInt64 |
Same as hashed but trades some CPU for lower memory. |
COMPLEX_KEY_HASHED |
Composite / non-integer | Keys are strings, tuples, or multiple columns. |
HASHED_ARRAY |
Single UInt64 |
Many attributes; more memory-efficient than HASHED. |
RANGE_HASHED |
UInt64 + range |
Lookups valid for a date/time range (e.g. price history). |
CACHE / COMPLEX_KEY_CACHE |
Single / composite | Dataset too large for memory; only hot keys cached. Tuning is hard — avoid unless necessary. |
DIRECT / COMPLEX_KEY_DIRECT |
Single / composite | No in-memory storage; every lookup hits PostgreSQL. Use only for always-fresh, low-volume lookups. |
IP_TRIE |
CIDR prefixes | IP-prefix / CIDR lookups. |
For a String key from Postgres, use COMPLEX_KEY_HASHED and declare the key accordingly:
CREATE DICTIONARY country_codes_dict
(
code String,
name String,
region String
)
PRIMARY KEY code
SOURCE(POSTGRESQL(
host 'postgres.internal' port 5432
user 'clickhouse_ro' password 'secret'
db 'reference' table 'countries'
))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 600 MAX 900);
Look up complex keys by passing the key as a tuple: dictGet('country_codes_dict', 'name', tuple('US')).
Controlling Refresh with LIFETIME
LIFETIME(MIN min MAX max) sets the refresh window in seconds. ClickHouse picks a random reload time between MIN and MAX for each node, which spreads load so all replicas don't hit PostgreSQL simultaneously. LIFETIME(300) is shorthand for a fixed interval.
LIFETIME(MIN 300 MAX 600) -- reload sometime between 5 and 10 minutes
LIFETIME(0) -- never reload automatically
A blind reload re-reads the whole source. To avoid unnecessary full reloads, add an invalidate_query that returns a cheap change marker (a max timestamp or a row count). ClickHouse reloads only when the returned value differs from the previous run:
SOURCE(POSTGRESQL(
host 'postgres.internal' port 5432
user 'clickhouse_ro' password 'secret'
db 'crm' table 'customers'
invalidate_query 'SELECT max(updated_at) FROM customers'
))
To force an immediate reload regardless of LIFETIME:
SYSTEM RELOAD DICTIONARY customers_dict;
For more on keeping enrichment data current, see ClickHouse dictionary enrichment and update patterns.
High Availability with Replicas
If your PostgreSQL setup has read replicas, list them with priorities. ClickHouse tries replicas in priority order — lower number means higher priority — and falls back to the next one on failure.
SOURCE(POSTGRESQL(
port 5432
user 'clickhouse_ro' password 'secret'
db 'crm' table 'customers'
replica(host 'pg-primary' port 5432 priority 1)
replica(host 'pg-replica-1' port 5432 priority 2)
background_reconnect true
))
Authentication and Security
- Use a dedicated read-only role. Grant only
SELECTon the source table; the dictionary never writes back. - Credentials live in the dictionary definition. With DDL dictionaries the password is stored in ClickHouse metadata. Restrict who can run
SHOW CREATE DICTIONARYand consider XML config files with locked-down filesystem permissions if you need the credential out of SQL. - SSL/TLS: terminate Postgres connections over TLS at the server side; configure
sslmodeon the PostgreSQL server and require encrypted connections viapg_hba.conf. - Network: every ClickHouse node needs reachability to Postgres. Connection or auth failures surface as load errors — see ClickHouse error: PostgreSQL connection failure and dictionary access denied.
Dictionary vs. postgresql() Table Function vs. PostgreSQL Engine
A dictionary is not the only way to read Postgres from ClickHouse. Choose based on access pattern:
| Approach | Best for | Trade-off |
|---|---|---|
| Dictionary | High-frequency key lookups / enrichment in hot queries | Loaded into memory; refreshed on a schedule, so slightly stale |
postgresql() table function |
Ad-hoc one-off reads of a Postgres table | Reads live each time; no caching, full network round trip |
| PostgreSQL table engine | A persistent table mapping you query like any table | Live reads on every query; not optimized for point lookups |
If your workload is "given an ID, get the name" repeated millions of times per query, a dictionary is almost always the right tool. If you need always-live data and can tolerate per-query latency, the DIRECT layout or the table function fits better.
Common Issues
Dictionary loads but is empty. Usually the
table/schema/dbcombination points at the wrong place, or thewhereclause filters everything out. CheckSELECT status, last_exception FROM system.dictionaries WHERE name = 'customers_dict'. See ClickHouse dictionary is empty.queryandtableboth set. They are mutually exclusive; ClickHouse rejects the definition. See incorrect dictionary definition.Stale data. Either
LIFETIMEis too long, or aninvalidate_querynever changes. Verify the marker actually moves when source rows change, or runSYSTEM RELOAD DICTIONARY.Connection / auth failures on load. Wrong credentials, missing
SELECTgrant, or a firewall blocking a node. See PostgreSQL connection failure.Cache-layout update failures. With
CACHElayouts, individual key misses that can't reach the source produce update errors. See cache dictionary update fail.Type mismatches. A Postgres
bigintmaps toUInt64/Int64; atexttoString. Mismatched declared types cause load errors — align the dictionary attribute types with the Postgres column types.
Best Practices
- Prefer DDL over XML for new dictionaries — it's versionable through SQL and easier to manage.
- Pick the layout deliberately:
FLATfor dense integer keys,HASHEDfor sparse ones,COMPLEX_KEY_HASHEDfor string/composite keys. AvoidCACHEandDIRECTunless you specifically need them. - Add an
invalidate_queryso reloads happen only on real changes, not on everyLIFETIMEtick. - Use a read-only Postgres role scoped to the source table.
- Filter at the source with
whereor a customqueryto keep the dictionary small and the memory footprint low. - Monitor
system.dictionariesforstatus,last_exception,loading_duration, andbytes_allocated.
How Pulse Helps
Pulse monitors ClickHouse in production and surfaces dictionary problems before they corrupt query results. It tracks system.dictionaries for failed loads, growing loading_duration, repeated reload errors, and dictionaries stuck in a not-loaded state — including PostgreSQL-sourced ones that silently go stale when a LIFETIME or invalidate_query is misconfigured. It also watches the connection and memory side effects of large dictionaries across the cluster. Learn more at pulse.support.
Frequently Asked Questions
Q: Do I need to install anything to use a PostgreSQL dictionary source?
No. The PostgreSQL source is part of the standard ClickHouse server. Unlike ODBC sources, no extra driver package is required.
Q: How often does the dictionary refresh from PostgreSQL?
It reloads at a random time within the LIFETIME(MIN ... MAX ...) window, per node. Add an invalidate_query to skip reloads when nothing changed, or run SYSTEM RELOAD DICTIONARY to force one.
Q: My key is a string, not a number. Which layout do I use?
Use COMPLEX_KEY_HASHED (or COMPLEX_KEY_CACHE/COMPLEX_KEY_DIRECT). Plain FLAT/HASHED require a single UInt64 key. Look up complex keys by passing a tuple to dictGet.
Q: Should I use a dictionary or the postgresql() table function?
Use a dictionary for repeated, high-volume key lookups (enrichment in hot queries), where in-memory caching pays off. Use the postgresql() table function for ad-hoc or one-off reads where you want live data and don't need caching.
Q: How do I see why a dictionary failed to load?
Query SELECT name, status, last_exception, loading_duration FROM system.dictionaries. The last_exception column usually contains the underlying PostgreSQL or connection error.
Q: Can I filter or transform the PostgreSQL data before it becomes a dictionary?
Yes. Use the where parameter for a simple filter, or a full custom query to select, rename, or join columns in PostgreSQL before they load. query cannot be combined with table/where.