Using ClickHouse Views as Dictionary Sources

A ClickHouse dictionary normally reads from a single table or query. When the reference data you want to expose through one dictGet call is spread across several tables, you don't need a separate ETL step or a physical "denormalized" table. You can point the dictionary's CLICKHOUSE source at a VIEW that joins or unions those tables, and the dictionary loads the combined result as if it were a single table.

This guide shows how to build a dictionary on top of several tables using a view, how to choose between a TABLE and a QUERY source, and how to keep the dictionary fresh without rescanning everything. For background on dictionaries themselves, see ClickHouse Dictionary Data Structure and ClickHouse External Dictionaries.

Why Use a View as the Source

A dictionary source must present a flat set of columns: one key plus the attributes. If your reference data is normalized across tables — for example, English labels in one table and Russian labels in another, or base attributes in one table and overrides in a second — you have two ways to combine them at load time:

  • A VIEW referenced by name in the TABLE field of the source. The view encapsulates the JOIN/UNION logic, and the dictionary definition stays simple.
  • An inline QUERY in the source, where you write the combining SELECT directly inside SOURCE(CLICKHOUSE(...)).

Both feed the dictionary the same combined rows. A view is usually cleaner: the logic lives in one place, it's reusable, and the dictionary DDL reads like it points at a normal table. A standard (non-materialized) view stores no data — it is just a saved query that runs each time the dictionary reloads.

Joining Several Tables into One Dictionary

Suppose label data is split by language across two tables sharing the same id key. Create a view that joins them, then build the dictionary on that view:

CREATE TABLE dictionary_source_en
(
    id UInt64,
    value String
) ENGINE = TinyLog;

INSERT INTO dictionary_source_en VALUES (1, 'One'), (2, 'Two'), (3, 'Three');

CREATE TABLE dictionary_source_ru
(
    id UInt64,
    value String
) ENGINE = TinyLog;

INSERT INTO dictionary_source_ru VALUES (1, 'Один'), (2, 'Два'), (3, 'Три');

CREATE VIEW dictionary_source_view AS
SELECT
    id,
    dictionary_source_en.value AS value_en,
    dictionary_source_ru.value AS value_ru
FROM dictionary_source_en
LEFT JOIN dictionary_source_ru USING (id);

CREATE DICTIONARY flat_dictionary
(
    id UInt64,
    value_en String,
    value_ru String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' PASSWORD '' TABLE 'dictionary_source_view'))
LIFETIME(MIN 1 MAX 1000)
LAYOUT(FLAT());

The dictionary's attribute list (value_en, value_ru) must match the columns the view exposes. Querying it is identical to querying any other dictionary:

SELECT
    dictGet(concat(currentDatabase(), '.flat_dictionary'), 'value_en', number + 1),
    dictGet(concat(currentDatabase(), '.flat_dictionary'), 'value_ru', number + 1)
FROM numbers(3);

A LEFT JOIN is the right tool here because it preserves every key from the left (driving) table even when the second table has no matching row. Make sure the join key is unique on the right side — if dictionary_source_ru had duplicate id values, the view would produce duplicate keys and the dictionary load would behave unexpectedly.

This pattern is the foundation of the Altinity Knowledge Base recipe "Dictionary on the top of several tables using VIEW," adapted to current ClickHouse behavior.

Stacking Tables with UNION ALL

If your tables hold disjoint sets of the same attribute shape (for example, reference data partitioned by region or by tenant), combine them by stacking rows rather than joining columns. Use UNION ALL inside the view:

CREATE VIEW reference_all AS
SELECT id, label, region FROM reference_emea
UNION ALL
SELECT id, label, region FROM reference_amer
UNION ALL
SELECT id, label, region FROM reference_apac;

CREATE DICTIONARY reference_dict
(
    id UInt64,
    label String,
    region String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'reference_all'))
LIFETIME(MIN 300 MAX 600)
LAYOUT(HASHED());

UNION ALL keeps all rows, so the combined key set must still be unique across the unioned tables — overlapping keys from two regions would collide. If the tables are physically identical in schema and you only want a union (no transformation), the Merge table engine is an alternative source: point the dictionary at a Merge table that fans out across the underlying tables. A view with explicit UNION ALL is more flexible when the columns need renaming or light transformation.

TABLE vs. QUERY in the CLICKHOUSE Source

The CLICKHOUSE source accepts either a TABLE (a table or view name) or a QUERY (an inline SELECT). They are alternatives — use one or the other.

Aspect TABLE 'view_name' QUERY 'SELECT ...'
Where the logic lives In a named CREATE VIEW object Inline in the dictionary DDL
Reusability View can be queried and reused elsewhere Logic is private to the dictionary
Readability of DDL Clean — looks like a plain table source Combining SQL embedded in the source
Best for Reused or non-trivial joins/unions One-off transformations local to the dictionary

The inline form looks like this:

CREATE DICTIONARY flat_dictionary
(
    id UInt64,
    value_en String,
    value_ru String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(
    QUERY 'SELECT id, en.value AS value_en, ru.value AS value_ru
           FROM dictionary_source_en AS en
           LEFT JOIN dictionary_source_ru AS ru USING (id)'
))
LIFETIME(MIN 1 MAX 1000)
LAYOUT(FLAT());

Functionally this is equivalent to the view-based example above. Prefer the named view when the combining logic is non-trivial or shared; prefer QUERY when the transformation is small and only this dictionary needs it.

Local source optimization

When the CLICKHOUSE source points at the same server (host localhost with the native port), ClickHouse reads the data internally without opening a real network connection, which makes self-referential dictionaries efficient. You can also omit HOST/PORT/USER/PASSWORD entirely for a same-server source, as in the UNION ALL example above.

Keeping the Dictionary Fresh

LIFETIME(MIN min MAX max) controls how often ClickHouse reloads the dictionary; it picks a random interval in that range (in seconds) to distribute the load on the dictionary source when updating on a large number of servers. On each reload, the source view re-runs its JOIN/UNION over the underlying tables.

For a view over several tables, a full reload re-executes the whole combining query, which can be expensive on large reference sets. Two source options reduce that cost:

  • invalidate_query — a small query whose result ClickHouse compares between reloads. If the result is unchanged, ClickHouse skips the reload entirely. Point it at something cheap that changes when the data does, such as SELECT max(updated_at) FROM ... over the driving table.
  • update_field — enables incremental updates for FLAT, HASHED, HASHED_ARRAY, and COMPLEX_KEY_HASHED layouts. ClickHouse appends a condition roughly equivalent to WHERE update_field >= last_update_time (with a small safety offset) so only changed rows are fetched, then merges them into the in-memory dictionary instead of rebuilding it.
SOURCE(CLICKHOUSE(
    TABLE 'dictionary_source_view'
    update_field 'updated_at'
    invalidate_query 'SELECT max(updated_at) FROM dictionary_source_en'
))

Note that update_field adds a filter to the source query, so the view (or the driving table behind it) must expose a usable updated_at column. invalidate_query only decides whether to reload; it does not by itself make the reload incremental. For a fuller treatment of refresh strategies, see ClickHouse dictionary enrichment and update patterns.

View, Materialized View, or a Physical Table?

A plain view computes the combined data lazily, every time the dictionary reloads. If that combining query is heavy and runs often, you have alternatives:

  • A materialized view writes the combined result into a target table as new data arrives, so the dictionary reads from a pre-built table instead of re-running the JOIN on every reload. This trades extra storage and insert-time work for cheaper reloads.
  • A periodically rebuilt physical table (refreshed by a scheduled INSERT ... SELECT) gives similar benefits with full control over timing.

Use a plain view when the underlying tables are small or reloads are infrequent; reach for a materialized view or a refreshed table when the dictionary is large and reload cost dominates.

Best Practices

  1. Guarantee key uniqueness in the combined result. Both LEFT JOIN (right-side duplicates) and UNION ALL (overlapping keys) can introduce duplicate keys. The dictionary key must be unique.
  2. Match attributes to the view's columns exactly. Names and types in the CREATE DICTIONARY attribute list must line up with what the view returns.
  3. Drive joins from the table with the full key set and LEFT JOIN the rest, so no keys are dropped when a side is missing data.
  4. Add invalidate_query for large sources so unchanged data doesn't trigger a full re-read on every LIFETIME interval.
  5. Use update_field with a hashed/flat layout when only a subset of rows changes between reloads, to avoid full rebuilds.
  6. Consider a materialized view or refreshed table when the combining query is expensive and the dictionary reloads frequently.

Common Issues

  1. Duplicate keys after a join. A non-unique join key on the right table multiplies rows. Verify with SELECT id, count() FROM dictionary_source_view GROUP BY id HAVING count() > 1.
  2. Attribute mismatch. If the view's column names or types don't match the dictionary's attribute declarations, the dictionary fails to load. Check SELECT * FROM system.dictionaries WHERE name = '...' for the last_exception column.
  3. Stale data despite a short LIFETIME. An invalidate_query that returns the same value will (correctly) suppress reloads — confirm it actually changes when the source data does.
  4. Reload latency on big views. A full reload re-runs the entire JOIN/UNION. If reloads are slow, switch to incremental updates (update_field) or precompute with a materialized view.
  5. Permission errors on the source. The dictionary's configured USER must be able to read every underlying table the view touches, not just the view object.

How Pulse Helps

Pulse monitors ClickHouse dictionaries as part of its health checks: it surfaces dictionaries stuck in a failed load state, flags reloads that run far longer than the configured LIFETIME window, and highlights sources whose invalidate_query never changes (a common cause of silently stale reference data). When a dictionary sits on top of a multi-table view, Pulse helps you see whether the combining query — and the tables behind it — are the bottleneck, so you can decide between tuning the refresh strategy and precomputing with a materialized view. It is run by ClickHouse experts who can advise on dictionary layout, refresh, and source design for your specific workload.

Frequently Asked Questions

Q: Can a single ClickHouse dictionary combine columns from more than one table?

Yes. Point the dictionary's CLICKHOUSE source at a VIEW (via TABLE) or an inline QUERY that joins or unions the tables. The dictionary loads the combined result as if it came from one table.

Q: Should I use a view or the inline QUERY parameter?

Use a named view when the combining logic is non-trivial or reused elsewhere — the dictionary DDL stays clean and the view is independently queryable. Use the inline QUERY when the transformation is small and specific to that one dictionary. They are functionally equivalent.

Q: JOIN or UNION ALL inside the view?

Use a JOIN to combine columns from tables that share a key (one row per key, more attributes). Use UNION ALL to stack rows from tables with the same shape (e.g., per-region tables). In both cases the resulting key must be unique.

Q: Does the view re-run on every dictionary reload?

Yes — a plain view is just a stored query, so its JOIN/UNION executes on each reload governed by LIFETIME. Use invalidate_query to skip reloads when nothing changed, update_field for incremental updates, or a materialized view to precompute the combined data.

Q: Can the dictionary source be a Merge-engine table instead of a view?

Yes, when you only need a union of identically structured tables and no transformation. A Merge table fans a read across multiple tables, and a dictionary can use it as its TABLE source. A view with explicit UNION ALL is preferable when columns need renaming or light transformation.

Q: Why is my view-backed dictionary loading duplicate or missing keys?

Duplicates usually come from a non-unique join key on the right side of a LEFT JOIN or overlapping keys across UNION ALL branches. Missing keys usually come from using an INNER JOIN where a LEFT JOIN (driven by the table holding all keys) was needed.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.