A materialized view in ClickHouse is not a stored snapshot of a query. It is an INSERT trigger: every time rows are inserted into the source table, the view's SELECT runs over those new rows and the result is written into the target table. This makes MVs the canonical way to build real-time aggregates, denormalized tables, Kafka-to-MergeTree pipelines, and per-tenant projections. The more recent REFRESH EVERY syntax adds the orthogonal capability of scheduled, full-table refreshes. This guide covers both patterns, the TO-target-table form versus the inline-engine form, the POPULATE keyword and why to avoid it, and the gotchas you hit only when you start chaining MVs together.
The Quick Answer
-- Target table that will hold the aggregates
CREATE TABLE analytics.events_per_hour
(
hour DateTime,
event_type LowCardinality(String),
cnt UInt64
)
ENGINE = SummingMergeTree
ORDER BY (hour, event_type);
-- The view that wires source -> target
CREATE MATERIALIZED VIEW analytics.events_per_hour_mv
TO analytics.events_per_hour
AS
SELECT
toStartOfHour(event_time) AS hour,
event_type,
count() AS cnt
FROM analytics.events
GROUP BY hour, event_type;
Every INSERT into analytics.events automatically writes aggregated rows into analytics.events_per_hour. The view itself stores nothing; the target table is where the data lives.
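A quick sanity check of the trigger behavior, as a sketch. It assumes analytics.events has at least event_time and event_type columns (its full DDL is not shown above), and the sample timestamps are illustrative:

```sql
-- Assumes analytics.events has at least event_time and event_type columns.
-- This single insert block makes the MV fire once over these two rows.
INSERT INTO analytics.events (event_time, event_type) VALUES
    ('2024-01-15 10:05:00', 'click'),
    ('2024-01-15 10:40:00', 'click');

-- SummingMergeTree collapses duplicate keys only at (background) merge time,
-- so always re-aggregate at read time:
SELECT hour, event_type, sum(cnt) AS cnt
FROM analytics.events_per_hour
GROUP BY hour, event_type;
```

Each INSERT batch produces one partial aggregate per (hour, event_type) key in the target; the sum() at read time makes the result correct whether or not merges have run yet.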
Two Forms: TO vs Inline Engine
There are two ways to define an MV's storage:
TO target_table (recommended)
CREATE MATERIALIZED VIEW my_mv
TO my_target_table
AS SELECT ... FROM source;
The target table is a normal table you create separately. This is the right pattern for production for several reasons:
- You can INSERT directly into the target table (useful for backfills).
- You can drop the view and the data survives.
- You have full control over the target's engine, ORDER BY, TTL, replication.
- The target schema is explicit; reading the DDL tells you what you have.
Inline engine (legacy)
CREATE MATERIALIZED VIEW my_mv
ENGINE = SummingMergeTree
ORDER BY (hour, event_type)
AS SELECT ... FROM source;
ClickHouse creates a hidden .inner.<view_name> table to hold the data. The view and its storage are bound together. Dropping the view drops the data.
This shorter form is fine for quick experiments. For anything in production, use the TO form. The docs go further and explicitly disallow combining TO and POPULATE: "When creating a materialized view with TO [db].[table], you can't also use POPULATE."
POPULATE: What It Does and Why to Skip It
CREATE MATERIALIZED VIEW my_mv
ENGINE = SummingMergeTree
ORDER BY (...)
POPULATE
AS SELECT ... FROM source;
POPULATE runs the SELECT once over existing source data at view creation and inserts the result. Useful in theory, problematic in practice. The docs warn: "We do not recommend using POPULATE, since data inserted in the table during the view creation will not be inserted in it." Also: "It is not supported in ClickHouse Cloud" and not supported with the Replicated database engine.
The safer pattern:
- Create the target table.
- Create the materialized view (it starts capturing new inserts immediately).
- Backfill the target separately with INSERT INTO target SELECT ... FROM source WHERE event_time < <view_creation_time>.
This way you control the backfill semantics, can throttle it, can restart it on failure, and never have a window where the MV silently dropped rows.
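Using the Quick Answer tables, the three steps look roughly like this. The cutoff timestamp is a placeholder; substitute the moment you actually created the view:

```sql
-- Steps 1-2: target table and view exist (see the Quick Answer);
-- the view captures inserts from its creation moment onward.

-- Step 3: backfill rows older than the view creation time.
-- '2024-01-15 12:00:00' is a placeholder for the real creation time.
INSERT INTO analytics.events_per_hour
SELECT
    toStartOfHour(event_time) AS hour,
    event_type,
    count() AS cnt
FROM analytics.events
WHERE event_time < '2024-01-15 12:00:00'
GROUP BY hour, event_type;
```

Because the backfill is a plain INSERT ... SELECT, you can split it into per-day chunks, rerun a failed chunk, or throttle it with max_threads, none of which POPULATE allows.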
Refreshable Materialized Views
The newer REFRESH EVERY syntax produces views that periodically run a full query and replace (or append to) the target table:
CREATE MATERIALIZED VIEW analytics.daily_summary
REFRESH EVERY 1 HOUR
[OFFSET 5 MINUTE]
[RANDOMIZE FOR 30 SECOND]
[APPEND]
TO analytics.daily_summary_table
AS
SELECT
toDate(event_time) AS day,
event_type,
count() AS cnt,
uniq(user_id) AS users
FROM analytics.events
WHERE event_time >= today() - 7
GROUP BY day, event_type;
Refreshable MVs are the right tool when:
- The aggregate is over a large window and an incremental MV would not be exact (e.g. percentile over the last 24 hours).
- The source data is mutable or comes from an external system you cannot trigger off.
- The cost of re-running the query once an hour is acceptable.
Default behavior is to atomically replace the target on each refresh. APPEND mode "inserts rows into the table without deleting existing rows," useful for accumulating snapshots. RANDOMIZE FOR jitters the schedule to avoid every refreshable view firing at once. For dependent refreshable views, use DEPENDS ON so dependents wait for upstream refreshes to finish.
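A sketch of a dependent refreshable view using DEPENDS ON. The weekly_summary names are hypothetical; the point is that the weekly rollup waits for the daily refresh to finish before reading its target:

```sql
-- Hypothetical downstream view; refreshes only after
-- analytics.daily_summary has completed its own refresh.
CREATE MATERIALIZED VIEW analytics.weekly_summary
REFRESH EVERY 1 HOUR OFFSET 10 MINUTE
DEPENDS ON analytics.daily_summary
TO analytics.weekly_summary_table
AS
SELECT
    toStartOfWeek(day) AS week,
    event_type,
    sum(cnt) AS cnt
FROM analytics.daily_summary_table
GROUP BY week, event_type;
```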
Common Production Patterns
Streaming aggregation
CREATE TABLE analytics.events (event_time DateTime, user_id UInt64, event_type String, ...) ENGINE = MergeTree ORDER BY (user_id, event_time);
CREATE TABLE analytics.events_per_user_day
(
day Date, user_id UInt64,
sessions AggregateFunction(uniq, UInt64),
total_events AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
ORDER BY (day, user_id);
CREATE MATERIALIZED VIEW analytics.events_per_user_day_mv
TO analytics.events_per_user_day
AS
SELECT
toDate(event_time) AS day,
user_id,
uniqState(session_id) AS sessions,
countState() AS total_events
FROM analytics.events
GROUP BY day, user_id;
Read with the matching Merge combinator: uniqMerge(sessions), countMerge(total_events). See the MergeTree guide for the AggregatingMergeTree details.
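The read side, spelled out as a query against the target table above:

```sql
-- Each -State column must be finalized with the matching -Merge combinator;
-- selecting the raw AggregateFunction columns returns opaque binary states.
SELECT
    day,
    user_id,
    uniqMerge(sessions) AS sessions,
    countMerge(total_events) AS total_events
FROM analytics.events_per_user_day
GROUP BY day, user_id
ORDER BY day, user_id;
```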
Kafka ingestion
CREATE TABLE analytics.events_kafka (...) ENGINE = Kafka SETTINGS kafka_broker_list = '...', kafka_topic_list = 'events', kafka_format = 'JSONEachRow';
CREATE TABLE analytics.events (...) ENGINE = MergeTree ORDER BY (...);
CREATE MATERIALIZED VIEW analytics.events_consumer
TO analytics.events
AS SELECT * FROM analytics.events_kafka;
The Kafka engine table reads from Kafka; the MV pumps rows into the MergeTree storage table. Drop the MV to pause ingestion; recreate to resume. This is the canonical Kafka pattern.
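If you want to pause without losing the view definition, detaching and re-attaching the MV also works (a sketch; ClickHouse stops the Kafka consumer while the view is detached):

```sql
-- Pause consumption, keeping the view's DDL on disk
DETACH TABLE analytics.events_consumer;

-- Resume consumption from the committed Kafka offsets
ATTACH TABLE analytics.events_consumer;
```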
Denormalized projection
CREATE MATERIALIZED VIEW analytics.events_enriched
TO analytics.events_enriched_table
AS
SELECT
e.*,
u.country,
u.tier
FROM analytics.events AS e
LEFT JOIN analytics.users AS u USING (user_id);
Build a wider table for fast lookups without doing the JOIN at query time. Caveat: the JOIN runs on each INSERT batch, against the state of users at that moment. If users change later, the enriched rows are not retroactively updated.
Chained materialized views
-- Raw events
CREATE TABLE events (...) ENGINE = MergeTree ...;
-- First-level aggregate
CREATE TABLE events_per_hour (...) ENGINE = SummingMergeTree ...;
CREATE MATERIALIZED VIEW events_per_hour_mv TO events_per_hour AS
SELECT toStartOfHour(event_time) AS hour, count() AS c FROM events GROUP BY hour;
-- Second-level aggregate from the first
CREATE TABLE events_per_day (...) ENGINE = SummingMergeTree ...;
CREATE MATERIALIZED VIEW events_per_day_mv TO events_per_day AS
SELECT toDate(hour) AS day, sum(c) AS c FROM events_per_hour GROUP BY day;
Chained MVs let you build a tiered rollup pipeline. The thing to know: each MV sees the raw INSERT to its source, not the merged result. The first MV sees one row per insert; the second MV sees that row arrive into events_per_hour, before any merges have collapsed the SummingMergeTree. Design accordingly.
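The same pre-merge caveat applies at read time: both rollup tables hold partial sums until background merges run, so queries should re-aggregate rather than trust one row per key. A sketch, assuming the count column is named c as in the MVs above:

```sql
-- Re-aggregate at read time; SummingMergeTree only collapses
-- duplicate (day) keys during background merges.
SELECT day, sum(c) AS c
FROM events_per_day
GROUP BY day
ORDER BY day;
```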
Common Pitfalls
- MVs do not see UPDATEs or DELETEs. They are INSERT triggers only. Mutations on the source do not propagate. If you need to track mutations, use a ReplacingMergeTree pattern or a refreshable MV instead.
- POPULATE has gaps. Inserts during POPULATE are lost. Always use the backfill-separately pattern.
- JOINs against external tables freeze at INSERT time. The right-side of a JOIN is evaluated when the source rows are inserted. Later updates to the JOINed table do not refresh enriched rows.
- Dropping the source table. If a regular MV's source is dropped, the MV breaks. With the TO form, the target table is independent; with the inline form, the .inner table goes with the view.
- Replicated MVs on Replicated databases. Inside a Replicated database, both the MV and its target must be defined inside the database. Cross-database refs across Replicated boundaries are not supported.
How Pulse Helps With Materialized View Health
Materialized views are the most common source of "silently incorrect" data on ClickHouse clusters. POPULATE gaps, broken JOINs after schema changes, target tables that ran out of disk, and Kafka MVs that crashed and never restarted all produce missing rows without raising loud errors. Pulse continuously tracks MV health across ClickHouse clusters and surfaces views whose target counts have diverged from expected, Kafka-fed MVs that stopped consuming, refreshable views that missed a refresh, and target tables whose sizes are growing in patterns that don't match the source. Connect your ClickHouse cluster to Pulse and let it watch the long tail of MV correctness.
Frequently Asked Questions
Q: Does a materialized view see existing data when I create it?
Not by default. The MV fires on subsequent inserts only. Use POPULATE to backfill existing data (with the caveats above), or run a one-shot INSERT INTO target SELECT ... FROM source manually after creating the MV.
Q: What's the difference between a materialized view and a regular view?
A regular view (CREATE VIEW) is a saved query, evaluated every time you SELECT from it. A materialized view stores its result in a target table that gets updated on every INSERT to the source. Regular views are essentially free; materialized views trade INSERT cost for SELECT speed.
Q: Can I have multiple MVs on the same source table?
Yes. Each MV runs its SELECT independently for every INSERT. The cost is additive, so be deliberate about how many heavy aggregates you fan out from one source.
Q: Should I use the inline engine or TO syntax?
TO. Always TO, in production. The inline form ties the data to the view definition and complicates everything from drop to backfill to migrations.
Q: How do I update an MV's SELECT after it's been created?
ALTER TABLE <mv_name> MODIFY QUERY ... modifies the SELECT for new inserts. Existing data in the target table is unchanged. For a full backfill with the new logic, you have to recreate or run a manual rewrite.
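A sketch against the Quick Answer view; the heartbeat filter is a hypothetical change, shown only to illustrate the statement shape:

```sql
-- Changes the SELECT for future inserts only;
-- rows already in analytics.events_per_hour are untouched.
ALTER TABLE analytics.events_per_hour_mv
MODIFY QUERY
SELECT
    toStartOfHour(event_time) AS hour,
    event_type,
    count() AS cnt
FROM analytics.events
WHERE event_type != 'heartbeat'  -- hypothetical new filter
GROUP BY hour, event_type;
```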
Q: Are refreshable MVs replacing regular MVs?
No, they are complementary. Regular MVs are right for streaming aggregates over append-only data. Refreshable MVs are right for periodic full recomputations, often where the SELECT is too complex for the incremental pattern or where the source is external.
Q: Can I use FINAL inside an MV's SELECT?
You can, but be careful. FINAL is expensive, and inside an MV it runs on every INSERT batch. For source tables that are ReplacingMergeTree or similar, the right pattern is often to design the MV around the raw rows and let the consumer query handle the FINAL semantics.