A materialized view in ClickHouse is not a stored snapshot of a query. It is an INSERT trigger: every time rows are inserted into the source table, the view's SELECT runs over those new rows and the result is written into the target table. This makes MVs the canonical way to build real-time aggregates, denormalized tables, Kafka-to-MergeTree pipelines, and per-tenant projections. The more recent REFRESH EVERY syntax adds the orthogonal capability of scheduled, full-table refreshes. This guide covers both patterns, the TO-target-table form versus the inline-engine form, the POPULATE keyword and why to avoid it, and the gotchas you hit only when you start chaining MVs together.
The Quick Answer
-- Target table that will hold the aggregates
CREATE TABLE analytics.events_per_hour
(
hour DateTime,
event_type LowCardinality(String),
cnt UInt64
)
ENGINE = SummingMergeTree
ORDER BY (hour, event_type);
-- The view that wires source -> target
CREATE MATERIALIZED VIEW analytics.events_per_hour_mv
TO analytics.events_per_hour
AS
SELECT
toStartOfHour(event_time) AS hour,
event_type,
count() AS cnt
FROM analytics.events
GROUP BY hour, event_type;
Every INSERT into analytics.events automatically writes aggregated rows into analytics.events_per_hour. The view itself stores nothing; the target table is where the data lives.
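A quick sanity check of the trigger behavior, as a sketch. It assumes analytics.events has at least event_time and event_type columns (its full DDL is not shown above), and the sample timestamps are illustrative:

```sql
-- Assumes analytics.events has at least event_time and event_type columns.
-- This single insert block makes the MV fire once over these two rows.
INSERT INTO analytics.events (event_time, event_type) VALUES
    ('2024-01-15 10:05:00', 'click'),
    ('2024-01-15 10:40:00', 'click');

-- SummingMergeTree collapses duplicate keys only at (background) merge time,
-- so always re-aggregate at read time:
SELECT hour, event_type, sum(cnt) AS cnt
FROM analytics.events_per_hour
GROUP BY hour, event_type;
```

Each INSERT batch produces one partial aggregate per (hour, event_type) key in the target; the sum() at read time makes the result correct whether or not merges have run yet.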
Two Forms: TO vs Inline Engine
There are two ways to define an MV's storage:
TO target_table (recommended)
CREATE MATERIALIZED VIEW my_mv
TO my_target_table
AS SELECT ... FROM source;
The target table is a normal table you create separately. This is the right pattern for production for several reasons:
- You can INSERT directly into the target table (useful for backfills).
- You can drop the view and the data survives.
- You have full control over the target's engine, ORDER BY, TTL, replication.
- The target schema is explicit; reading the DDL tells you what you have.
Inline engine (legacy)
CREATE MATERIALIZED VIEW my_mv
ENGINE = SummingMergeTree
ORDER BY (hour, event_type)
AS SELECT ... FROM source;
ClickHouse creates a hidden .inner.<view_name> table to hold the data. The view and its storage are bound together. Dropping the view drops the data.
This shorter form is fine for quick experiments. For anything in production, use the TO form. The docs go further and explicitly disallow combining TO and POPULATE: "When creating a materialized view with TO [db].[table], you can't also use POPULATE."
POPULATE: What It Does and Why to Skip It
CREATE MATERIALIZED VIEW my_mv
ENGINE = SummingMergeTree
ORDER BY (...)
POPULATE
AS SELECT ... FROM source;
POPULATE runs the SELECT once over existing source data at view creation and inserts the result. Useful in theory, problematic in practice. The docs warn: "We do not recommend using POPULATE, since data inserted in the table during the view creation will not be inserted in it." Also: "It is not supported in ClickHouse Cloud" and not supported with the Replicated database engine.
The safer pattern:
- Create the target table.
- Create the materialized view (it starts capturing new inserts immediately).
- Backfill the target separately with INSERT INTO target SELECT ... FROM source WHERE event_time < <view_creation_time>.
This way you control the backfill semantics, can throttle it, can restart it on failure, and never have a window where the MV silently dropped rows.
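Using the Quick Answer tables, the three steps look roughly like this. The cutoff timestamp is a placeholder; substitute the moment you actually created the view:

```sql
-- Steps 1-2: target table and view exist (see the Quick Answer);
-- the view captures inserts from its creation moment onward.

-- Step 3: backfill rows older than the view creation time.
-- '2024-01-15 12:00:00' is a placeholder for the real creation time.
INSERT INTO analytics.events_per_hour
SELECT
    toStartOfHour(event_time) AS hour,
    event_type,
    count() AS cnt
FROM analytics.events
WHERE event_time < '2024-01-15 12:00:00'
GROUP BY hour, event_type;
```

Because the backfill is a plain INSERT ... SELECT, you can split it into per-day chunks, rerun a failed chunk, or throttle it with max_threads, none of which POPULATE allows.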
Refreshable Materialized Views
The newer REFRESH EVERY syntax produces views that periodically run a full query and replace (or append to) the target table:
CREATE MATERIALIZED VIEW analytics.daily_summary
REFRESH EVERY 1 HOUR
[OFFSET 5 MINUTE]
[RANDOMIZE FOR 30 SECOND]
[APPEND]
TO analytics.daily_summary_table
AS
SELECT
toDate(event_time) AS day,
event_type,
count() AS cnt,
uniq(user_id) AS users
FROM analytics.events
WHERE event_time >= today() - 7
GROUP BY day, event_type;
Refreshable MVs are the right tool when:
- The aggregate is over a large window and an incremental MV would not be exact (e.g. percentile over the last 24 hours).
- The source data is mutable or comes from an external system you cannot trigger off.
- The cost of re-running the query once an hour is acceptable.
Default behavior is to atomically replace the target on each refresh. APPEND mode "inserts rows into the table without deleting existing rows," useful for accumulating snapshots. RANDOMIZE FOR jitters the schedule to avoid every refreshable view firing at once. For dependent refreshable views, use DEPENDS ON so dependents wait for upstream refreshes to finish.
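A sketch of a dependent refreshable view using DEPENDS ON. The weekly_summary names are hypothetical; the point is that the weekly rollup waits for the daily refresh to finish before reading its target:

```sql
-- Hypothetical downstream view; refreshes only after
-- analytics.daily_summary has completed its own refresh.
CREATE MATERIALIZED VIEW analytics.weekly_summary
REFRESH EVERY 1 HOUR OFFSET 10 MINUTE
DEPENDS ON analytics.daily_summary
TO analytics.weekly_summary_table
AS
SELECT
    toStartOfWeek(day) AS week,
    event_type,
    sum(cnt) AS cnt
FROM analytics.daily_summary_table
GROUP BY week, event_type;
```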
Common Production Patterns
Streaming aggregation
CREATE TABLE analytics.events (event_time DateTime, user_id UInt64, event_type String, ...) ENGINE = MergeTree ORDER BY (user_id, event_time);
CREATE TABLE analytics.events_per_user_day
(
day Date, user_id UInt64,
sessions AggregateFunction(uniq, UInt64),
total_events AggregateFunction(count)
)
ENGINE = AggregatingMergeTree
ORDER BY (day, user_id);
CREATE MATERIALIZED VIEW analytics.events_per_user_day_mv
TO analytics.events_per_user_day
AS
SELECT
toDate(event_time) AS day,
user_id,
uniqState(session_id) AS sessions,
countState() AS total_events
FROM analytics.events
GROUP BY day, user_id;
Read with the matching Merge combinator: uniqMerge(sessions), countMerge(total_events). See the MergeTree guide for the AggregatingMergeTree details.
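The read side, spelled out as a query against the target table above:

```sql
-- Each -State column must be finalized with the matching -Merge combinator;
-- selecting the raw AggregateFunction columns returns opaque binary states.
SELECT
    day,
    user_id,
    uniqMerge(sessions) AS sessions,
    countMerge(total_events) AS total_events
FROM analytics.events_per_user_day
GROUP BY day, user_id
ORDER BY day, user_id;
```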
Kafka ingestion
CREATE TABLE analytics.events_kafka (...) ENGINE = Kafka SETTINGS kafka_broker_list = '...', kafka_topic_list = 'events', kafka_format = 'JSONEachRow';
CREATE TABLE analytics.events (...) ENGINE = MergeTree ORDER BY (...);
CREATE MATERIALIZED VIEW analytics.events_consumer
TO analytics.events
AS SELECT * FROM analytics.events_kafka;
The Kafka engine table reads from Kafka; the MV pumps rows into the MergeTree storage table. Drop the MV to pause ingestion; recreate to resume. This is the canonical Kafka pattern.
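If you want to pause without losing the view definition, detaching and re-attaching the MV also works (a sketch; ClickHouse stops the Kafka consumer while the view is detached):

```sql
-- Pause consumption, keeping the view's DDL on disk
DETACH TABLE analytics.events_consumer;

-- Resume consumption from the committed Kafka offsets
ATTACH TABLE analytics.events_consumer;
```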
Denormalized projection
CREATE MATERIALIZED VIEW analytics.events_enriched
TO analytics.events_enriched_table
AS
SELECT
e.*,
u.country,
u.tier
FROM analytics.events AS e
LEFT JOIN analytics.users AS u USING (user_id);
Build a wider table for fast lookups without doing the JOIN at query time. Caveat: the JOIN runs on each INSERT batch, against the state of users at that moment. If users change later, the enriched rows are not retroactively updated.
Chained materialized views
-- Raw events
CREATE TABLE events (...) ENGINE = MergeTree ...;
-- First-level aggregate
CREATE TABLE events_per_hour (...) ENGINE = SummingMergeTree ...;
CREATE MATERIALIZED VIEW events_per_hour_mv TO events_per_hour AS
SELECT toStartOfHour(event_time) AS hour, count() AS c FROM events GROUP BY hour;
-- Second-level aggregate from the first
CREATE TABLE events_per_day (...) ENGINE = SummingMergeTree ...;
CREATE MATERIALIZED VIEW events_per_day_mv TO events_per_day AS
SELECT toDate(hour) AS day, sum(c) AS c FROM events_per_hour GROUP BY day;
Chained MVs let you build a tiered rollup pipeline. The thing to know: each MV sees the raw INSERT to its source, not the merged result. The first MV sees one row per insert; the second MV sees that row arrive into events_per_hour, before any merges have collapsed the SummingMergeTree. Design accordingly.
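The same pre-merge caveat applies at read time: both rollup tables hold partial sums until background merges run, so queries should re-aggregate rather than trust one row per key. A sketch, assuming the count column is named c as in the MVs above:

```sql
-- Re-aggregate at read time; SummingMergeTree only collapses
-- duplicate (day) keys during background merges.
SELECT day, sum(c) AS c
FROM events_per_day
GROUP BY day
ORDER BY day;
```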
Common Pitfalls
- MVs do not see UPDATEs or DELETEs. They are INSERT triggers only. Mutations on the source do not propagate. If you need to track mutations, use a ReplacingMergeTree pattern or a refreshable MV instead.
- POPULATE has gaps. Inserts during POPULATE are lost. Always use the backfill-separately pattern.
- JOINs against external tables freeze at INSERT time. The right-side of a JOIN is evaluated when the source rows are inserted. Later updates to the JOINed table do not refresh enriched rows.
- Dropping the source table. If a regular MV's source is dropped, the MV breaks. With the TO form, the target table is independent; with the inline form, the .inner table goes with the view.
- Replicated MVs on Replicated databases. Inside a Replicated database, both the MV and its target must be defined inside the database. Cross-database refs across Replicated boundaries are not supported.
How Pulse Helps With Materialized View Health
Materialized views are the most common source of "silently incorrect" data on ClickHouse clusters. POPULATE gaps, broken JOINs after schema changes, target tables that ran out of disk, and Kafka MVs that crashed and never restarted all produce missing rows without raising loud errors. Pulse continuously tracks MV health across ClickHouse clusters and surfaces views whose target counts have diverged from expected, Kafka-fed MVs that stopped consuming, refreshable views that missed a refresh, and target tables whose sizes are growing in patterns that don't match the source. Connect your ClickHouse cluster to Pulse and let it watch the long tail of MV correctness.
Frequently Asked Questions
Q: Does a materialized view see existing data when I create it?
Not by default. The MV fires on subsequent inserts only. Use POPULATE to backfill existing data (with the caveats above), or run a one-shot INSERT INTO target SELECT ... FROM source manually after creating the MV.
Q: What's the difference between a materialized view and a regular view?
A regular view (CREATE VIEW) is a saved query, evaluated every time you SELECT from it. A materialized view stores its result in a target table that gets updated on every INSERT to the source. Regular views are essentially free; materialized views trade INSERT cost for SELECT speed.
Q: Can I have multiple MVs on the same source table?
Yes. Each MV runs its SELECT independently for every INSERT. The cost is additive, so be deliberate about how many heavy aggregates you fan out from one source.
Q: Should I use the inline engine or TO syntax?
TO. Always TO, in production. The inline form ties the data to the view definition and complicates everything from drop to backfill to migrations.
Q: How do I update an MV's SELECT after it's been created?
ALTER TABLE <mv_name> MODIFY QUERY ... modifies the SELECT for new inserts. Existing data in the target table is unchanged. For a full backfill with the new logic, you have to recreate or run a manual rewrite.
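A sketch against the Quick Answer view; the heartbeat filter is a hypothetical change, shown only to illustrate the statement shape:

```sql
-- Changes the SELECT for future inserts only;
-- rows already in analytics.events_per_hour are untouched.
ALTER TABLE analytics.events_per_hour_mv
MODIFY QUERY
SELECT
    toStartOfHour(event_time) AS hour,
    event_type,
    count() AS cnt
FROM analytics.events
WHERE event_type != 'heartbeat'  -- hypothetical new filter
GROUP BY hour, event_type;
```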
Q: Are refreshable MVs replacing regular MVs?
No, they are complementary. Regular MVs are right for streaming aggregates over append-only data. Refreshable MVs are right for periodic full recomputations, often where the SELECT is too complex for the incremental pattern or where the source is external.
Q: Can I use FINAL inside an MV's SELECT?
You can, but be careful. FINAL is expensive, and inside an MV it runs on every INSERT batch. For source tables that are ReplacingMergeTree or similar, the right pattern is often to design the MV around the raw rows and let the consumer query handle the FINAL semantics.