Projections in ClickHouse let you store the same data in multiple sort orders or pre-aggregated forms within a single table. When the query optimizer detects that a projection can serve a query more efficiently than the base table, it uses the projection automatically — without any changes to your SQL.
Think of projections as hidden, automatically maintained alternative views of your table data. They're one of ClickHouse's most powerful optimization features, but they come with trade-offs you need to understand.
When Projections Help
ClickHouse tables have a single primary key that determines sort order. Queries that filter or sort by columns not in the primary key must scan significantly more data. Projections solve two specific problems:
1. Queries That Filter on Non-Primary-Key Columns
If your table is sorted by (event_date, user_id) but you frequently query by country, those queries scan the entire table. A projection sorted by country enables efficient range scans for country-based filters.
2. Queries That Aggregate Data Repeatedly
If your dashboard runs SELECT date, SUM(amount) FROM sales GROUP BY date thousands of times per day, each execution re-scans and re-aggregates the raw data. A pre-aggregation projection stores the grouped result and serves it directly.
Creating Projections
In a Table Definition
You can define projections when creating a table:
CREATE TABLE events
(
event_date Date,
user_id UInt64,
country String,
event_type String,
amount Decimal(10, 2),
PROJECTION events_by_country
(
SELECT * ORDER BY country
),
PROJECTION daily_totals
(
SELECT
event_date,
event_type,
sum(amount) AS total_amount,
count() AS event_count
GROUP BY event_date, event_type
)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);
Projections defined at table creation time automatically apply to all inserted data.
Adding to an Existing Table
You can add projections to tables that already contain data:
-- Add the projection definition
ALTER TABLE events
ADD PROJECTION events_by_country
(
SELECT * ORDER BY country
);
-- Materialize for existing data
ALTER TABLE events
MATERIALIZE PROJECTION events_by_country;
The ADD PROJECTION statement only adds the metadata — new inserts will include the projection, but existing data won't have it. You must run MATERIALIZE PROJECTION to build the projection for existing parts. This is a background operation and can take time on large tables.
Removing Projections
-- Remove projection and delete files
ALTER TABLE events
DROP PROJECTION events_by_country;
-- Delete projection files but keep metadata
ALTER TABLE events
CLEAR PROJECTION events_by_country;
How Projections Work Internally
Projections are stored within each data part. When ClickHouse writes a new part during an insert, it simultaneously builds the projection data for that part. Internally, a projection is essentially a hidden table nested inside the part directory.
During query execution:
- The query optimizer analyzes the query
- For each table part, it evaluates whether a projection can serve the query with less data scanning
- If a projection wins, ClickHouse reads from the projection's data instead of the base table
- Different parts can use different projections (or no projection) — the optimizer decides per-part
This per-part decision is important: if you add a projection to an existing table but don't materialize it, only newly inserted parts will have the projection. Old parts fall back to the base table data.
Partial Projection Optimization
The projections described above store full data copies — every column they select is duplicated in the projection's own sort order. That delivers the fastest reads but doubles storage for wide "SELECT *" projections. Since ClickHouse 25.5, you can build a much lighter alternative: a partial projection that stores only its sorting key plus the _part_offset virtual column, and reads the remaining columns back from the base table.
_part_offset is the row's physical position within its part. A projection sorted on a filter column, storing just that column and _part_offset, behaves like a secondary index: ClickHouse uses the projection's primary index to locate the matching rows, then follows the _part_offset pointers to read the actual column values from the base table.
CREATE TABLE page_views
(
id UInt64,
event_date Date,
user_id UInt32,
url String,
region String,
PROJECTION region_proj
(
SELECT _part_offset ORDER BY region
),
PROJECTION user_id_proj
(
SELECT _part_offset ORDER BY user_id
)
)
ENGINE = MergeTree
ORDER BY (event_date, id);
You can add the same kind of projection to an existing table and materialize it like any other:
ALTER TABLE page_views
ADD PROJECTION region_proj
(
SELECT _part_offset ORDER BY region
);
ALTER TABLE page_views
MATERIALIZE PROJECTION region_proj;
Full vs. partial projections: the trade-off
The choice is a storage-versus-read-amplification trade-off:
- Full projection (
SELECT * ORDER BY region) — roughly doubles storage, but a matching query is served entirely from the projection with no base-table lookup. Best when the query reads many columns or returns large result sets, and the projection is heavily reused. - Partial projection (
SELECT _part_offset ORDER BY region) — adds only a few percent of storage, but each match requires a second read from the base table to fetch the other columns. Best when you mainly need fast filtering on a non-primary-key column and the result set is selective.
You can also mix the two: store a couple of frequently selected columns directly in the projection and pull the rest via _part_offset. As a rule of thumb, reach for a partial projection when you want index-like filtering on an additional column without paying for a full second copy of the table; reach for a full projection when the projection effectively answers the whole query on its own.
Multiple filters and granule-level pruning
Two newer improvements make partial projections more broadly useful:
- Since ClickHouse 25.6, a single query with several filters can use multiple projections at once. ClickHouse still reads row data from only one source (a projection or the base table), but it can use the other projections' primary indexes to prune parts before reading — helpful when each filter column matches a different projection.
- Since ClickHouse 25.11,
_part_offset-based projections support granule-level pruning, not just whole-part pruning. The projection can narrow reads down to individual granules of the base table, which makes selective multi-column filters substantially faster.
As always, confirm the behavior with EXPLAIN (look for the projection name under ReadFromMergeTree) rather than assuming the optimizer picked the partial projection. The same heuristics and verification techniques from the How Projections Work Internally section apply. For background on how parts, granules, and the primary index interact, see the ClickHouse MergeTree guide. If you need transformations or a separate destination table rather than index-like filtering, a materialized view is the better tool, and the CREATE TABLE reference covers defining projections inline at table creation time.
Projection vs. Materialized View
Both projections and materialized views can pre-aggregate data or provide alternative access paths. Here's how to choose:
| Aspect | Projection | Materialized View |
|---|---|---|
| Storage | Embedded within the source table's parts | Separate target table |
| Query routing | Automatic — optimizer selects transparently | Manual — you query the MV's target table directly |
| Schema flexibility | Must use columns from the source table | Can JOIN, transform, and write to any target schema |
| Maintenance | Automatically kept in sync during inserts and merges | Triggered on insert; can diverge if source data is mutated |
| Backfill | MATERIALIZE PROJECTION for existing data |
Must re-insert or rebuild manually |
| Best for | Alternative sort orders, simple aggregations | Complex transformations, cross-table aggregations, feeding separate tables |
Rule of thumb: Use projections when you need a different sort order or a simple GROUP BY on the same table. Use materialized views when you need transformations, JOINs, or a fully separate destination table.
Practical Examples
Alternative Sort Order for Filtering
Your table is sorted by (timestamp, service_name) but you frequently search by trace_id:
ALTER TABLE traces
ADD PROJECTION traces_by_trace_id
(
SELECT * ORDER BY trace_id
);
ALTER TABLE traces
MATERIALIZE PROJECTION traces_by_trace_id;
Now queries like SELECT * FROM traces WHERE trace_id = 'abc123' use the projection automatically instead of scanning the entire table.
Pre-Aggregation for Dashboard Queries
Your dashboard repeatedly queries hourly metrics:
ALTER TABLE raw_metrics
ADD PROJECTION hourly_stats
(
SELECT
toStartOfHour(timestamp) AS hour,
metric_name,
avg(value) AS avg_value,
max(value) AS max_value,
min(value) AS min_value,
count() AS sample_count
GROUP BY hour, metric_name
);
ALTER TABLE raw_metrics
MATERIALIZE PROJECTION hourly_stats;
Queries that match this GROUP BY pattern will read pre-aggregated data — potentially reducing data scanned by orders of magnitude.
Checking if Projections Are Being Used
Use EXPLAIN to verify the optimizer is selecting your projection:
EXPLAIN
SELECT
toStartOfHour(timestamp) AS hour,
metric_name,
avg(value)
FROM raw_metrics
GROUP BY hour, metric_name;
Look for ReadFromMergeTree (hourly_stats) in the output. If you see ReadFromMergeTree without a projection name, the optimizer chose the base table instead.
Limitations and Gotchas
MergeTree family only: Projections require MergeTree-family table engines.
Storage overhead: Each projection stores a copy of the data (in its own sort order or aggregated form). A projection that selects all columns with a different ORDER BY roughly doubles storage for the table.
Insert overhead: Every insert must build all defined projections, which increases write latency and CPU usage. Don't create projections you don't actively need.
Merge behavior: Since ClickHouse v24.8, the
deduplicate_merge_projection_modesetting controls what happens to projections during merges on non-classic MergeTree engines (ReplacingMergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree) andOPTIMIZE DEDUPLICATEoperations. These engines can reduce row counts during merges, leaving stale data in projections. The default (throw) causes merge failures when projections would become inconsistent. Plain MergeTree is unaffected. Consider setting this torebuildfor production tables using any of these engines:ALTER TABLE events MODIFY SETTING deduplicate_merge_projection_mode = 'rebuild';Mutation interaction: Lightweight deletes and mutations can invalidate projection data. Since v24.7,
lightweight_mutation_projection_modecontrols this behavior —rebuildkeeps projections consistent,dropdiscards them.No partial column projections for aggregation: An aggregation projection must include all GROUP BY columns. The optimizer won't use a projection if the query's GROUP BY doesn't match.
Optimizer may not always choose the projection: The optimizer uses heuristics based on estimated data size. In some cases, it may incorrectly prefer the base table. Test with
EXPLAINand considerSETTINGS force_optimize_projection = 1for debugging (not production).
Best Practices
Start with primary key optimization: Projections are a secondary optimization. Make sure your primary key already matches your most common query patterns before adding projections.
Monitor storage impact: Check projection sizes with
SELECT name, formatReadableSize(bytes_on_disk) FROM system.projection_parts WHERE active.Limit the number of projections: Each projection adds write-time overhead. Two to three projections per table is reasonable; more than five should raise questions.
Materialize after adding: Always run
MATERIALIZE PROJECTIONafterADD PROJECTIONon existing tables, otherwise the projection only covers new data.Verify with EXPLAIN: After creating a projection, run your target query with
EXPLAINto confirm the optimizer is using it. Don't assume — test.Use aggregation projections for hot dashboards: If a specific GROUP BY query runs hundreds of times per day, an aggregation projection can reduce load dramatically.
Frequently Asked Questions
Q: Do projections automatically stay in sync with table data?
Yes, for inserts. New data written to the table automatically includes projection data. However, mutations (UPDATE/DELETE) may invalidate projections depending on your lightweight_mutation_projection_mode setting. After mutations, you may need to re-materialize.
Q: Can I use projections with ReplicatedMergeTree?
Yes. Projection operations (ADD, DROP, MATERIALIZE, CLEAR) replicate across nodes via ClickHouse Keeper or ZooKeeper, just like other metadata changes.
Q: How much extra storage do projections use?
An ORDER BY projection on all columns roughly doubles storage for the covered data. An aggregation projection is typically much smaller — often 1–5% of the source data size, depending on the aggregation's reduction factor.
Q: Do projections work with ClickHouse Cloud's SharedMergeTree?
Yes. Projections are supported on SharedMergeTree in ClickHouse Cloud and work the same way as on standard MergeTree.
Q: How do I get index-like filtering without doubling storage?
Use a partial projection that stores only _part_offset plus your filter column, e.g. SELECT _part_offset ORDER BY region (ClickHouse 25.5+). It works like a secondary index — the projection's primary index locates matching rows and ClickHouse reads the remaining columns from the base table — so storage overhead is a few percent instead of a full data copy. The trade-off is an extra base-table read per match; prefer a full projection when the projection can answer the entire query on its own. See Partial Projection Optimization above.
Q: Can projections replace indexes?
Not exactly. Projections provide alternative sort orders (similar to covering indexes in traditional databases), but they store full data copies rather than lightweight index structures. Data skipping indexes (bloom_filter, minmax, set) are lighter-weight alternatives for simple filter acceleration.
Q: Why isn't the optimizer using my projection?
Common reasons: the query's GROUP BY or ORDER BY doesn't exactly match the projection definition, the projection hasn't been materialized for existing parts, or the optimizer estimates the base table is faster. Use EXPLAIN to diagnose, and test with SETTINGS force_optimize_projection = 1 to force projection usage for comparison.