Projections in ClickHouse let you store the same data in multiple sort orders or pre-aggregated forms within a single table. When the query optimizer detects that a projection can serve a query more efficiently than the base table, it uses the projection automatically — without any changes to your SQL.
Think of projections as hidden, automatically maintained alternative views of your table data. They're one of ClickHouse's most powerful optimization features, but they come with trade-offs you need to understand.
When Projections Help
ClickHouse tables have a single primary key that determines sort order. Queries that filter or sort by columns not in the primary key must scan significantly more data. Projections solve two specific problems:
1. Queries That Filter on Non-Primary-Key Columns
If your table is sorted by (event_date, user_id) but you frequently query by country, those queries scan the entire table. A projection sorted by country enables efficient range scans for country-based filters.
2. Queries That Aggregate Data Repeatedly
If your dashboard runs SELECT date, SUM(amount) FROM sales GROUP BY date thousands of times per day, each execution re-scans and re-aggregates the raw data. A pre-aggregation projection stores the grouped result and serves it directly.
Creating Projections
In a Table Definition
You can define projections when creating a table:
CREATE TABLE events
(
event_date Date,
user_id UInt64,
country String,
event_type String,
amount Decimal(10, 2),
PROJECTION events_by_country
(
SELECT * ORDER BY country
),
PROJECTION daily_totals
(
SELECT
event_date,
event_type,
sum(amount) AS total_amount,
count() AS event_count
GROUP BY event_date, event_type
)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);
Projections defined at table creation time automatically apply to all inserted data.
Adding to an Existing Table
You can add projections to tables that already contain data:
-- Add the projection definition
ALTER TABLE events
ADD PROJECTION events_by_country
(
SELECT * ORDER BY country
);
-- Materialize for existing data
ALTER TABLE events
MATERIALIZE PROJECTION events_by_country;
The ADD PROJECTION statement only adds the metadata — new inserts will include the projection, but existing data won't have it. You must run MATERIALIZE PROJECTION to build the projection for existing parts. This is a background operation and can take time on large tables.
Removing Projections
-- Remove projection and delete files
ALTER TABLE events
DROP PROJECTION events_by_country;
-- Delete projection files but keep metadata
ALTER TABLE events
CLEAR PROJECTION events_by_country;
How Projections Work Internally
Projections are stored within each data part. When ClickHouse writes a new part during an insert, it simultaneously builds the projection data for that part. Internally, a projection is essentially a hidden table nested inside the part directory.
During query execution:
- The query optimizer analyzes the query
- For each table part, it evaluates whether a projection can serve the query with less data scanning
- If a projection wins, ClickHouse reads from the projection's data instead of the base table
- Different parts can use different projections (or no projection) — the optimizer decides per-part
This per-part decision is important: if you add a projection to an existing table but don't materialize it, only newly inserted parts will have the projection. Old parts fall back to the base table data.
Projection vs. Materialized View
Both projections and materialized views can pre-aggregate data or provide alternative access paths. Here's how to choose:
| Aspect | Projection | Materialized View |
|---|---|---|
| Storage | Embedded within the source table's parts | Separate target table |
| Query routing | Automatic — optimizer selects transparently | Manual — you query the MV's target table directly |
| Schema flexibility | Must use columns from the source table | Can JOIN, transform, and write to any target schema |
| Maintenance | Automatically kept in sync during inserts and merges | Triggered on insert; can diverge if source data is mutated |
| Backfill | MATERIALIZE PROJECTION for existing data |
Must re-insert or rebuild manually |
| Best for | Alternative sort orders, simple aggregations | Complex transformations, cross-table aggregations, feeding separate tables |
Rule of thumb: Use projections when you need a different sort order or a simple GROUP BY on the same table. Use materialized views when you need transformations, JOINs, or a fully separate destination table.
Practical Examples
Alternative Sort Order for Filtering
Your table is sorted by (timestamp, service_name) but you frequently search by trace_id:
ALTER TABLE traces
ADD PROJECTION traces_by_trace_id
(
SELECT * ORDER BY trace_id
);
ALTER TABLE traces
MATERIALIZE PROJECTION traces_by_trace_id;
Now queries like SELECT * FROM traces WHERE trace_id = 'abc123' use the projection automatically instead of scanning the entire table.
Pre-Aggregation for Dashboard Queries
Your dashboard repeatedly queries hourly metrics:
ALTER TABLE raw_metrics
ADD PROJECTION hourly_stats
(
SELECT
toStartOfHour(timestamp) AS hour,
metric_name,
avg(value) AS avg_value,
max(value) AS max_value,
min(value) AS min_value,
count() AS sample_count
GROUP BY hour, metric_name
);
ALTER TABLE raw_metrics
MATERIALIZE PROJECTION hourly_stats;
Queries that match this GROUP BY pattern will read pre-aggregated data — potentially reducing data scanned by orders of magnitude.
Checking if Projections Are Being Used
Use EXPLAIN to verify the optimizer is selecting your projection:
EXPLAIN
SELECT
toStartOfHour(timestamp) AS hour,
metric_name,
avg(value)
FROM raw_metrics
GROUP BY hour, metric_name;
Look for ReadFromMergeTree (Projection: hourly_stats) in the output. If you see ReadFromMergeTree without a projection name, the optimizer chose the base table instead.
Limitations and Gotchas
MergeTree family only: Projections require MergeTree-family table engines.
Storage overhead: Each projection stores a copy of the data (in its own sort order or aggregated form). A projection that selects all columns with a different ORDER BY roughly doubles storage for the table.
Insert overhead: Every insert must build all defined projections, which increases write latency and CPU usage. Don't create projections you don't actively need.
Merge behavior: Since ClickHouse v24.8, the
deduplicate_merge_projection_modesetting controls what happens to projections during background merges. The default (throw) can cause merge failures if projections become inconsistent. Consider setting this torebuildfor production:ALTER TABLE events MODIFY SETTING deduplicate_merge_projection_mode = 'rebuild';Mutation interaction: Lightweight deletes and mutations can invalidate projection data. Since v24.7,
lightweight_mutation_projection_modecontrols this behavior —rebuildkeeps projections consistent,dropdiscards them.No partial column projections for aggregation: An aggregation projection must include all GROUP BY columns. The optimizer won't use a projection if the query's GROUP BY doesn't match.
Optimizer may not always choose the projection: The optimizer uses heuristics based on estimated data size. In some cases, it may incorrectly prefer the base table. Test with
EXPLAINand considerSETTINGS force_optimize_projection = 1for debugging (not production).
Best Practices
Start with primary key optimization: Projections are a secondary optimization. Make sure your primary key already matches your most common query patterns before adding projections.
Monitor storage impact: Check projection sizes with
SELECT name, formatReadableSize(bytes_on_disk) FROM system.projection_parts WHERE active.Limit the number of projections: Each projection adds write-time overhead. Two to three projections per table is reasonable; more than five should raise questions.
Materialize after adding: Always run
MATERIALIZE PROJECTIONafterADD PROJECTIONon existing tables, otherwise the projection only covers new data.Verify with EXPLAIN: After creating a projection, run your target query with
EXPLAINto confirm the optimizer is using it. Don't assume — test.Use aggregation projections for hot dashboards: If a specific GROUP BY query runs hundreds of times per day, an aggregation projection can reduce load dramatically.
Frequently Asked Questions
Q: Do projections automatically stay in sync with table data?
Yes, for inserts. New data written to the table automatically includes projection data. However, mutations (UPDATE/DELETE) may invalidate projections depending on your lightweight_mutation_projection_mode setting. After mutations, you may need to re-materialize.
Q: Can I use projections with ReplicatedMergeTree?
Yes. Projection operations (ADD, DROP, MATERIALIZE, CLEAR) replicate across nodes via ClickHouse Keeper or ZooKeeper, just like other metadata changes.
Q: How much extra storage do projections use?
An ORDER BY projection on all columns roughly doubles storage for the covered data. An aggregation projection is typically much smaller — often 1–5% of the source data size, depending on the aggregation's reduction factor.
Q: Do projections work with ClickHouse Cloud's SharedMergeTree?
Yes. Projections are supported on SharedMergeTree in ClickHouse Cloud and work the same way as on standard MergeTree.
Q: Can projections replace indexes?
Not exactly. Projections provide alternative sort orders (similar to covering indexes in traditional databases), but they store full data copies rather than lightweight index structures. Data skipping indexes (bloom_filter, minmax, set) are lighter-weight alternatives for simple filter acceleration.
Q: Why isn't the optimizer using my projection?
Common reasons: the query's GROUP BY or ORDER BY doesn't exactly match the projection definition, the projection hasn't been materialized for existing parts, or the optimizer estimates the base table is faster. Use EXPLAIN to diagnose, and test with SETTINGS force_optimize_projection = 1 to force projection usage for comparison.