Using Projections with ClickHouse: Pre-Aggregation and Alternative Sort Orders

Projections in ClickHouse let you store the same data in multiple sort orders or pre-aggregated forms within a single table. When the query optimizer detects that a projection can serve a query more efficiently than the base table, it uses the projection automatically — without any changes to your SQL.

Think of projections as hidden, automatically maintained alternative views of your table data. They're one of ClickHouse's most powerful optimization features, but they come with trade-offs you need to understand.

When Projections Help

ClickHouse tables have a single primary key that determines sort order. Queries that filter or sort by columns not in the primary key must scan significantly more data. Projections solve two specific problems:

1. Queries That Filter on Non-Primary-Key Columns

If your table is sorted by (event_date, user_id) but you frequently query by country, those queries scan the entire table. A projection sorted by country enables efficient range scans for country-based filters.

2. Queries That Aggregate Data Repeatedly

If your dashboard runs SELECT date, SUM(amount) FROM sales GROUP BY date thousands of times per day, each execution re-scans and re-aggregates the raw data. A pre-aggregation projection stores the grouped result and serves it directly.

Creating Projections

In a Table Definition

You can define projections when creating a table:

CREATE TABLE events
(
    event_date Date,
    user_id UInt64,
    country String,
    event_type String,
    amount Decimal(10, 2),
    PROJECTION events_by_country
    (
        SELECT * ORDER BY country
    ),
    PROJECTION daily_totals
    (
        SELECT
            event_date,
            event_type,
            sum(amount) AS total_amount,
            count() AS event_count
        GROUP BY event_date, event_type
    )
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id);

Projections defined at table creation time automatically apply to all inserted data.

Adding to an Existing Table

You can add projections to tables that already contain data:

-- Add the projection definition
ALTER TABLE events
    ADD PROJECTION events_by_country
    (
        SELECT * ORDER BY country
    );

-- Materialize for existing data
ALTER TABLE events
    MATERIALIZE PROJECTION events_by_country;

The ADD PROJECTION statement only adds the metadata — new inserts will include the projection, but existing data won't have it. You must run MATERIALIZE PROJECTION to build the projection for existing parts. This is a background operation and can take time on large tables.

Removing Projections

-- Remove projection and delete files
ALTER TABLE events
    DROP PROJECTION events_by_country;

-- Delete projection files but keep metadata
ALTER TABLE events
    CLEAR PROJECTION events_by_country;

How Projections Work Internally

Projections are stored within each data part. When ClickHouse writes a new part during an insert, it simultaneously builds the projection data for that part. Internally, a projection is essentially a hidden table nested inside the part directory.

During query execution:

  1. The query optimizer analyzes the query
  2. For each table part, it evaluates whether a projection can serve the query with less data scanning
  3. If a projection wins, ClickHouse reads from the projection's data instead of the base table
  4. Different parts can use different projections (or no projection) — the optimizer decides per-part

This per-part decision is important: if you add a projection to an existing table but don't materialize it, only newly inserted parts will have the projection. Old parts fall back to the base table data.

Partial Projection Optimization

The projections described above store full data copies — every column they select is duplicated in the projection's own sort order. That delivers the fastest reads but doubles storage for wide "SELECT *" projections. Since ClickHouse 25.5, you can build a much lighter alternative: a partial projection that stores only its sorting key plus the _part_offset virtual column, and reads the remaining columns back from the base table.

_part_offset is the row's physical position within its part. A projection sorted on a filter column, storing just that column and _part_offset, behaves like a secondary index: ClickHouse uses the projection's primary index to locate the matching rows, then follows the _part_offset pointers to read the actual column values from the base table.

CREATE TABLE page_views
(
    id UInt64,
    event_date Date,
    user_id UInt32,
    url String,
    region String,
    PROJECTION region_proj
    (
        SELECT _part_offset ORDER BY region
    ),
    PROJECTION user_id_proj
    (
        SELECT _part_offset ORDER BY user_id
    )
)
ENGINE = MergeTree
ORDER BY (event_date, id);

You can add the same kind of projection to an existing table and materialize it like any other:

ALTER TABLE page_views
    ADD PROJECTION region_proj
    (
        SELECT _part_offset ORDER BY region
    );

ALTER TABLE page_views
    MATERIALIZE PROJECTION region_proj;

Full vs. partial projections: the trade-off

The choice is a storage-versus-read-amplification trade-off:

  • Full projection (SELECT * ORDER BY region) — roughly doubles storage, but a matching query is served entirely from the projection with no base-table lookup. Best when the query reads many columns or returns large result sets, and the projection is heavily reused.
  • Partial projection (SELECT _part_offset ORDER BY region) — adds only a few percent of storage, but each match requires a second read from the base table to fetch the other columns. Best when you mainly need fast filtering on a non-primary-key column and the result set is selective.

You can also mix the two: store a couple of frequently selected columns directly in the projection and pull the rest via _part_offset. As a rule of thumb, reach for a partial projection when you want index-like filtering on an additional column without paying for a full second copy of the table; reach for a full projection when the projection effectively answers the whole query on its own.

Multiple filters and granule-level pruning

Two newer improvements make partial projections more broadly useful:

  • Since ClickHouse 25.6, a single query with several filters can use multiple projections at once. ClickHouse still reads row data from only one source (a projection or the base table), but it can use the other projections' primary indexes to prune parts before reading — helpful when each filter column matches a different projection.
  • Since ClickHouse 25.11, _part_offset-based projections support granule-level pruning, not just whole-part pruning. The projection can narrow reads down to individual granules of the base table, which makes selective multi-column filters substantially faster.

As always, confirm the behavior with EXPLAIN (look for the projection name under ReadFromMergeTree) rather than assuming the optimizer picked the partial projection. The same heuristics and verification techniques from the How Projections Work Internally section apply. For background on how parts, granules, and the primary index interact, see the ClickHouse MergeTree guide. If you need transformations or a separate destination table rather than index-like filtering, a materialized view is the better tool, and the CREATE TABLE reference covers defining projections inline at table creation time.

Projection vs. Materialized View

Both projections and materialized views can pre-aggregate data or provide alternative access paths. Here's how to choose:

Aspect Projection Materialized View
Storage Embedded within the source table's parts Separate target table
Query routing Automatic — optimizer selects transparently Manual — you query the MV's target table directly
Schema flexibility Must use columns from the source table Can JOIN, transform, and write to any target schema
Maintenance Automatically kept in sync during inserts and merges Triggered on insert; can diverge if source data is mutated
Backfill MATERIALIZE PROJECTION for existing data Must re-insert or rebuild manually
Best for Alternative sort orders, simple aggregations Complex transformations, cross-table aggregations, feeding separate tables

Rule of thumb: Use projections when you need a different sort order or a simple GROUP BY on the same table. Use materialized views when you need transformations, JOINs, or a fully separate destination table.

Practical Examples

Alternative Sort Order for Filtering

Your table is sorted by (timestamp, service_name) but you frequently search by trace_id:

ALTER TABLE traces
    ADD PROJECTION traces_by_trace_id
    (
        SELECT * ORDER BY trace_id
    );

ALTER TABLE traces
    MATERIALIZE PROJECTION traces_by_trace_id;

Now queries like SELECT * FROM traces WHERE trace_id = 'abc123' use the projection automatically instead of scanning the entire table.

Pre-Aggregation for Dashboard Queries

Your dashboard repeatedly queries hourly metrics:

ALTER TABLE raw_metrics
    ADD PROJECTION hourly_stats
    (
        SELECT
            toStartOfHour(timestamp) AS hour,
            metric_name,
            avg(value) AS avg_value,
            max(value) AS max_value,
            min(value) AS min_value,
            count() AS sample_count
        GROUP BY hour, metric_name
    );

ALTER TABLE raw_metrics
    MATERIALIZE PROJECTION hourly_stats;

Queries that match this GROUP BY pattern will read pre-aggregated data — potentially reducing data scanned by orders of magnitude.

Checking if Projections Are Being Used

Use EXPLAIN to verify the optimizer is selecting your projection:

EXPLAIN
SELECT
    toStartOfHour(timestamp) AS hour,
    metric_name,
    avg(value)
FROM raw_metrics
GROUP BY hour, metric_name;

Look for ReadFromMergeTree (hourly_stats) in the output. If you see ReadFromMergeTree without a projection name, the optimizer chose the base table instead.

Limitations and Gotchas

  1. MergeTree family only: Projections require MergeTree-family table engines.

  2. Storage overhead: Each projection stores a copy of the data (in its own sort order or aggregated form). A projection that selects all columns with a different ORDER BY roughly doubles storage for the table.

  3. Insert overhead: Every insert must build all defined projections, which increases write latency and CPU usage. Don't create projections you don't actively need.

  4. Merge behavior: Since ClickHouse v24.8, the deduplicate_merge_projection_mode setting controls what happens to projections during merges on non-classic MergeTree engines (ReplacingMergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree) and OPTIMIZE DEDUPLICATE operations. These engines can reduce row counts during merges, leaving stale data in projections. The default (throw) causes merge failures when projections would become inconsistent. Plain MergeTree is unaffected. Consider setting this to rebuild for production tables using any of these engines:

    ALTER TABLE events
        MODIFY SETTING deduplicate_merge_projection_mode = 'rebuild';
    
  5. Mutation interaction: Lightweight deletes and mutations can invalidate projection data. Since v24.7, lightweight_mutation_projection_mode controls this behavior — rebuild keeps projections consistent, drop discards them.

  6. No partial column projections for aggregation: An aggregation projection must include all GROUP BY columns. The optimizer won't use a projection if the query's GROUP BY doesn't match.

  7. Optimizer may not always choose the projection: The optimizer uses heuristics based on estimated data size. In some cases, it may incorrectly prefer the base table. Test with EXPLAIN and consider SETTINGS force_optimize_projection = 1 for debugging (not production).

Best Practices

  1. Start with primary key optimization: Projections are a secondary optimization. Make sure your primary key already matches your most common query patterns before adding projections.

  2. Monitor storage impact: Check projection sizes with SELECT name, formatReadableSize(bytes_on_disk) FROM system.projection_parts WHERE active.

  3. Limit the number of projections: Each projection adds write-time overhead. Two to three projections per table is reasonable; more than five should raise questions.

  4. Materialize after adding: Always run MATERIALIZE PROJECTION after ADD PROJECTION on existing tables, otherwise the projection only covers new data.

  5. Verify with EXPLAIN: After creating a projection, run your target query with EXPLAIN to confirm the optimizer is using it. Don't assume — test.

  6. Use aggregation projections for hot dashboards: If a specific GROUP BY query runs hundreds of times per day, an aggregation projection can reduce load dramatically.

Frequently Asked Questions

Q: Do projections automatically stay in sync with table data?

Yes, for inserts. New data written to the table automatically includes projection data. However, mutations (UPDATE/DELETE) may invalidate projections depending on your lightweight_mutation_projection_mode setting. After mutations, you may need to re-materialize.

Q: Can I use projections with ReplicatedMergeTree?

Yes. Projection operations (ADD, DROP, MATERIALIZE, CLEAR) replicate across nodes via ClickHouse Keeper or ZooKeeper, just like other metadata changes.

Q: How much extra storage do projections use?

An ORDER BY projection on all columns roughly doubles storage for the covered data. An aggregation projection is typically much smaller — often 1–5% of the source data size, depending on the aggregation's reduction factor.

Q: Do projections work with ClickHouse Cloud's SharedMergeTree?

Yes. Projections are supported on SharedMergeTree in ClickHouse Cloud and work the same way as on standard MergeTree.

Q: How do I get index-like filtering without doubling storage?

Use a partial projection that stores only _part_offset plus your filter column, e.g. SELECT _part_offset ORDER BY region (ClickHouse 25.5+). It works like a secondary index — the projection's primary index locates matching rows and ClickHouse reads the remaining columns from the base table — so storage overhead is a few percent instead of a full data copy. The trade-off is an extra base-table read per match; prefer a full projection when the projection can answer the entire query on its own. See Partial Projection Optimization above.

Q: Can projections replace indexes?

Not exactly. Projections provide alternative sort orders (similar to covering indexes in traditional databases), but they store full data copies rather than lightweight index structures. Data skipping indexes (bloom_filter, minmax, set) are lighter-weight alternatives for simple filter acceleration.

Q: Why isn't the optimizer using my projection?

Common reasons: the query's GROUP BY or ORDER BY doesn't exactly match the projection definition, the projection hasn't been materialized for existing parts, or the optimizer estimates the base table is faster. Use EXPLAIN to diagnose, and test with SETTINGS force_optimize_projection = 1 to force projection usage for comparison.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.