ClickHouse Window Functions: OVER, PARTITION BY, Ranking, Analytics

Q: What is the default window frame in ClickHouse?

If ORDER BY is present in the OVER clause but no frame is specified, the default is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW . Without ORDER BY , the frame is the entire partition. If you switch to an explicit RANGE frame, tied rows are grouped as peers and share the same running total — be explicit with a ROWS frame when you need strict per-row cumulative values.

Q: What's the difference between rank, dense_rank, and row_number?

row_number() assigns a unique sequential number to every row. rank() gives tied rows the same rank but then skips numbers ( 1, 2, 2, 4 ). dense_rank() gives tied rows the same rank with no gaps ( 1, 2, 2, 3 ).

Q: Why can't I use a window function in a WHERE clause?

Window functions are evaluated after the WHERE step, so they aren't available there. Compute the window value in a subquery or CTE, then filter on it in the outer query — this is the standard pattern for "top-N per group".

Q: When should I use ROWS versus RANGE in the frame?

Use ROWS for a fixed count of physical rows (e.g. a 7-row moving average) and for predictable offset behavior. Use RANGE when you want all rows within a numeric or date/time distance of the current value; remember RANGE offsets require numeric/date order columns and don't accept INTERVAL .

Window functions let you compute values across a set of rows related to the current row — running totals, moving averages, rankings, row-to-row differences — without collapsing those rows the way GROUP BY does. Every input row is preserved in the output, and each one gets a computed value based on a defined "window" of neighboring rows.

ClickHouse has supported window functions since the 21.x series (they were introduced as experimental and became production-ready and enabled by default in later 21.x releases). This guide covers the OVER / PARTITION BY / ORDER BY / frame syntax, the ranking functions, and when window functions are the right tool versus GROUP BY or array functions.

The OVER Clause Syntax

A window function is any function followed by an OVER clause. The OVER clause defines the window: how rows are grouped, ordered, and which neighboring rows are visible to the function.

function(args) OVER (
    [PARTITION BY expr_list]
    [ORDER BY expr_list]
    [frame_clause]
)

PARTITION BY splits the rows into independent groups; the function restarts for each partition. Omit it and the whole result set is one partition.
ORDER BY defines the order of rows within each partition. It is required for ranking functions and for any frame that depends on row order.
frame_clause (ROWS / RANGE BETWEEN ... AND ...) restricts which rows inside the partition the function actually sees.

A minimal example — rank products by sales within each category:

SELECT
    category,
    product,
    sales,
    rank() OVER (PARTITION BY category ORDER BY sales DESC) AS sales_rank
FROM sales

Named Windows with the WINDOW Clause

When several columns share the same window definition, define it once with a WINDOW clause and reference it by name. This keeps queries readable and avoids copy-paste drift:

SELECT
    category,
    product,
    sales,
    rank()       OVER w AS sales_rank,
    sum(sales)   OVER w AS running_total
FROM sales
WINDOW w AS (PARTITION BY category ORDER BY sales DESC)

Window Functions vs. GROUP BY

This is the decision that trips people up most. Both can aggregate, but they produce fundamentally different shapes of output.

Aspect	`GROUP BY`	Window function (`OVER`)
Output rows	One row per group	One row per input row (nothing collapses)
Access to detail	Detail is lost after aggregation	Every original row is preserved alongside the aggregate
Per-row ranking / offsets	Not possible directly	`rank`, `row_number`, `lagInFrame`, etc.
Running / moving aggregates	Not directly	Natural fit via frames
Typical use	Summaries, dashboards, reports	Running totals, rankings, row-to-row comparisons

Rule of thumb: if you want one summary row per group, use GROUP BY. If you want to keep every row and annotate it with something computed from its neighbors (a rank, a running sum, the previous row's value), use a window function.

-- GROUP BY: one row per category, detail gone
SELECT category, sum(sales) AS total
FROM sales
GROUP BY category;

-- Window: every product row kept, with its category total attached
SELECT category, product, sales,
       sum(sales) OVER (PARTITION BY category) AS category_total
FROM sales;

Ranking Functions: rank vs. dense_rank vs. row_number

The three ranking functions differ only in how they handle ties. All require an ORDER BY inside OVER.

Function	Behavior on ties	Sequence example for values `100, 90, 90, 80`
`row_number()`	Always unique; ties broken arbitrarily	`1, 2, 3, 4`
`rank()`	Same rank for ties, then skips numbers	`1, 2, 2, 4`
`dense_rank()`	Same rank for ties, no gaps	`1, 2, 2, 3`

SELECT
    product,
    sales,
    row_number() OVER (ORDER BY sales DESC) AS rn,
    rank()       OVER (ORDER BY sales DESC) AS rnk,
    dense_rank() OVER (ORDER BY sales DESC) AS dense
FROM sales

Pick by intent:

row_number() — you need a unique sequential number per row (pagination, deduplication, "pick one row per group").
rank() — competition-style ranking where ties should occupy the same place and leave a gap (two golds, no silver).
dense_rank() — ranking categories or tiers where you don't want gaps in the numbering.

ClickHouse also exposes dense_rank()'s native alias denseRank(), plus percent_rank() (relative standing as a fraction) and cume_dist() (cumulative distribution). For deep dives on individual functions, see ClickHouse rank function and ClickHouse denseRank function.

A common idiom is "top-N per group" — number rows per partition, then filter in an outer query (you can't filter on a window result in WHERE directly):

SELECT category, product, sales
FROM
(
    SELECT category, product, sales,
           row_number() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
    FROM sales
)
WHERE rn <= 3

Frame Specification: ROWS vs. RANGE

The frame controls which rows within the partition the function aggregates over. This matters for running totals, moving averages, and "first/last value in window" calculations — not for ranking functions.

Frame boundaries:

UNBOUNDED PRECEDING — start of the partition
N PRECEDING — N rows (ROWS) or N units of the order value (RANGE) before the current row
CURRENT ROW
M FOLLOWING — M rows/units after the current row
UNBOUNDED FOLLOWING — end of the partition

The Default Frame Is a Gotcha

When you specify ORDER BY inside OVER but omit the frame, ClickHouse applies the SQL-standard default: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Without ORDER BY, the frame is the entire partition.

The subtle part is ROWS vs. RANGE for ties. Even with the default ROWS frame, if you switch to an explicit RANGE frame, all rows with the same ORDER BY value are treated as peers — they are all included up to and including the current row's peer group. So with a RANGE frame, tied rows get the same running total (the sum through the end of their tie group), which is often not what people expect:

-- Explicit RANGE frame: rows tied on `day` share one running total.
-- (Default ROWS frame would give each row its own cumulative value.)
SELECT day, amount,
       sum(amount) OVER (ORDER BY day) AS running_total
FROM events;

If you want a strict row-by-row running total where every row gets its own cumulative value, use an explicit ROWS frame:

SELECT day, amount,
       sum(amount) OVER (
           ORDER BY day
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM events;

Moving Averages and Sliding Windows

A fixed-width sliding window is a classic ROWS frame — e.g. a 7-row trailing moving average:

SELECT
    day,
    value,
    avg(value) OVER (
        ORDER BY day
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7
FROM metrics

RANGE frames are useful when "within N units of the order value" matters more than "N rows" — for example, all rows within a numeric or date distance. Note that RANGE boundaries require numeric or date/time order columns; a RANGE offset over a String ORDER BY errors out, and ClickHouse's RANGE OFFSET does not accept INTERVAL syntax — use numeric seconds for time-based ranges, or switch to ROWS.

Offset and Value Functions

Beyond ranking and aggregates, window functions can reach into other rows of the frame:

Function	Returns
`lagInFrame(x[, offset[, default]])`	Value from a row `offset` positions before the current row
`leadInFrame(x[, offset[, default]])`	Value from a row `offset` positions after the current row
`first_value(x)`	First value in the frame
`last_value(x)`	Last value in the frame
`nth_value(x, n)`	Value from the nth row of the frame

lagInFrame / leadInFrame are the workhorses for row-to-row comparisons — period-over-period deltas, gap detection, sessionization:

-- Day-over-day change per user
SELECT
    user_id,
    day,
    value,
    value - lagInFrame(value, 1, 0) OVER (
        PARTITION BY user_id ORDER BY day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS day_over_day
FROM daily_metrics

An important caveat: lagInFrame and leadInFrame respect the frame, not just the partition. With the default ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame, leadInFrame will not see rows after the current row because they fall outside the frame. For predictable offset behavior, pair them with an explicit ROWS frame as shown above. See ClickHouse lagInFrame function and ClickHouse leadInFrame function for details.

For first/last value over an ordered partition, remember the default frame ends at CURRENT ROW, so a naive last_value() returns the current row, not the partition's last row. Use an explicit frame to get the true last value:

SELECT
    category, product, sales,
    last_value(product) OVER (
        PARTITION BY category ORDER BY sales DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS top_product
FROM sales

Aggregate Functions as Window Functions

Any ClickHouse aggregate function — sum, avg, count, min, max, quantile, uniqExact, and so on — works as a window function simply by adding OVER. This is what makes running totals, cumulative counts, and partition-wide aggregates-without-collapse possible:

SELECT
    category, product, sales,
    sum(sales)   OVER (PARTITION BY category) AS category_total,
    avg(sales)   OVER (PARTITION BY category) AS category_avg,
    sales / sum(sales) OVER (PARTITION BY category) AS share_of_category
FROM sales

Common Issues

Filtering on a window result in WHERE. Window functions are evaluated after WHERE, so you can't reference them there. Wrap the query in a subquery/CTE and filter in the outer query (see the top-N example above).
Unexpected running totals on tied rows. If you use an explicit RANGE frame, tied rows are treated as peers and share the same running total. Use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (the default) for a strict per-row cumulative sum, or be explicit to make intent clear.
last_value / nth_value returning the current row. The default frame stops at CURRENT ROW. Add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to see the whole partition.
RANGE over non-numeric columns. RANGE offset frames need numeric or date/time order columns; use ROWS otherwise. INTERVAL syntax isn't supported in RANGE OFFSET — use numeric seconds.
Heavy window queries on distributed tables. ClickHouse currently performs window-function calculation on the initiator node, so each shard ships its rows to the coordinator. On large distributed datasets this can become a memory and network bottleneck; pre-aggregate or narrow the data first where possible.

Best Practices

Reach for GROUP BY or array functions first when they fit. If you only need one row per group, GROUP BY is simpler and often cheaper. For per-group transformations on already-grouped arrays, ClickHouse's array functions can be faster — see array functions vs. window functions.
Always be explicit about the frame when doing running totals, moving averages, or first/last value. Even though the default is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, spelling it out makes intent clear and avoids confusion, especially if you need a RANGE frame.
Use the WINDOW clause to define a window once and reuse it across multiple columns — clearer and less error-prone.
Choose the ranking function deliberately. row_number for uniqueness, rank for competition ranking with gaps, dense_rank for gapless tiers.
Narrow data before windowing. Apply WHERE filters and (where possible) pre-aggregate before the window step, especially on distributed tables, to keep memory usage in check.

How Pulse Helps

Window-function-heavy analytical queries are some of the most resource-intensive workloads a ClickHouse cluster runs — they often materialize large intermediate states and, on distributed tables, concentrate work on the initiator node. Pulse is a managed monitoring and expert-support service for ClickHouse (and the broader OLAP/search stack) that surfaces slow and memory-heavy queries, flags spilling and initiator-node pressure, and helps you decide when a query is better expressed as a GROUP BY, an array function, or a pre-aggregating materialized view. When a window query starts dominating cluster resources, Pulse gives you the query-level visibility and the engineering guidance to fix it rather than just scale around it.

Frequently Asked Questions

Q: When should I use a window function instead of GROUP BY?

Use a window function when you need to keep every input row and annotate it with something computed from its neighbors — a rank, a running total, the previous row's value, or a partition-wide aggregate shown next to the detail. Use GROUP BY when you only need one summary row per group.

Q: What is the default window frame in ClickHouse?

If ORDER BY is present in the OVER clause but no frame is specified, the default is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Without ORDER BY, the frame is the entire partition. If you switch to an explicit RANGE frame, tied rows are grouped as peers and share the same running total — be explicit with a ROWS frame when you need strict per-row cumulative values.

Q: What's the difference between rank, dense_rank, and row_number?

row_number() assigns a unique sequential number to every row. rank() gives tied rows the same rank but then skips numbers (1, 2, 2, 4). dense_rank() gives tied rows the same rank with no gaps (1, 2, 2, 3).

Q: Why can't I use a window function in a WHERE clause?

Window functions are evaluated after the WHERE step, so they aren't available there. Compute the window value in a subquery or CTE, then filter on it in the outer query — this is the standard pattern for "top-N per group".

Q: When should I use ROWS versus RANGE in the frame?

Use ROWS for a fixed count of physical rows (e.g. a 7-row moving average) and for predictable offset behavior. Use RANGE when you want all rows within a numeric or date/time distance of the current value; remember RANGE offsets require numeric/date order columns and don't accept INTERVAL.

Q: Are window functions efficient on distributed (sharded) ClickHouse tables?

They work, but ClickHouse currently runs window-function calculation on the initiator node, so shards send their data to the coordinator. On large distributed datasets this can create memory and network pressure — filter and pre-aggregate before the window step where you can.