ClickHouse Time-Series Alignment and Interpolation: Filling Gaps and Regularizing Data

Q: What is the difference between `WITH FILL` and `INTERPOLATE`?

WITH FILL generates the missing rows on a regular grid; INTERPOLATE decides what value those generated rows get. WITH FILL alone leaves filled columns at their default (zero). INTERPOLATE (col AS col) carries the previous value forward instead. They are almost always used together for gauge metrics.

Q: Does `INTERPOLATE` do linear interpolation?

No, despite the name. INTERPOLATE performs forward fill — it repeats the previous row's value (or an expression of it). True linear interpolation between two known points requires window functions ( anyLastIf over forward and reverse windows) to fetch the bracketing points and apply the interpolation formula manually.

Q: How do I stop a forward fill from carrying a value across a long outage?

Use the STALENESS clause (ClickHouse 24.11+): WITH FILL STEP toIntervalHour(1) STALENESS toIntervalHour(3) stops generating filled rows once the gap from the last real row exceeds three hours. It cannot be combined with FROM .

Q: Why are the gaps at the beginning and end of my time range not filled?

By default WITH FILL only fills between the minimum and maximum values actually present in the result. Add explicit FROM and TO bounds to force a complete fixed window (note TO is non-inclusive, so add one step if you need the final bucket).

Q: What `STEP` should I use?

Match it to your bucket size. The default is 1 second for DateTime and 1 day for Date , which rarely matches a bucketed query. Pass an interval explicitly — toIntervalMinute(5) , toIntervalHour(1) , toIntervalMillisecond(100) — equal to the interval you used in toStartOfInterval / toStartOf… .

Real time-series data is rarely evenly spaced. Sensors drop readings, metrics arrive irregularly, and GROUP BY over sparse data produces buckets only for periods that actually had events. Before you can chart, diff, or join two series, you usually need to put them on a regular time grid and decide what to do with the gaps — leave them as zero, carry the last value forward, or interpolate between known points.

This guide covers the three layers of the problem in ClickHouse: generating a complete time grid with WITH FILL, filling values into that grid with INTERPOLATE (and STALENESS), and aligning multiple series onto a shared axis using JOINs and window functions.

The Two Problems: Missing Rows vs. Missing Values

It helps to separate two distinct concerns:

Missing rows — your result has no row at all for 09:03:00 because nothing happened in that minute. This is a grid problem, solved by WITH FILL.
Missing values — the row exists (or WITH FILL created it), but the measured column is empty/zero and you want a sensible value there. This is a fill strategy problem, solved by INTERPOLATE, window functions, or JOINs.

Most real pipelines need both: first build a dense grid, then populate it.

Building a Regular Grid with WITH FILL

WITH FILL is a modifier on ORDER BY that synthesizes the missing rows between the values that are actually present. The full syntax is:

ORDER BY expr WITH FILL
  [FROM const_expr]
  [TO const_expr]
  [STEP const_numeric_expr]
  [STALENESS const_numeric_expr]
[INTERPOLATE [(col [AS expr], ...)]]

A typical hourly example over sparse data:

SELECT
    toStartOfHour(time) AS h,
    sum(hits) AS hits
FROM wikistat
WHERE (project = 'it') AND (date(time) = '2015-06-12')
GROUP BY h
ORDER BY h ASC WITH FILL STEP toIntervalHour(1);

Hours with no events are emitted with the default fill value (0 for the numeric column), so a chart shows a continuous timeline instead of skipping empty hours.

Defaults you should know:

If FROM/TO are omitted, ClickHouse fills between the minimum and maximum values present in the result. Supply FROM/TO explicitly when you want a fixed window (e.g., a full day) regardless of where data starts and ends. TO is non-inclusive.
The default STEP is 1.0 for numeric columns, 1 day for Date, and 1 second for DateTime. For anything coarser or finer, pass an interval such as toIntervalHour(1), toIntervalMinute(5), or toIntervalMillisecond(100).

Use the same bucketing function in both SELECT and ORDER BY so the grid and the data line up. See DateTrunc and date bucketing and the DateTime64 guide for the bucketing functions and sub-second precision details.

Fill Strategies: Zero, Forward, and Interpolated

Once the grid exists, the synthesized rows need values. There are three common strategies.

Strategy	What the gap gets	ClickHouse mechanism	When to use
Zero fill	`0` (or column default)	`WITH FILL` alone	Counters/sums where "no events" genuinely means zero
Forward fill (LOCF)	The previous known value	`INTERPOLATE (col AS col)` or window functions	Gauges/levels where the last reading still holds
Linear interpolation	A value proportional to position between neighbors	Window functions / JOIN	Smoothly varying signals where a straight line between points is reasonable

Picking the wrong one distorts the data: forward-filling a counter overstates activity, and zero-filling a temperature gauge invents a cliff that never happened.

Zero Fill

This is the default. WITH FILL with no INTERPOLATE leaves non-ordering columns at their default (0, empty string, etc.). It is exactly right for additive metrics like request counts or bytes transferred, where an empty bucket means nothing occurred.

Forward Fill (Last Observation Carried Forward)

For gauge metrics — temperature, queue depth, a price, a connection count — an empty bucket does not mean zero; it means "unchanged since the last reading." Combine an aggregate like any() with INTERPOLATE:

SELECT
    toStartOfHour(time) AS h,
    any(value) AS v
FROM metrics
GROUP BY h
ORDER BY h ASC WITH FILL STEP toIntervalHour(1)
INTERPOLATE (v AS v);

INTERPOLATE (v AS v) tells ClickHouse: for every filled-in row, set v to the value of v from the previous row. The result is a flat line at the last observed level until the next real reading appears — classic last-observation-carried-forward (LOCF) behavior.

INTERPOLATE can apply an expression, not just copy: INTERPOLATE (v AS v * 0.99) decays the carried value by 1% each step. Listing no columns — bare INTERPOLATE — carries forward every eligible non-ordering column.

Avoiding Stale Forward Fills with STALENESS

Forward fill is dangerous across long gaps: if a sensor goes offline for a week, you usually do not want its last reading carried forward for the whole week. The STALENESS clause (added in ClickHouse 24.11) bounds how far a value is carried:

SELECT
    toStartOfHour(time) AS h,
    any(value) AS v
FROM metrics
GROUP BY h
ORDER BY h ASC WITH FILL
    STEP toIntervalHour(1)
    STALENESS toIntervalHour(3)
INTERPOLATE (v AS v);

With STALENESS toIntervalHour(3), ClickHouse only generates filled rows until the distance from the previous real row exceeds 3 hours; beyond that, it stops filling rather than carrying a stale value across the entire gap. Note that STALENESS cannot be combined with FROM in the same WITH FILL clause.

Linear Interpolation Between Known Points

INTERPOLATE only does forward fill — it repeats or transforms the previous value. For true linear interpolation, where a gap value sits proportionally on the straight line between its neighbors, you compute it explicitly with window functions. The formula is:

value = prev_value + (t - prev_t) / (next_t - prev_t) * (next_value - prev_value)

In ClickHouse, fetch the previous and next known points with window functions and apply the formula. The Altinity Knowledge Base demonstrates the pattern using anyLastIf() over forward- and reverse-ordered windows to grab the bracketing points:

SELECT
    timestamp,
    prev.2 + (timestamp - prev.1) * (next.2 - prev.2) / (next.1 - prev.1) AS interpolated
FROM
(
    SELECT
        timestamp,
        value,
        anyLastIf((timestamp, value), value IS NOT NULL)
            OVER (ORDER BY timestamp ASC
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev,
        anyLastIf((timestamp, value), value IS NOT NULL)
            OVER (ORDER BY timestamp DESC
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS next
    FROM raw_series
);

The tuples prev and next carry both the timestamp and value of the nearest known point on each side, so the outer query has everything the formula needs. The reverse-ordered window (ORDER BY timestamp DESC) is the trick for looking forward to the next point. For more on window-function patterns and running computations, see the cumulative aggregations guide and array functions vs. window functions.

Aligning Multiple Series onto One Grid

Charting or correlating two metrics requires them on the same timestamps. Two approaches, depending on how irregular the inputs are.

Bucket-and-JOIN for Regularly Bucketed Series

If each series can be bucketed to the same grid, aggregate each independently and FULL JOIN on the bucket. Use WITH FILL on the combined result to ensure no bucket is missing:

SELECT
    h,
    a.v AS cpu,
    b.v AS mem
FROM
(
    SELECT toStartOfMinute(time) AS h, avg(value) AS v
    FROM metrics WHERE name = 'cpu' GROUP BY h
) AS a
FULL JOIN
(
    SELECT toStartOfMinute(time) AS h, avg(value) AS v
    FROM metrics WHERE name = 'mem' GROUP BY h
) AS b USING (h)
ORDER BY h ASC WITH FILL STEP toIntervalMinute(1);

A FULL JOIN keeps minutes present in either series; WITH FILL then closes any minute missing from both. See JOIN types in ClickHouse for choosing between INNER, LEFT, and FULL here.

ASOF JOIN for Irregular Sampling

When two series are sampled at unrelated, irregular timestamps, you do not want to force them into shared buckets — you want, for each point in series A, the most recent point in series B at or before it. That is exactly what ASOF JOIN does:

SELECT
    a.time,
    a.price,
    b.rate
FROM trades AS a
ASOF LEFT JOIN fx_rates AS b
    ON a.symbol = b.symbol AND a.time >= b.time;

The inequality column (a.time >= b.time) must be the last condition in ON. ASOF JOIN is the idiomatic ClickHouse tool for "align this event stream against the last-known state of another" — a forward fill expressed as a join rather than a grid operation.

Best Practices

Match the fill strategy to the metric type. Sums and counters: zero fill. Gauges and levels: forward fill (with STALENESS). Smoothly varying signals you trust to be linear: interpolation. Do not blanket-apply one strategy.
Always set FROM/TO for fixed-window charts. Without them the grid only spans the min/max of present data, so an empty leading or trailing period silently disappears. Explicit bounds make dashboards stable.
Bound forward fills with STALENESS (24.11+) so an offline source does not paint a confident flat line across a multi-day outage.
Bucket once, consistently. Use the same toStartOf… / toStartOfInterval expression everywhere a given series appears so JOIN keys and fill steps align exactly.
Pre-aggregate hot grids with a materialized view. If you repeatedly bucket the same raw data, a materialized view maintaining per-minute or per-hour rollups makes the gap-fill query read far less data.

Common Issues

WITH FILL filled nothing. The ordering column probably is not the bucketed expression, or STEP does not match your bucket size (e.g., STEP toIntervalHour(1) over minute buckets fills only one row per hour). Order by the exact bucket expression and match the step.
Gaps at the start/end are missing. Add FROM and TO; default bounds are the data's own min/max, so leading/trailing empty periods are never created.
Forward fill carries values too far. Add STALENESS. Remember it cannot be combined with FROM.
INTERPOLATE did nothing. It only applies to columns not in the ORDER BY … WITH FILL list, and it copies/transforms the previous row — it does not perform linear interpolation. Use window functions for that.
ASOF JOIN returns unexpected matches. The inequality must be the final ON condition, and the right-hand series must be sorted appropriately on the inequality key. Re-check column order in ON.

How Pulse Helps

Pulse provides production monitoring and expert support for ClickHouse and other open-source data platforms. Gap-filling and alignment bugs are easy to ship and hard to spot — a forward fill with no STALENESS bound, a WITH FILL step that silently skips most buckets, or an ASOF JOIN with a misordered ON clause all produce plausible-looking charts that are quietly wrong. Pulse helps teams validate these query patterns, catch the schema and bucketing choices that make them slow, and keep time-series workloads performant as data grows. Learn more at pulse.support.

Frequently Asked Questions

Q: What is the difference between WITH FILL and INTERPOLATE?

WITH FILL generates the missing rows on a regular grid; INTERPOLATE decides what value those generated rows get. WITH FILL alone leaves filled columns at their default (zero). INTERPOLATE (col AS col) carries the previous value forward instead. They are almost always used together for gauge metrics.

Q: Does INTERPOLATE do linear interpolation?

No, despite the name. INTERPOLATE performs forward fill — it repeats the previous row's value (or an expression of it). True linear interpolation between two known points requires window functions (anyLastIf over forward and reverse windows) to fetch the bracketing points and apply the interpolation formula manually.

Q: How do I stop a forward fill from carrying a value across a long outage?

Use the STALENESS clause (ClickHouse 24.11+): WITH FILL STEP toIntervalHour(1) STALENESS toIntervalHour(3) stops generating filled rows once the gap from the last real row exceeds three hours. It cannot be combined with FROM.

Q: Why are the gaps at the beginning and end of my time range not filled?

By default WITH FILL only fills between the minimum and maximum values actually present in the result. Add explicit FROM and TO bounds to force a complete fixed window (note TO is non-inclusive, so add one step if you need the final bucket).

Q: What is the best way to align two irregularly sampled series?

If both can be bucketed to a common grid, aggregate each and FULL JOIN on the bucket, then apply WITH FILL. If they are sampled at unrelated timestamps and you want each A-point matched to the latest prior B-point, use ASOF JOIN instead — it is purpose-built for that pattern.

Q: What STEP should I use?

Match it to your bucket size. The default is 1 second for DateTime and 1 day for Date, which rarely matches a bucketed query. Pass an interval explicitly — toIntervalMinute(5), toIntervalHour(1), toIntervalMillisecond(100) — equal to the interval you used in toStartOfInterval/toStartOf….