ClickHouse ASOF JOIN

ASOF JOIN is a ClickHouse join type that matches each row from the left table to the single nearest row in the right table based on an inequality on an ordered column, rather than requiring exact equality. It is the canonical solution to the "last known value" problem in time-series analysis: given a trade at 10:05:03, find the most recent quote at or before that timestamp for the same symbol. A regular INNER JOIN would drop the row entirely if no exact timestamp match exists. ASOF JOIN finds the closest qualifying row instead.

The feature was introduced in 2019, drawing inspiration from the aj (as-of join) operator in kdb+/q, a staple in quantitative finance. It has since expanded to support all four inequality operators and multiple join algorithms.

Syntax

ASOF JOIN requires at least one equality condition and exactly one inequality (closest-match) condition. Purely inequality-based joins with no equality key are not supported.

ON clause form

SELECT
    t.symbol,
    t.volume,
    t.time    AS trade_time,
    q.time    AS quote_time,
    q.price   AS quote_price,
    t.volume * q.price AS value
FROM trades AS t
ASOF LEFT JOIN quotes AS q
    ON t.symbol = q.symbol AND t.time >= q.time;

t.symbol = q.symbol is the equality condition. You can supply multiple equality conditions. t.time >= q.time is the closest-match condition: ClickHouse selects the row in quotes with the greatest q.time that is still less than or equal to t.time, per matching symbol. Exactly one closest-match condition is permitted.

Supported inequality operators: >, >=, <, <=. The operator controls direction:

  • >= or >: find the nearest prior row (most common — "last known value before this timestamp")
  • <= or <: find the nearest future row (forward lookup — "next scheduled event after this timestamp")

USING clause form

SELECT t.symbol, t.volume, q.price
FROM trades AS t
ASOF JOIN quotes AS q
USING (symbol, time);

With USING, equality columns come first and the ASOF column must be last. The comparison operator defaults to >=. This form is shorter but cannot express < or > direction.

Column type requirements

The ASOF column must be one of: Int*, UInt*, Float32, Float64, Date, DateTime, or Decimal. Nullable variants of these types are not supported on the right table side; attempting one produces: ASOF join over right table Nullable column is not implemented.

How It Works

For each row in the left table, ClickHouse evaluates the equality conditions to identify a candidate set of rows in the right table, then selects the single row in that set that is closest to the ASOF column value according to the chosen operator.

The result is always at most one matched row from the right table per left-side row — there is no fan-out. Two join types are supported:

  • ASOF JOIN (inner semantics): left rows with no qualifying right-side match are dropped.
  • ASOF LEFT JOIN: all left rows are preserved. When no match exists, right-side columns are filled with the type's default value (0 for numeric types, empty string for String), not NULL. Set join_use_nulls = 1 to get NULL for unmatched columns instead (standard SQL behavior). The default for join_use_nulls is 0.

There is no ASOF RIGHT JOIN or ASOF FULL JOIN.

Practical Examples

Financial market data

Match each executed trade to the prevailing quote price at the moment of execution:

SELECT
    t.symbol,
    t.trade_id,
    t.time          AS trade_time,
    q.time          AS quote_time,
    q.ask_price,
    t.quantity * q.ask_price AS notional_value
FROM trades AS t
ASOF LEFT JOIN quotes AS q
    ON t.symbol = q.symbol AND t.time >= q.time
ORDER BY t.time;

IoT sensor readings with calibration

Match each sensor reading to the calibration record that was active at the time of the reading, without needing to denormalize calibration state into every reading row at ingestion:

SELECT
    r.sensor_id,
    r.recorded_at,
    r.raw_value,
    c.calibration_factor,
    r.raw_value * c.calibration_factor AS calibrated_value
FROM readings AS r
ASOF LEFT JOIN calibrations AS c
    ON r.sensor_id = c.sensor_id AND r.recorded_at >= c.applied_from;

Forward lookup: next scheduled maintenance

Use the < operator to find the nearest future event rather than the nearest past event:

SELECT
    r.sensor_id,
    r.recorded_at,
    m.scheduled_at AS next_maintenance
FROM readings AS r
ASOF LEFT JOIN maintenance_schedule AS m
    ON r.sensor_id = m.sensor_id AND r.recorded_at < m.scheduled_at;

Chained ASOF JOINs for ML feature engineering

Multiple ASOF JOINs in a single query are supported, which is useful for assembling feature vectors from several time-series sources:

SELECT
    mt.user_id,
    mt.event_time,
    dv.value  AS domains_visited_per_hour,
    uips.value AS unique_ips_per_hour,
    mt.label
FROM events AS mt FINAL
ASOF JOIN domains_feature AS dv
    ON mt.user_id = dv.user_id AND mt.event_time >= dv.event_time
ASOF JOIN ip_feature AS uips
    ON mt.domain = uips.domain AND mt.event_time >= uips.event_time;

Performance Considerations

Supported algorithms

ASOF JOIN is supported by exactly two join algorithms:

Algorithm Notes
hash Builds an in-memory hash table from the right table, sorted within each equality-key group. Default for ASOF JOIN. Works well when the right table fits in memory.
full_sorting_merge Sort-merge approach. Added ASOF support in ClickHouse 24.7 (July 2024). Can exploit the physical sort order of MergeTree tables, skipping the sort phase. Significantly lower memory usage on large right tables.

The following algorithms do not support ASOF JOIN: parallel_hash, grace_hash, partial_merge, and the direct algorithm. The Join table engine also does not support ASOF JOIN.

Important: setting join_algorithm = 'auto' is unsafe for ASOF JOIN. The auto mode falls back to partial_merge when memory limits are exceeded, and partial_merge does not support ASOF strictness, which will cause the query to fail. Explicitly specify hash or full_sorting_merge.

When to use full_sorting_merge

On a 100-million-row dataset, full_sorting_merge ASOF JOIN ran over 2x faster (5.0 s vs 11.8 s) and used over 2x less peak memory (2.4 GiB vs 6.5 GiB) compared to hash join ASOF JOIN. To use it:

SELECT t.user_id, t.event_time, p.price
FROM events AS t
ASOF JOIN prices AS p
    ON t.user_id = p.user_id AND t.event_time >= p.price_time
SETTINGS join_algorithm = 'full_sorting_merge';

For maximum benefit, the right table (and ideally the left table) should be a MergeTree with ORDER BY matching the join key prefix, placing equality columns first and the ASOF column last:

CREATE TABLE quotes (
    symbol   LowCardinality(String),
    time     DateTime,
    price    Float64
) ENGINE = MergeTree()
ORDER BY (symbol, time);

When ClickHouse detects that the table is already sorted in the required order, it skips the sort phase entirely and proceeds directly to the merge step.

Diagnosing which algorithm ran

Use EXPLAIN PIPELINE to confirm the algorithm selected:

EXPLAIN PIPELINE
SELECT t.symbol, q.price
FROM trades AS t
ASOF LEFT JOIN quotes AS q
    ON t.symbol = q.symbol AND t.time >= q.time;

Hash join appears as HashJoin in the pipeline. Full sorting merge shows MergeSortingTransform and MergingSortedTransform processors.

Comparison to Alternatives

argMax aggregation: For a single point-in-time lookup against one key (SELECT argMax(price, time) FROM quotes WHERE symbol = 'AAPL' AND time <= now()), argMax is typically faster because it can exploit the table's primary index directly and avoids the join overhead. ASOF JOIN is more valuable when you need to stitch two complete time series together across many keys in one pass.

Correlated subquery: A correlated subquery achieves the same result but re-executes for every left-side row. ASOF JOIN processes both tables in a single pass and is orders of magnitude faster at scale.

Window functions: LAG or LAST_VALUE within a single table work well for within-table "previous row" patterns. They cannot reach across two different tables with different granularities, which is the core ASOF JOIN use case.

Relevant Settings

Setting Default Notes
join_algorithm default (hash for ASOF) Use hash or full_sorting_merge for ASOF. Avoid auto.
join_use_nulls 0 Set to 1 to return NULL for unmatched columns instead of type defaults.
max_rows_in_join 0 (unlimited) Caps hash table rows. If exceeded with join_algorithm=auto, ClickHouse falls back to partial_merge, which breaks ASOF JOIN.
max_bytes_in_join 0 (unlimited) Caps hash table memory. Same caveat as above.
log_processors_profiles 0 Set to 1 to enable system.processors_profile_log for pipeline-level profiling.

Root-Cause Analysis with System Tables

Check memory and duration for recent ASOF JOIN queries

SELECT
    query_id,
    query_duration_ms,
    memory_usage,
    read_rows,
    result_rows
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%ASOF%'
ORDER BY event_time DESC
LIMIT 10;

Pipeline-level profiling for a specific query

Requires SET log_processors_profiles = 1 before running the query.

SELECT
    name,
    elapsed_us,
    input_rows,
    output_rows,
    input_wait_elapsed_us
FROM system.processors_profile_log
WHERE query_id = 'your-query-id'
ORDER BY elapsed_us DESC;

Look for HashJoin (hash algorithm selected) or MergeSortingTransform/MergingSortedTransform (full sorting merge selected).

Monitor live ASOF JOIN queries

SELECT query_id, elapsed, memory_usage, read_rows
FROM system.processes
WHERE query ILIKE '%ASOF%';

Inspect current join-related settings

SELECT name, value, default
FROM system.settings
WHERE name ILIKE '%join%'
ORDER BY name;

How Pulse Helps

Time-series join queries built on ASOF JOIN are sensitive to the choice of algorithm, right-table size, and sort order of underlying MergeTree tables. Pulse is an AI DBA for ClickHouse that continuously monitors query performance, detects ASOF JOIN regressions across version upgrades, and provides root-cause analysis correlating slowdowns to memory pressure, sort order mismatches, and algorithm selection.

Frequently Asked Questions

Q: Does ASOF JOIN require the right table to be pre-sorted?
A: No. Pre-sorting is not a correctness requirement. With the hash algorithm, ClickHouse builds the hash table correctly regardless of physical sort order. Pre-sorting (via MergeTree ORDER BY) is a performance optimization for full_sorting_merge: when the table is already sorted on the join key, ClickHouse skips the sort phase, reducing both time and memory.

Q: What happens to left-side rows with no match in ASOF LEFT JOIN?
A: By default (join_use_nulls = 0), unmatched right-side columns are filled with the type's default value: 0 for numeric types, empty string for String, etc. Setting join_use_nulls = 1 makes unmatched columns return NULL instead, which is standard SQL behavior. ASOF JOIN (without LEFT) drops unmatched left-side rows entirely.

Q: Is it safe to use join_algorithm = 'auto' with ASOF JOIN?
A: No. The auto mode falls back to partial_merge when memory limits (max_rows_in_join or max_bytes_in_join) are exceeded. partial_merge does not support ASOF strictness and will cause the query to fail. Always specify hash or full_sorting_merge explicitly for ASOF JOIN queries.

Q: Can the ASOF column be Nullable?
A: Not on the right table side. Using a Nullable type for the ASOF column on the right table produces the error: ASOF join over right table Nullable column is not implemented. Use non-nullable Date, DateTime, Int*, UInt*, Float*, or Decimal columns. The left table's ASOF column can be Nullable.

Q: Can I use ASOF JOIN with the Join table engine?
A: No. The Join table engine does not support ASOF JOIN. Attempting to use it will produce an error. You cannot pre-materialize a right-side ASOF table the way you can with regular hash joins backed by the Join engine. Use a standard MergeTree table on the right side instead.

Q: How does the USING clause differ from the ON clause for ASOF JOIN?
A: With USING (col1, col2, asof_col), all listed columns are used for joining, equality columns come first, and the ASOF column must be last. The comparison operator is always >= (nearest prior value). The ON clause is more flexible: you can specify any of the four operators (>, >=, <, <=) and write multiple explicit equality conditions. Use ON when you need < or > direction, or when column names differ between the two tables.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.