sequenceMatch and sequenceCount: Sequence Analysis Functions

sequenceMatch and sequenceCount are parametric aggregate functions that detect ordered patterns of events within a group of rows — for example, "did a user view page A, then add to cart, then check out?". They are the workhorses of behavioral and funnel analysis in ClickHouse, but their pattern semantics and limits surprise people who expect regex-style strictness.

This guide covers their signatures, the pattern mini-language (including time constraints), the difference between them, and the gotchas — gap-skipping behavior, the 32-condition cap, and the "too difficult" iteration error — that trip up real queries.

What These Functions Do

Both functions are parametric aggregates: they take a pattern as a parameter and then a list of arguments as the aggregate's input. They evaluate the pattern against the rows of each group, ordered by a timestamp you supply.

  • sequenceMatch(pattern)(timestamp, cond1, cond2, ...) returns 1 (UInt8) if at least one chain of events matching the pattern exists in the group, 0 otherwise.
  • sequenceCount(pattern)(timestamp, cond1, cond2, ...) returns the number (UInt64) of non-overlapping chains matching the pattern. After a match is found, the search resumes from where the match ended.

Because they are aggregates, you almost always combine them with GROUP BY (per user, per session, per device) so the pattern is evaluated independently for each entity.

Signature and Arguments

sequenceMatch(pattern)(timestamp, cond1, cond2, ...)
sequenceCount(pattern)(timestamp, cond1, cond2, ...)
Part Description
pattern A string literal in the pattern mini-language (see below).
timestamp The column that orders events. Must be Date, DateTime, or an unsigned integer type (UInt*).
cond1 … condN Boolean conditions of type UInt8 describing events. You can pass up to 32 conditions.

The conditions are referenced positionally inside the pattern: (?1) is the first condition you passed, (?2) the second, and so on, up to (?32).

Pattern Syntax

The pattern string is a small language, not a full regular expression. The supported tokens are:

Token Meaning
(?N) Matches the event where condition number N (1–32) is true.
. Matches any single event (any row in the group).
.* Matches any number of events (including zero) between two conditions.
(?t operator value) A time constraint in seconds between the two adjacent matched events. Supported operators: >=, >, <, <=, ==.

A few worked patterns:

  • (?1)(?2) — condition 1 happens, then condition 2 happens at the very next matched event.
  • (?1).*(?2) — condition 1, then later condition 2, with any number of events in between.
  • (?1)(?t<=3600)(?2) — condition 2 occurs within 3600 seconds (one hour) after condition 1.
-- "Did number 2 follow number 1 in this group?"
CREATE TABLE t (time UInt32, number Int32) ENGINE = Memory;
INSERT INTO t VALUES (1, 1), (2, 3), (3, 2);

SELECT sequenceMatch('(?1)(?2)')(time, number = 1, number = 2) FROM t;
-- Result: 1  (the intervening 3 is simply ignored)

The Gap-Skipping Gotcha

This is the single most important behavior to internalize: sequenceMatch and sequenceCount ignore events that do not match any condition. The pattern (?1)(?2) does not require condition 2 to be the physically next row — it requires it to be the next row that matches some condition. Everything else is skipped.

In the example above, (?1)(?2) matched even though number = 3 sat between the 1 and the 2. If you need strict adjacency — "B must immediately follow A with nothing in between" — add a catch-all condition that matches the "anything else" events and reference it so the pattern can detect an interruption:

SELECT sequenceMatch('(?1)(?2)')(
    time,
    number = 1,
    number = 2,
    number NOT IN (1, 2)   -- (?3): the "something else happened" event
)
FROM t;
-- Now an intervening non-1/non-2 event participates in matching,
-- so a pattern requiring strict adjacency can be made to fail.

The general technique is to model the "noise" events as their own condition so the pattern language can see them instead of silently skipping them.

sequenceMatch vs. sequenceCount

Both share identical pattern syntax and arguments. The difference is the return value and the search strategy.

Aspect sequenceMatch sequenceCount
Return type UInt8 (0 or 1) UInt64 (a count)
Question answered "Did the pattern occur at least once?" "How many non-overlapping times did the pattern occur?"
Search after a hit Stops at the first match Resumes searching after the matched chain
Overlapping matches N/A Counts non-overlapping chains only
Typical use Funnel completion flags, cohort membership Repeat-behavior counts, loop detection
-- How many times does a 2 appear after a 1?
CREATE TABLE t2 (time UInt32, number Int32) ENGINE = Memory;
INSERT INTO t2 VALUES (1,1), (2,3), (3,2), (4,1), (5,3), (6,2);

SELECT sequenceCount('(?1).*(?2)')(time, number = 1, number = 2) FROM t2;
-- Result: 2

Note that sequenceCount counts non-overlapping chains: once it matches 1 … 2, it starts the next search from the position after that match. It will not double-count a single 2 against two different earlier 1s.

Practical Example: A 3-Step Funnel

Suppose an events table records (user_id, event_time, event_name). To flag which users completed view → add-to-cart → purchase, in order, with add-to-cart occurring within 30 minutes before purchase (the (?t<=1800) constraint bounds only the gap between those two adjacent steps):

SELECT
    user_id,
    sequenceMatch('(?1).*(?2)(?t<=1800).*(?3)')(
        event_time,
        event_name = 'view',
        event_name = 'add_to_cart',
        event_name = 'purchase'
    ) AS completed_funnel
FROM events
GROUP BY user_id
HAVING completed_funnel = 1;

To instead count how many times users repeated a "search then purchase" loop across their whole history:

SELECT
    user_id,
    sequenceCount('(?1).*(?2)')(
        event_time,
        event_name = 'search',
        event_name = 'purchase'
    ) AS purchase_loops
FROM events
GROUP BY user_id;

For an end-to-end funnel where you care about the furthest step each user reached within a sliding time window rather than an exact ordered pattern, `windowFunnel` is usually the cleaner tool — it returns the depth of the funnel reached, not just a boolean.

Common Issues

"Pattern application proves too difficult, exceeding max iterations (1000000)"

The pattern matcher has a hard internal cap of 1,000,000 iterations per group. Groups with very long event histories combined with broad patterns (lots of .*, many conditions, or conditions that match a large fraction of rows) can blow through it and raise this error. Mitigations:

  • Tighten conditions so fewer rows match each (?N).
  • Add time constraints ((?t<=...)) to bound how far the matcher searches between steps.
  • Pre-filter the input with a WHERE clause so each group contains only relevant events.
  • Reduce per-group cardinality (e.g., partition the analysis by session rather than by lifetime user).

Ties on the timestamp

Events that share the same timestamp value may be ordered arbitrarily relative to one another, which can change the result non-deterministically. If two steps can occur within the same second, use a higher-resolution timestamp (e.g., DateTime64) or a strictly increasing sequence column to order events.

Distributed tables

sequenceMatch/sequenceCount operate per group; make sure all events for a given entity land on the same shard (shard by user_id or session_id). If an entity's events are spread across shards, partial sequences computed independently per shard can produce incorrect results.

No negation operator

The pattern language has no built-in "this event must NOT occur between A and B" operator. To approximate negation, model the unwanted event as its own condition (as in the gap-skipping example) and construct the pattern so its presence breaks the match.

Best Practices

  1. Always GROUP BY the entity. Without grouping by user/session/device, the pattern is evaluated across unrelated events and the result is meaningless.

  2. Order with a reliable timestamp. Prefer a monotonically increasing column. Use DateTime64 if sub-second ordering matters; integer sequence numbers avoid tie ambiguity entirely.

  3. Pre-filter aggressively. A WHERE event_name IN (...) before aggregation shrinks each group, sidesteps the iteration limit, and speeds the query.

  4. Bound time with (?t ...) constraints. They make patterns both more correct (funnels usually have a time window) and cheaper to evaluate.

  5. Pick the right tool. Use sequenceMatch for boolean "did it happen", sequenceCount for repeat counts, `windowFunnel` for "how far down the funnel", and retention for cohort retention grids. For free-form ordered logic, array functions over a `groupArray` of events can express patterns these functions can't.

  6. Stay under 32 conditions. That is a hard cap. If you need more event types, collapse related events into fewer conditions or split the analysis.

When a parametric sequence function feels awkward, these are often a better fit:

  • `windowFunnel` — returns the maximum number of consecutive funnel steps reached within a sliding time window. Ideal for classic conversion funnels where you want step depth, not an exact pattern.
  • `groupArray` + array functions — collect each entity's events into an ordered array and use array functions (or `arrayJoin` to unnest) for custom matching logic that the pattern language can't express, including negation and conditional aggregation.
  • Conditional aggregates such as `avgIf` — when you only need per-step counts or averages rather than ordered detection, an *If aggregate is simpler and faster than a sequence pattern.

How Pulse Helps

Sequence and funnel queries are easy to write incorrectly — a missing GROUP BY, a timestamp with second-level ties, or an over-broad pattern can return wrong results silently or fail with the iteration-limit error under production data volumes. Pulse provides ongoing monitoring and expert support for ClickHouse, helping teams catch queries that scan far more data than expected, validate that sequence analysis is sharded correctly across a cluster, and tune schemas (timestamp precision, partitioning, pre-filtering) so behavioral analytics stay correct and fast as data grows. If you are building funnel or event-pattern analytics on ClickHouse, Pulse can review the approach before it becomes a production incident.

Frequently Asked Questions

Q: Does sequenceMatch('(?1)(?2)') require the two events to be physically adjacent rows?

No. It requires condition 2 to be the next event that matches any condition — non-matching events in between are ignored. To enforce strict adjacency, add a catch-all condition for the "other" events so the pattern can detect an interruption.

Q: What is the maximum number of conditions I can pass?

Up to 32 conditions ((?1) through (?32)). This is a hard limit. If you need more event types, combine related ones into fewer conditions.

Q: How is sequenceCount different from just counting matches?

sequenceCount counts non-overlapping chains. After it finds a match, it resumes searching from the end of that match, so a single later event is never counted against multiple earlier ones.

Q: Why am I getting "Pattern application proves too difficult, exceeding max iterations (1000000)"?

A group has too many candidate events for the pattern, exhausting the internal 1,000,000-iteration cap. Tighten your conditions, add (?t ...) time constraints, pre-filter rows with WHERE, or reduce per-group cardinality (e.g., analyze per session instead of per lifetime user).

Q: Should I use sequenceMatch or windowFunnel for a conversion funnel?

Use sequenceMatch when you need a strict boolean "did this exact ordered pattern occur". Use `windowFunnel` when you want to know how far down the funnel each user got within a time window — it returns step depth, which is what most conversion dashboards actually need.

Q: Why are my results non-deterministic?

Events sharing the same timestamp value can be ordered arbitrarily, changing which chain matches. Use a higher-resolution timestamp (DateTime64) or a strictly increasing sequence column so event order is unambiguous.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.