The "DB::Exception: Cannot compile regular expression" error in ClickHouse is raised when a regex pattern supplied to functions like match(), extract(), extractAll(), replaceRegexpOne(), or replaceRegexpAll() fails to compile. The error code is CANNOT_COMPILE_REGEXP. ClickHouse uses the RE2 library (or Hyperscan for certain multi-pattern operations), and the pattern must conform to the syntax rules of the underlying engine.
Impact
This error causes the query containing the invalid regular expression to fail immediately. No partial results are returned. The impact is typically limited to the individual query:
- Dashboards or applications relying on regex-based filtering or extraction will show errors
- ETL pipelines using regex transformations will halt at the failing step
- If the pattern is embedded in a view definition, the view becomes unusable until the expression is corrected
Common Causes
- Invalid regex syntax -- Unmatched parentheses, brackets, or braces; dangling quantifiers; or other structural errors in the pattern string.
- Unsupported RE2 features -- RE2 does not support backreferences (
\1), lookaheads ((?=...)), or lookbehinds ((?<=...)). Patterns relying on these PCRE features will fail. - Incorrect escaping -- Special regex characters not properly escaped, or SQL string escaping interfering with the regex pattern. A common mistake is using a single backslash where two are needed.
- Empty pattern string -- Passing an empty string or NULL as the regex pattern.
- Invalid character class -- Malformed character classes like
[a-or using POSIX classes not supported by the engine. - Pattern generated dynamically -- When patterns are constructed from user input or column values, unexpected characters can break the regex syntax.
Troubleshooting and Resolution Steps
Examine the full error message to identify the exact position and nature of the syntax error. ClickHouse typically includes the RE2 error description:
DB::Exception: Cannot compile regular expression `[invalid(`: missing closing bracketTest the pattern with RE2 syntax rules. RE2 uses a subset of PCRE. Verify your pattern is compatible:
-- Simple test SELECT match('test string', 'your_pattern_here');Fix common escaping issues. In ClickHouse SQL strings, backslashes need to be doubled:
-- Wrong: single backslash eaten by SQL parser SELECT match(col, '\d+'); -- Correct: escaped backslash reaches RE2 SELECT match(col, '\\d+');Replace unsupported PCRE features with RE2 alternatives:
-- Lookahead not supported in RE2 -- Instead of: (?=foo)bar -- Rewrite the logic using separate match() calls or extractGroups() -- Backreferences not supported -- Instead of: (abc)\1 -- Use application-level logic or multiple queriesValidate dynamically generated patterns before passing them to regex functions:
-- If building patterns from user input, escape special characters first SELECT match(col, concat('\\Q', user_input, '\\E'));Note: RE2 does not support
\Q...\E, so you would need to manually escape metacharacters or useposition()/like()for literal matching instead.Check for invisible characters in the pattern string, especially when copying from web pages or documents. Non-printable characters can cause confusing compilation failures.
Best Practices
- Prefer
like()orposition()over regex when you only need simple substring matching -- they are faster and less error-prone. - Always double-escape backslashes in SQL string literals for regex patterns.
- Keep regex patterns as simple as possible. Complex patterns are harder to maintain and debug.
- When working with user-supplied patterns, validate and sanitize them before use, or catch the error gracefully in your application layer.
- Test regex patterns in isolation with a simple
SELECT match('sample', 'pattern')query before embedding them in production queries. - Consult the RE2 syntax reference for the definitive list of supported features.
Frequently Asked Questions
Q: Does ClickHouse use PCRE or RE2 for regular expressions?
A: ClickHouse primarily uses the RE2 library for regex functions like match(), extract(), and replaceRegexpOne(). RE2 guarantees linear-time matching but does not support some PCRE features like backreferences and lookaheads. The Hyperscan library is used for multi-pattern functions like multiMatchAny().
Q: Why does my regex work in Python/Java but fail in ClickHouse?
A: Most programming languages use PCRE-compatible engines that support features RE2 intentionally omits, such as backreferences and lookaround assertions. You will need to rewrite patterns that rely on those features.
Q: Can I use regex flags like case-insensitive matching?
A: Yes. RE2 inline flags are supported. Use (?i) at the start of the pattern for case-insensitive matching: match(col, '(?i)hello'). Alternatively, use the matchCaseInsensitive() function.
Q: What happens if the regex pattern comes from a table column?
A: ClickHouse evaluates the regex for each row, which means compilation happens per-row if the pattern varies. If any row contains an invalid pattern, the query fails. Consider filtering or validating pattern columns beforehand.