ClickHouse DB::Exception: Hyperscan cannot scan text

The "DB::Exception: Hyperscan cannot scan text" error in ClickHouse is raised when the Hyperscan regex engine (or its fork, Vectorscan) fails while scanning text against one or more patterns. The error code is HYPERSCAN_CANNOT_SCAN_TEXT. ClickHouse uses Hyperscan for multi-pattern regex functions such as multiMatchAny(), multiMatchAnyIndex(), and multiMatchAllIndices(), and this error indicates that the engine could not complete the scan. Substring-search functions such as multiSearchAny() do not use Hyperscan and are not affected.

Impact

When this error occurs, the query that triggered the Hyperscan scan fails:

  • No results are returned for the affected query
  • Applications depending on multi-pattern matching will experience failures
  • The underlying data is not affected in any way
  • Other queries not using Hyperscan functions continue to work normally
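
To see which queries were affected, the query log can be filtered by error code. The following is a minimal sketch that assumes query logging is enabled, so that system.query_log is populated; it matches on the error code name rather than the message text, since the wording of the message can differ between versions.

    -- Most recent queries that failed with HYPERSCAN_CANNOT_SCAN_TEXT
    SELECT
        event_time,
        query_id,
        exception,
        query
    FROM system.query_log
    WHERE type IN ('ExceptionBeforeStart', 'ExceptionWhileProcessing')
      AND errorCodeToName(exception_code) = 'HYPERSCAN_CANNOT_SCAN_TEXT'
    ORDER BY event_time DESC
    LIMIT 10;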

Common Causes

  1. Pattern too complex -- Hyperscan has limits on the complexity of individual patterns and pattern sets. Extremely long patterns, deeply nested alternations, or patterns that cause excessive DFA state explosion are often rejected when the pattern set is compiled, but some only fail once scanning starts.
  2. Scratch space exhaustion -- Hyperscan requires scratch memory proportional to the complexity of the compiled pattern database. Extremely large pattern sets can exhaust available scratch space.
  3. Input text too large -- Very long input strings can cause Hyperscan to exceed internal limits during scanning.
  4. Resource limits -- Memory constraints (system or cgroup) can prevent Hyperscan from allocating the resources it needs.
  5. Incompatible pattern flags -- Certain combinations of Hyperscan flags (e.g., HS_FLAG_SOM_LEFTMOST with certain pattern types) can cause scan failures.
  6. Bug in the Hyperscan/Vectorscan library -- Edge cases in the library itself can trigger scan failures for specific pattern and input combinations.
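
Whether cause 3 applies can be gauged from the length distribution of the scanned column. This is a small sketch that uses my_table and long_text_column as placeholder names; it only shows whether a few outlier rows are far longer than the rest.

    -- Length in bytes of the values that would be passed to multiMatchAny
    SELECT
        count() AS row_count,
        quantile(0.99)(length(long_text_column)) AS p99_bytes,
        max(length(long_text_column)) AS max_bytes
    FROM my_table;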

Troubleshooting and Resolution Steps

  1. Simplify the pattern set to identify which pattern causes the failure:

    -- Test patterns individually
    SELECT multiMatchAny('test input text', ['pattern1']);
    SELECT multiMatchAny('test input text', ['pattern2']);
    -- Continue until you identify the problematic pattern
    
  2. Reduce pattern complexity. If a pattern uses deeply nested alternations or unbounded repetitions, simplify it:

    -- Instead of a very complex pattern
    -- '(a|b|c|d|e|f|g){0,100}.*complex.*'
    -- Try breaking it into simpler patterns
    SELECT multiMatchAny(text, ['simpler_pattern1', 'simpler_pattern2']);
    
  3. Check the input data size. If scanning very long strings, try truncating:

    SELECT multiMatchAny(substring(long_text_column, 1, 10000), ['pattern'])
    FROM my_table;
    
  4. Use RE2-based alternatives when Hyperscan is not required:

    -- Instead of multiMatchAny for a single pattern
    SELECT match(text, 'pattern');
    
    -- For multiple patterns, use OR
    SELECT match(text, 'pattern1') OR match(text, 'pattern2');
    
  5. Check current memory usage, then increase the memory available to ClickHouse if the issue is resource-related:

    SELECT formatReadableSize(value) FROM system.metrics WHERE metric = 'MemoryTracking';
    
  6. Check the ClickHouse version and consider upgrading. Newer versions include updated Vectorscan/Hyperscan libraries with bug fixes:

    SELECT version();
    
  7. Limit the number of concurrent queries using Hyperscan to reduce peak memory consumption from scratch space allocations.
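
Steps 5 and 7 can be checked from the system tables. The sketch below assumes the standard system.processes and system.settings tables; the ILIKE filter on the query text is only a heuristic for spotting Hyperscan-backed calls, and the first query will also list itself.

    -- Running queries that call multiMatch* functions, largest memory first
    SELECT
        query_id,
        user,
        elapsed,
        formatReadableSize(memory_usage) AS memory,
        query
    FROM system.processes
    WHERE query ILIKE '%multiMatch%' OR query ILIKE '%multiFuzzyMatch%'
    ORDER BY memory_usage DESC;

    -- Limits that govern per-query memory and per-user concurrency
    SELECT name, value, changed
    FROM system.settings
    WHERE name IN ('max_memory_usage', 'max_memory_usage_for_user', 'max_concurrent_queries_for_user');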

Best Practices

  • Keep regex patterns as simple as possible when using multi-pattern functions. Hyperscan is optimized for many simple patterns, not a few complex ones.
  • Test patterns against representative data before deploying them in production queries.
  • Set max_hyperscan_regexp_length and max_hyperscan_regexp_total_length to reasonable limits to prevent accidentally deploying overly complex pattern sets.
  • Consider adding SETTINGS allow_hyperscan = 1 to multiMatchAny queries explicitly. This is already the default, but it makes clear that Hyperscan is being used.
  • For single-pattern matching, prefer match() (RE2-based) over Hyperscan multi-pattern functions.
  • Monitor memory usage when running Hyperscan-heavy workloads, as the scratch space requirements can be substantial.
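
The Hyperscan-related settings named above can be inspected and applied per query. In this sketch the limit values are arbitrary illustrations, not recommendations, and my_table and text are placeholder names.

    -- Current values; changed = 1 means the default was overridden
    SELECT name, value, changed
    FROM system.settings
    WHERE name IN ('allow_hyperscan', 'max_hyperscan_regexp_length', 'max_hyperscan_regexp_total_length');

    -- Cap pattern lengths for a single query
    SELECT count()
    FROM my_table
    WHERE multiMatchAny(text, ['error', 'timeout'])
    SETTINGS max_hyperscan_regexp_length = 1000,
             max_hyperscan_regexp_total_length = 10000;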

Frequently Asked Questions

Q: What is the difference between Hyperscan and Vectorscan?
A: Hyperscan is a high-performance regex engine originally developed by Intel. Vectorscan is a community-maintained fork that continues development after Intel reduced its involvement and adds support for non-x86 platforms such as ARM. Recent ClickHouse versions use Vectorscan. From a ClickHouse user's perspective, the SQL functions and their behavior are effectively the same.

Q: Can I disable Hyperscan entirely?
A: Yes. Setting allow_hyperscan = 0 will prevent queries from using Hyperscan-based functions. Those functions will then raise an error if called, so you will need to rewrite queries to use RE2-based alternatives.
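
As a sketch of such a rewrite (my_table and text are placeholder names), the setting can be turned off for the session and a multiMatchAny call replaced with a single RE2 alternation:

    SET allow_hyperscan = 0;

    -- multiMatchAny(text, ['pattern1', 'pattern2']) would now raise an error;
    -- match() uses RE2 and is not affected by allow_hyperscan
    SELECT count()
    FROM my_table
    WHERE match(text, '(?:pattern1)|(?:pattern2)');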

Q: Why does my pattern compile but fail during scanning?
A: Pattern compilation and text scanning are separate phases. A pattern may compile successfully but fail during scanning due to resource limits, specific input characteristics, or runtime conditions that were not apparent at compile time.

Q: How many patterns can Hyperscan handle at once?
A: Hyperscan can handle thousands of patterns simultaneously, which is its primary advantage. However, the total complexity of the pattern set matters more than the count. A thousand simple literal patterns will work fine, while a handful of extremely complex patterns may cause issues.
