ClickHouse DB::Exception: Cannot detect data format

The "DB::Exception: Cannot detect data format" error in ClickHouse occurs when the server attempts to automatically determine the format of input data but cannot identify it. The CANNOT_DETECT_FORMAT error typically arises when using table functions like file(), url(), or s3() without explicitly specifying a format. ClickHouse tries to infer the format from the file extension or content, and this error means that inference failed.

Impact

The query that triggered format auto-detection fails, and no data is read or inserted. The failure is confined to that query: existing data and other queries are not affected. It is resolved by specifying the format explicitly.

Common Causes

  1. The file has no extension at all
  2. The file extension is generic or does not map to a known ClickHouse format (e.g., .dat, .txt, .log)
  3. The file content does not match any recognizable format pattern
  4. The URL endpoint returns data with no Content-Type header or a generic MIME type
  5. Using STDIN or piped input without specifying a format
  6. The file is empty or contains too little data for reliable format detection
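
Two of the causes above (a missing extension and an empty file) can be checked locally before a query is ever sent. The following is a minimal sketch of such a preflight check, not part of ClickHouse itself; the function name check_input_file and its output labels are hypothetical.

```shell
# Classify a local input file before relying on format auto-detection.
# Prints one of: missing, empty, no-extension, ok
# (hypothetical helper; the labels are illustrative, not ClickHouse output)
check_input_file() {
    path=$1
    name=${path##*/}              # strip any leading directories
    if [ ! -f "$path" ]; then
        echo "missing"
    elif [ ! -s "$path" ]; then   # -s is true only for non-empty files
        echo "empty"
    else
        case "$name" in
            ?*.?*) echo "ok" ;;            # has a non-empty extension
            *)     echo "no-extension" ;;
        esac
    fi
}
```

A result of "ok" only means the file is non-empty and has some extension; whether ClickHouse maps that extension to a format is a separate question.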

Troubleshooting and Resolution Steps

  1. Specify the format explicitly in your query to bypass auto-detection:

    -- Instead of relying on auto-detection
    SELECT * FROM file('/path/to/data.dat');
    -- Specify the format
    SELECT * FROM file('/path/to/data.dat', 'CSV', 'col1 UInt32, col2 String');
    
  2. For S3 or URL sources, include the format parameter:

    SELECT * FROM s3('https://bucket.s3.amazonaws.com/data.gz', 'CSV', 'col1 UInt32, col2 String');
    SELECT * FROM url('https://api.example.com/data', 'JSONEachRow');
    
  3. Check the file extension mapping. ClickHouse recognizes these common extensions:

    • .csv -> CSV
    • .tsv / .tab -> TabSeparated
    • .json / .jsonl / .ndjson -> JSONEachRow
    • .parquet -> Parquet
    • .orc -> ORC
    • .avro -> Avro
    • .arrow -> Arrow
    • .native -> Native
  4. Rename the file to use a recognized extension if you want auto-detection to work:

    mv data.txt data.csv
    
  5. For stdin or piped data, always specify the format:

    cat data.csv | clickhouse-client --query "INSERT INTO my_table FORMAT CSV"
    
  6. If using clickhouse-local, specify format in the query:

    clickhouse-local --query "SELECT * FROM file('data.txt', 'TSV') LIMIT 10"
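
The extension list in step 3 can also be encoded in a wrapper script, so that a script decides the format itself and always passes it explicitly. This is a rough sketch that mirrors only the extensions listed in this article; the function name format_for_extension is hypothetical, and the mapping should be checked against your ClickHouse version.

```shell
# Map a file name to a ClickHouse format name, using the extension list
# from this article. Prints the format and returns 0, or returns 1 when
# the extension is missing or not in the list.
# (hypothetical helper; not ClickHouse's own detection logic)
format_for_extension() {
    name=${1##*/}                 # strip any leading directories
    case "$name" in
        ?*.?*) ext=${name##*.} ;; # text after the last dot
        *)     return 1 ;;        # no extension at all
    esac
    case "$ext" in
        csv)               echo "CSV" ;;
        tsv|tab)           echo "TabSeparated" ;;
        json|jsonl|ndjson) echo "JSONEachRow" ;;
        parquet)           echo "Parquet" ;;
        orc)               echo "ORC" ;;
        avro)              echo "Avro" ;;
        arrow)             echo "Arrow" ;;
        native)            echo "Native" ;;
        *)                 return 1 ;;
    esac
}
```

A caller can then fail early with its own message when the function returns non-zero, instead of letting the query hit CANNOT_DETECT_FORMAT.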
    

Best Practices

  • Always specify the format explicitly in production queries rather than relying on auto-detection. This makes queries self-documenting and avoids ambiguity.
  • Use standard file extensions that ClickHouse recognizes if you prefer auto-detection in ad-hoc workflows.
  • When building data pipelines, hardcode the format in your query templates to prevent format detection failures from breaking the pipeline.
  • For compressed files, use double extensions like .csv.gz to help ClickHouse detect both the compression and format.
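
One way to hardcode the format in a pipeline is to build every file() query through a small template that refuses to run without a format and structure. The sketch below is illustrative only: build_file_query is a hypothetical name, and it assumes the arguments contain no single quotes, since no escaping is attempted.

```shell
# Build a SELECT over file() with the format and structure always spelled out.
# Assumption: arguments contain no single quotes (no escaping is done).
# (hypothetical helper for query templates)
build_file_query() {
    if [ "$#" -ne 3 ] || [ -z "$2" ] || [ -z "$3" ]; then
        echo "usage: build_file_query PATH FORMAT STRUCTURE" >&2
        return 2
    fi
    printf "SELECT * FROM file('%s', '%s', '%s')\n" "$1" "$2" "$3"
}
```

The resulting string can be passed to clickhouse-local or clickhouse-client with --query, for example --query "$(build_file_query /path/to/data.dat CSV 'col1 UInt32, col2 String')".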

Frequently Asked Questions

Q: When does ClickHouse auto-detect the format?
A: ClickHouse attempts format auto-detection when you use table functions like file(), url(), or s3() without providing a format parameter. It uses the file extension and, in some cases, the content to determine the format.

Q: Can ClickHouse detect the format from file content alone?
A: In some cases, yes. ClickHouse can recognize Parquet files by their magic bytes, for instance. However, text-based formats like CSV and TSV are harder to distinguish from content alone, which is why the file extension is the primary detection mechanism.
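
To illustrate what "magic bytes" means here: a Parquet file begins with the four ASCII bytes PAR1, which can be checked from the shell. This is a simple heuristic sketch, not ClickHouse's detection code; the function name looks_like_parquet is hypothetical, and head -c is assumed to be available (it is on GNU and BSD systems).

```shell
# Return 0 if the file starts with the 4-byte Parquet magic "PAR1".
# NUL bytes are stripped so the shell comparison never sees them.
# (hypothetical helper; a heuristic only)
looks_like_parquet() {
    [ "$(head -c 4 "$1" 2>/dev/null | tr -d '\0')" = "PAR1" ]
}
```

A CSV or TSV file has no comparable signature, which is why a check like this cannot be written for them.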

Q: What happens with compressed files?
A: ClickHouse can detect compression from extensions like .gz, .zst, .bz2, .xz, and .lz4. For format detection on compressed files, use compound extensions like .csv.gz so ClickHouse knows both the compression method and the data format inside.
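
The compound-extension idea can be made concrete with a small sketch that splits a file name into its compression suffix and the data extension underneath it. The function name split_compound_extension is hypothetical, and the compression list is just the one given in this answer.

```shell
# Split a file name into its compression suffix and inner data extension.
# Prints "<compression> <extension>", using "none" when a part is absent.
# (hypothetical helper; compression suffixes taken from this article)
split_compound_extension() {
    name=${1##*/}                 # strip any leading directories
    compression=none
    case "$name" in
        ?*.gz|?*.zst|?*.bz2|?*.xz|?*.lz4)
            compression=${name##*.}
            name=${name%.*}       # drop the compression suffix
            ;;
    esac
    case "$name" in
        ?*.?*) extension=${name##*.} ;;
        *)     extension=none ;;
    esac
    echo "$compression $extension"
}
```

Under this sketch, data.csv.gz yields "gz csv", while data.gz yields "gz none": the compression is known but nothing in the name indicates the format, so the format has to be given explicitly.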
