The "DB::Exception: Cannot parse text input" error is one of the more common parsing errors you will encounter in ClickHouse. Identified by the error code CANNOT_PARSE_TEXT, it fires when the server tries to interpret a text value and fails to convert it into the target column's data type. Think of it as a catch-all for type conversion problems -- the input simply does not look like what the column expects.
Impact
When ClickHouse raises CANNOT_PARSE_TEXT, the INSERT or query that triggered it fails and no rows from that batch are committed. The practical consequences include:
- Data ingestion stops until the offending data is corrected or skipped.
- Pipelines relying on strict error-free inserts will back up.
- In large batch imports, a single bad value can block millions of otherwise valid rows.
Common Causes
- Type mismatch between source data and column definition -- a string like `"N/A"` landing in an `Int32` column, or a decimal value going into an integer column.
- Locale-specific formatting -- numbers with commas as thousands separators (e.g., `1,000`) or decimal commas (e.g., `3,14`) that ClickHouse does not expect.
- Empty strings for non-Nullable columns -- an empty field in CSV that maps to a non-Nullable numeric or date column.
- Encoding issues -- non-UTF-8 characters or invisible control characters embedded in otherwise normal-looking text.
- Misaligned columns -- the data has shifted (a missing delimiter earlier in the row) so a text value ends up being parsed as a number or date.
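Most of these causes can be detected with a quick pre-flight check on the raw bytes before they ever reach ClickHouse. The following is an illustrative Python sketch, not a ClickHouse API; the function name and the returned labels are invented for this example:

```python
def diagnose_int_field(raw: bytes) -> str:
    """Classify why a raw CSV field might fail to parse as Int32,
    mirroring the causes listed above. Returns "ok" if it should parse."""
    try:
        text = raw.decode("utf-8")        # encoding issues
    except UnicodeDecodeError:
        return "invalid-utf8"
    text = text.strip()
    if text == "":                        # empty string into a non-Nullable target
        return "empty"
    if "," in text:                       # 1,000 or 3,14
        return "locale-comma"
    if "." in text:                       # decimal value in an integer column
        return "decimal-point"
    if not text.lstrip("+-").isdigit():   # "N/A" and friends
        return "not-a-number"
    return "ok"
```

Running each suspect field through a classifier like this turns an opaque `CANNOT_PARSE_TEXT` into a named cause you can fix at the source.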
Troubleshooting and Resolution Steps
1. Read the full error message carefully. ClickHouse usually includes the column name, the expected type, and the offending value. This narrows the search immediately:

   ```
   DB::Exception: Cannot parse text 'abc' as Int32: ...
   ```

2. Isolate the bad rows. Use `input_format_allow_errors_num` to let ClickHouse skip a limited number of bad rows and ingest the rest:

   ```sql
   SET input_format_allow_errors_num = 10;
   INSERT INTO my_table FORMAT CSV ...
   ```

   The skipped rows are logged so you can inspect them afterward.

3. Check for empty fields. If your source data uses empty strings for missing values and the target column is not Nullable, either make the column Nullable or provide a default:

   ```sql
   ALTER TABLE my_table MODIFY COLUMN my_col Nullable(Int32);
   ```

4. Handle locale-specific number formats. Strip thousands separators or convert decimal commas before import:

   ```bash
   sed 's/,//g' data.csv > data_clean.csv
   ```

5. Use `clickhouse-local` to test parsing. This lets you validate data without touching a running server:

   ```bash
   clickhouse-local --query="SELECT * FROM file('data.csv', CSV, 'a Int32, b String') LIMIT 10"
   ```

6. Cast explicitly in your query. If the data sometimes contains values that need coercion, use `toInt32OrNull()`, `toFloat64OrZero()`, or similar safe-cast functions to handle edge cases gracefully.

7. Check encoding. Run `file` on your data to confirm it is UTF-8:

   ```bash
   file /path/to/data.csv
   ```
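The empty-field and locale-format fixes above can be folded into a single pre-import pass. A minimal Python sketch, assuming CSV input and a Nullable target column (the `\N` convention is what ClickHouse's CSV input reads as NULL; the helper itself is ours):

```python
import csv
import io

def clean_csv(src: str, numeric_cols: set, decimal_comma: bool = False) -> str:
    """Normalize numeric CSV columns before an INSERT: drop thousands
    separators (or convert decimal commas when decimal_comma=True) and
    turn empty fields into \\N, which CSV input reads as NULL."""
    out = io.StringIO()
    writer = csv.writer(out)
    for row in csv.reader(io.StringIO(src)):
        for i in numeric_cols:
            if i < len(row):
                field = row[i].strip()
                if field == "":
                    field = r"\N"                      # NULL for a Nullable column
                elif decimal_comma:
                    field = field.replace(",", ".")    # 3,14 -> 3.14
                else:
                    field = field.replace(",", "")     # 1,000 -> 1000
                row[i] = field
        writer.writerow(row)
    return out.getvalue()
```

Whether a comma is a thousands separator or a decimal mark is a property of the source locale, so it is passed in explicitly rather than guessed.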
Best Practices
- Define Nullable columns for any field that might legitimately be empty or missing in source data.
- Use ClickHouse's `OrNull` and `OrZero` function variants (e.g., `toInt32OrNull`) when building transformation queries to avoid hard failures.
- Validate data types at the source before they reach ClickHouse, especially when dealing with user-generated or third-party data.
- Set `input_format_allow_errors_num` or `input_format_allow_errors_ratio` in production pipelines to prevent a single bad row from blocking an entire batch.
- Log and alert on skipped rows so that data quality issues are visible rather than silently ignored.
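The `OrNull`/`OrZero` semantics are also easy to prototype in upstream transforms before the data reaches ClickHouse. A rough Python analogue (the helper names are ours, and ClickHouse's exact out-of-range behavior should be verified against its documentation):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def to_int32_or_null(value):
    """Rough analogue of ClickHouse's toInt32OrNull: an int when the
    text parses and fits in Int32, otherwise None (NULL)."""
    try:
        n = int(str(value).strip())
    except ValueError:
        return None
    return n if INT32_MIN <= n <= INT32_MAX else None

def to_int32_or_zero(value):
    """Rough analogue of toInt32OrZero: 0 instead of NULL on failure."""
    n = to_int32_or_null(value)
    return 0 if n is None else n
```

Preferring the `OrNull` variant keeps bad values distinguishable from legitimate zeros downstream.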
Frequently Asked Questions
Q: How is CANNOT_PARSE_TEXT different from more specific errors like CANNOT_PARSE_NUMBER?
A: CANNOT_PARSE_TEXT is a general-purpose parsing error. ClickHouse may raise more specific error codes (like CANNOT_PARSE_NUMBER or CANNOT_PARSE_DATE) depending on the target type. When it does not have a more specific code, it falls back to CANNOT_PARSE_TEXT.
Q: Can I skip bad rows instead of failing the entire insert?
A: Yes. Set `input_format_allow_errors_num` to the maximum number of rows you are willing to skip, or use `input_format_allow_errors_ratio` to specify a fraction of the batch (a value between 0 and 1). Skipped rows are discarded from the insert but the rest go through.
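When sizing these settings, the ratio variant scales with batch size while the numeric variant does not. The sketch below assumes the more permissive of the two limits applies; verify the exact interaction of the two settings against your ClickHouse version's documentation. The helper itself is hypothetical:

```python
def max_skippable(total_rows: int, allow_num: int = 0, allow_ratio: float = 0.0) -> int:
    """Rough upper bound on skippable rows for a batch, assuming the more
    permissive of the absolute and ratio-based limits applies."""
    return max(allow_num, int(total_rows * allow_ratio))
```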
Q: My CSV has empty fields for integer columns. What is the cleanest fix?
A: Either change the column to `Nullable(Int32)` so that empty values become NULL, or set `input_format_csv_empty_as_default = 1` so that ClickHouse uses the column's default value for empty fields.
Q: I see this error only with certain rows in a large file. How do I find them?
A: Use `clickhouse-local` with a permissive schema (all String columns) to load the file, then query for rows where casting fails: `SELECT * FROM file('data.csv', CSV, 'a String, b String') WHERE toInt32OrNull(a) IS NULL AND a != ''`.
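The same hunt can be done outside ClickHouse when you want exact line numbers for the offending rows. An illustrative Python sketch (the integer-only check loosely mirrors `toInt32OrNull`; the column indices are yours to supply):

```python
import csv

def find_bad_rows(lines, int_cols):
    """Yield (line_number, column_index, value) for fields that will not
    parse as integers. Empty fields are skipped here, since they are a
    separate problem (Nullable/default) rather than a parse failure."""
    for lineno, row in enumerate(csv.reader(lines), start=1):
        for i in int_cols:
            value = row[i].strip() if i < len(row) else ""
            if value and not value.lstrip("+-").isdigit():
                yield lineno, i, row[i]
```

Pass it an open file, e.g. `find_bad_rows(open("data.csv", newline=""), {0})`, and fix or filter the reported lines before retrying the insert.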