The "DB::Exception: Cannot parse quoted string" error occurs when ClickHouse encounters a quoted field in its input that does not follow the expected quoting rules. This CANNOT_PARSE_QUOTED_STRING error is overwhelmingly a CSV-format problem -- a field starts with a double quote but never closes it properly, or the escaping within the quoted region is broken. The result is a parser that cannot determine where the field ends.
Impact
When this error is raised, the entire INSERT fails:
- No rows from the batch are written to the table.
- The error can be difficult to locate visually in large files since a single unmatched quote can cause the parser to consume the rest of the file looking for the closing quote.
- Pipelines that ingest user-generated content or free-text fields are especially prone to this issue.
Common Causes
- Unmatched quotes -- a field opens with
"but never closes, often because the data contains a literal double quote that was not escaped. - Wrong escape convention -- the source uses backslash escaping (
\") inside quoted fields, but ClickHouse CSV expects doubled quotes ("") by default. - Mixed quoting styles -- some fields use single quotes while others use double quotes, but ClickHouse expects a consistent style.
- Multiline quoted fields -- a field contains embedded newlines within quotes, and the parser or transport layer splits the record at the newline boundary.
- Encoding issues -- curly/smart quotes (e.g., Unicode characters U+201C and U+201D) instead of the standard ASCII double quote (
").
Troubleshooting and Resolution Steps
Identify the problematic row. The error message often includes a byte offset or partial content. Use that to find the row in the source data:
# Search for lines with an odd number of double quotes awk -F'"' 'NF%2==0' data.csv | head -5Check the quoting convention. In standard CSV (RFC 4180), double quotes inside a quoted field must be escaped by doubling them:
"He said ""hello"" to me"If your data uses backslash escaping instead, enable it:
SET format_csv_allow_double_quotes = 0;Enable single-quote support if needed. If your data uses single quotes for quoting:
SET input_format_csv_allow_single_quotes = 1;Handle smart/curly quotes. Preprocess the file to replace Unicode quotes with ASCII ones:
sed "s/[\x{201C}\x{201D}]/\"/g" data.csv > data_fixed.csvCheck for multiline fields. If fields contain embedded newlines, make sure the transport layer is not splitting records at newline boundaries. The entire quoted field, including its newlines, must be sent as one unit.
Use
input_format_allow_errors_numto skip bad rows and process the rest:SET input_format_allow_errors_num = 20; INSERT INTO my_table FORMAT CSVTest with
clickhouse-localto isolate parsing issues without involving the server:clickhouse-local --query="SELECT * FROM file('data.csv', CSV, 'a String, b String, c Int32') LIMIT 100"
Best Practices
- Ensure your data producers use RFC 4180-compliant CSV output, which escapes double quotes by doubling them.
- Validate CSV files with a dedicated parser (such as Python's
csvmodule) before importing into ClickHouse. - When dealing with user-generated text that may contain quotes, prefer JSON formats (like JSONEachRow) over CSV since JSON has well-defined string escaping rules.
- Standardize on ASCII double quotes throughout your pipeline to avoid issues with curly or smart quotes.
- Use
clickhouse-localfor pre-import validation as part of your CI/CD or data pipeline.
Frequently Asked Questions
Q: How does ClickHouse expect quotes to be escaped in CSV?
A: By default, ClickHouse follows the RFC 4180 convention: a double quote inside a quoted field must be represented as two consecutive double quotes (""). For example, the value She said "hi" should be encoded as "She said ""hi""".
Q: Can I use backslash escaping for quotes in CSV mode?
A: ClickHouse CSV format primarily expects doubled-quote escaping. However, you can experiment with format_csv_allow_double_quotes and related settings. For backslash-style escaping, the TSV format is a better fit.
Q: My data has both single and double quotes. Which format should I use?
A: JSONEachRow is the safest choice because JSON has unambiguous string escaping rules. If you must use CSV, standardize on one quoting style and escape the other within fields.
Q: Is there a way to auto-fix quoting issues in CSV files?
A: You can use Python's csv module to read and rewrite the file, which will normalize the quoting. There is no built-in ClickHouse tool for this, so preprocessing is the recommended approach.