The "DB::Exception: Cannot parse Protobuf schema" error occurs when ClickHouse tries to load a Protocol Buffers schema file and fails. The CANNOT_PARSE_PROTOBUF_SCHEMA error code indicates that the .proto file is either syntactically invalid, references undefined types, or is otherwise incompatible with ClickHouse's Protobuf parser. This error surfaces when using the Protobuf or ProtobufSingle input/output formats.
Impact
When this error fires, ClickHouse cannot process data in the specified Protobuf format:
- INSERT and SELECT operations that depend on the Protobuf schema fail immediately.
- The error occurs at schema loading time, before any data is processed, so no rows are read or written.
- Until the schema file is corrected, all Protobuf-format operations referencing it will fail.
Common Causes
- Syntax errors in the .proto file -- missing semicolons, unclosed braces, invalid field numbers, or other proto syntax mistakes.
- Unsupported proto syntax version -- using syntax = "proto3" features in a way that conflicts with ClickHouse's parser, or omitting the syntax declaration entirely.
- Missing import dependencies -- the .proto file imports another proto file that ClickHouse cannot find at the specified format_schema_path.
- Undefined message or enum types -- referencing a type that is declared in an imported file but the import path is wrong.
- Schema file not found -- the path specified in the FORMAT Protobuf SETTINGS format_schema = '...' does not exist or is not readable by the ClickHouse process.
- Incompatible field types -- using Protobuf types that ClickHouse does not support or map to its type system.
Troubleshooting and Resolution Steps
1. Validate the .proto file independently using the protoc compiler:

       protoc --proto_path=/path/to/schemas --descriptor_set_out=/dev/null /path/to/schemas/my_schema.proto

   If protoc reports errors, fix those first.

2. Check the schema path in ClickHouse. The format_schema setting should reference the file relative to the format_schema_path directory:

       SELECT * FROM system.settings WHERE name = 'format_schema_path';

   Then confirm the file exists at that location:

       ls -la /var/lib/clickhouse/format_schemas/my_schema.proto

3. Verify the syntax version declaration. Ensure your .proto file starts with:

       syntax = "proto3";

   or

       syntax = "proto2";

4. Check imports. If your proto file imports other files, those files must also be in the format_schema_path directory. Relative paths in import statements are resolved from that directory.

5. Verify the message name. The format_schema setting should include both the file and message name:

       SET format_schema = 'my_schema:MyMessage';

   If the message name is wrong or missing, ClickHouse cannot find the correct message type.

6. Check file permissions. The ClickHouse server process must have read access to the .proto file:

       sudo -u clickhouse cat /var/lib/clickhouse/format_schemas/my_schema.proto

7. Test with a minimal schema to isolate the problem. Create a simple .proto file and verify ClickHouse can parse it before adding complexity.
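The minimal-schema test can be sketched as the short script below. It is only an illustration: the file name minimal.proto, the message name Minimal, and the SCHEMA_DIR variable are hypothetical, and the script writes to a temporary directory unless SCHEMA_DIR is already set to your real format_schema_path.

```shell
#!/bin/sh
# Hypothetical scratch location; export SCHEMA_DIR to use your real
# format_schema_path directory instead of a temporary one.
SCHEMA_DIR="${SCHEMA_DIR:-$(mktemp -d)}"

# Write the smallest useful schema: one message with two scalar fields.
cat > "$SCHEMA_DIR/minimal.proto" <<'EOF'
syntax = "proto3";

message Minimal {
  uint64 id = 1;
  string name = 2;
}
EOF

# Validate with protoc when it is installed; skip silently otherwise.
if command -v protoc >/dev/null 2>&1; then
  protoc --proto_path="$SCHEMA_DIR" --descriptor_set_out=/dev/null \
    "$SCHEMA_DIR/minimal.proto"
fi

echo "wrote $SCHEMA_DIR/minimal.proto"

# Next, try the schema from ClickHouse with a query whose columns match
# the message fields (assumed example, run it in your own client):
#   SELECT toUInt64(1) AS id, 'a' AS name
#   FORMAT Protobuf SETTINGS format_schema = 'minimal:Minimal'
```

If ClickHouse accepts this schema, add fields, imports, and nested messages from the failing schema back one at a time until the error reappears.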
Best Practices
- Always validate .proto files with protoc before deploying them to the ClickHouse schema directory.
- Keep all proto dependencies (imported files) in the same format_schema_path directory.
- Use version control for your .proto files and test schema changes in a staging environment before production.
- Explicitly declare the proto syntax version (proto2 or proto3) at the top of every file.
- Document which message types are used by which ClickHouse tables so that schema changes can be coordinated.
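Some of these practices can be checked automatically before a schema is deployed. The function below is a rough sketch, not a replacement for protoc: check_proto_schema is a hypothetical helper name, and it only performs line-based checks for file presence, readability, an explicit syntax declaration, and imports that resolve inside the same directory.

```shell
#!/bin/sh
# check_proto_schema DIR FILE  (hypothetical helper)
# Prints one line per problem found and returns non-zero if any exist.
check_proto_schema() {
  dir=$1
  file=$2
  path="$dir/$file"
  status=0

  if [ ! -f "$path" ]; then
    echo "missing: $path"
    return 1
  fi
  if [ ! -r "$path" ]; then
    echo "unreadable: $path"
    return 1
  fi

  # Require an explicit proto2 or proto3 syntax declaration.
  if ! grep -Eq '^[[:space:]]*syntax[[:space:]]*=[[:space:]]*"proto[23]"[[:space:]]*;' "$path"; then
    echo "no syntax declaration: $file"
    status=1
  fi

  # Take the first quoted string on each import line and require that
  # the imported file exists under the same schema directory.
  for dep in $(grep -E '^[[:space:]]*import[[:space:]]' "$path" \
                 | sed 's/^[^"]*"\([^"]*\)".*/\1/'); do
    if [ ! -f "$dir/$dep" ]; then
      echo "unresolved import: $dep"
      status=1
    fi
  done

  return $status
}
```

A call such as check_proto_schema /var/lib/clickhouse/format_schemas my_schema.proto could run in a deployment script before the file is copied into place. Because the check is purely file-based, it reports any import that is not physically present in the directory and assumes import paths contain no spaces.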
Frequently Asked Questions
Q: Where should I place .proto files for ClickHouse to find them?
A: Place them in the directory specified by the format_schema_path setting, which defaults to /var/lib/clickhouse/format_schemas/. You can check the current value in system.settings.
Q: Does ClickHouse support both proto2 and proto3?
A: Yes, ClickHouse supports both proto2 and proto3 syntax. Make sure the syntax declaration at the top of the file matches the actual syntax used.
Q: How do I specify which message to use from a .proto file with multiple messages?
A: Set format_schema to a value of the form filename:MessageName. For example: SET format_schema = 'events:EventRecord'.
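When a file declares many messages, it can help to list the candidate values first. The function below is a small sketch under stated assumptions: list_schema_targets is a hypothetical helper, and it does a simple line-based scan for lines that begin with the message keyword. It does not parse the file, so nested messages are listed by their bare name without the parent message.

```shell
#!/bin/sh
# list_schema_targets DIR FILE  (hypothetical helper)
# Prints one candidate format_schema value per message declaration line.
list_schema_targets() {
  dir=$1
  file=$2
  base=${file%.proto}   # format_schema values omit the .proto extension
  awk -v base="$base" '
    $1 == "message" {
      name = $2
      sub(/[^A-Za-z0-9_].*/, "", name)   # drop a trailing "{" if attached
      if (name != "") print base ":" name
    }
  ' "$dir/$file"
}
```

For a file events.proto that declares EventRecord, the output includes events:EventRecord, which is the value to pass to format_schema.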
Q: Can I use nested Protobuf messages with ClickHouse?
A: Yes, ClickHouse supports nested messages and maps them to Tuple or Nested column types. Ensure that the table schema matches the nested structure of your Protobuf messages.
Q: The .proto file compiles fine with protoc but ClickHouse rejects it. Why?
A: ClickHouse loads the schema with the Protobuf library bundled into the server, and its version and import search path (the format_schema_path directory) can differ from those of your local protoc. Check for features that ClickHouse may not support, such as certain options, extensions, or very new proto3 features. The ClickHouse server log often provides more specific error details.
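To pull those details out of the server log, a simple search is usually enough. The sketch below assumes the log lines contain either the error code name or the message text quoted at the top of this article; find_schema_errors is a hypothetical helper, and the log path in the usage note is an assumed default for package installations that may differ on your system.

```shell
#!/bin/sh
# find_schema_errors LOGFILE  (hypothetical helper)
# Prints up to the 20 most recent matching lines, prefixed with line numbers.
find_schema_errors() {
  log=$1
  grep -n -E 'CANNOT_PARSE_PROTOBUF_SCHEMA|Cannot parse Protobuf schema' "$log" \
    | tail -n 20
}
```

For example: find_schema_errors /var/log/clickhouse-server/clickhouse-server.err.log. The lines around each match typically name the schema file and the position where parsing stopped.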