ClickHouse DB::Exception: Only NULLs while reading schema

The DB::Exception: ONLY_NULLS_WHILE_READING_SCHEMA error occurs when ClickHouse tries to automatically infer the schema of an external data source, such as a file, URL, or S3 object, but every sampled value in one or more columns is NULL. Because ClickHouse cannot determine a meaningful data type from NULLs alone, it raises this error instead of guessing.
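For illustration, the error can be reproduced with a hypothetical JSONEachRow file in which every sampled value of a column is an explicit null (the file name and contents here are illustrative):

    -- 'all_nulls.jsonl' contains only rows like:
    -- {"id": null}
    -- {"id": null}
    SELECT * FROM file('all_nulls.jsonl', 'JSONEachRow');
    -- Inference sees nothing but NULLs for "id" and may raise
    -- DB::Exception: ONLY_NULLS_WHILE_READING_SCHEMA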

Impact

This error blocks any query that depends on schema inference for the affected source. SELECT queries against file(), url(), or s3() table functions will fail, and so will CREATE TABLE ... AS SELECT workflows that rely on auto-detected types. No data is read or inserted when the error is raised.

Common Causes

  1. The data file contains columns that are entirely NULL or empty across the rows ClickHouse samples for inference.
  2. The input_format_max_rows_to_read_for_schema_inference setting is too low, so ClickHouse samples only a small number of leading rows that happen to contain nothing but NULLs.
  3. Sparse datasets where meaningful values appear much later in the file than the inference sample window.
  4. JSON data where certain keys are absent from the first N objects, making their values appear as NULL to the schema reader.
  5. CSV files with empty columns that are interpreted as NULL rather than empty strings.
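Causes 2 through 4 share one mechanism: the inference sample window ends before any non-NULL value appears. A hypothetical sparse JSONEachRow file sketches this (file name, contents, and window size are illustrative):

    -- 'sparse.jsonl': "value" is null for the first several thousand rows
    -- and only carries real numbers near the end of the file.
    -- {"id": 1, "value": null}
    -- {"id": 2, "value": null}
    -- ...
    -- {"id": 99999, "value": 3.14}

    -- With a small sample window, inference only ever sees NULLs for "value":
    SET input_format_max_rows_to_read_for_schema_inference = 100;
    SELECT * FROM file('sparse.jsonl', 'JSONEachRow');
    -- may fail with ONLY_NULLS_WHILE_READING_SCHEMA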

Troubleshooting and Resolution Steps

  1. Explicitly specify the schema instead of relying on inference. This is the most reliable fix:

    SELECT *
    FROM file('data.csv', 'CSVWithNames', 'id UInt64, name String, value Nullable(Float64)')
    LIMIT 10;
    
  2. Increase the number of rows sampled for schema inference so ClickHouse has a better chance of encountering non-NULL values:

    SET input_format_max_rows_to_read_for_schema_inference = 50000;
    SELECT * FROM file('sparse_data.json', 'JSONEachRow') LIMIT 10;
    
  3. For JSON data read as named tuples, enable the setting that fills missing keys with type defaults rather than NULLs:

    SET input_format_json_defaults_for_missing_elements_in_named_tuple = 1;
    
  4. Preprocess your data to ensure the first rows contain representative non-NULL values, or add a header row that establishes column types.

  5. Use DESCRIBE to verify what ClickHouse would infer:

    DESCRIBE file('data.parquet');
    

    If specific columns show as Nullable(Nothing), those are the problematic ones.

  6. As a last resort, define a target table with the correct schema and use INSERT ... SELECT with explicit casting:

    CREATE TABLE target (id UInt64, name String, value Nullable(Float64)) ENGINE = MergeTree() ORDER BY id;
    INSERT INTO target SELECT * FROM file('data.csv', 'CSVWithNames', 'id UInt64, name String, value Nullable(Float64)');
    

Best Practices

  • Always specify schemas explicitly for production data pipelines rather than relying on inference.
  • When using schema inference for exploratory work, set input_format_max_rows_to_read_for_schema_inference to a sufficiently large value.
  • Validate upstream data quality to ensure columns are not entirely NULL before feeding them to ClickHouse.
  • Use schema inference caching (schema_inference_use_cache_for_file and its s3/url counterparts) to avoid repeated inference on the same unchanged files.
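A minimal caching sketch, assuming local file sources, the schema_inference_use_cache_for_file setting, and an illustrative file name:

    -- Cache inferred schemas so repeated queries over the same unchanged file
    -- skip inference; analogous settings exist for s3 and url sources.
    SET schema_inference_use_cache_for_file = 1;
    SELECT count() FROM file('big_data.jsonl', 'JSONEachRow');
    -- Subsequent queries over the unchanged file reuse the cached schema.
    SELECT * FROM file('big_data.jsonl', 'JSONEachRow') LIMIT 10;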

Frequently Asked Questions

Q: Why does ClickHouse not just default NULL columns to String or Nullable(String)?
A: Choosing an arbitrary type could lead to subtle data corruption or unexpected behavior downstream. ClickHouse prefers to fail loudly so you can specify the correct type intentionally.

Q: Can I skip columns that are all NULLs during inference?
A: Not directly. You need to provide an explicit schema that either omits those columns or assigns them a concrete type like Nullable(String).
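A middle ground, assuming a recent ClickHouse version, is the schema_inference_hints setting, which pins types for only the problematic columns while the rest are still inferred (column and file names are illustrative):

    -- Only "value" gets a hinted type; other columns are inferred as usual.
    SET schema_inference_hints = 'value Nullable(Float64)';
    SELECT * FROM file('data.csv', 'CSVWithNames') LIMIT 10;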

Q: Does this error occur with Parquet files too?
A: It is less common with Parquet because the file format embeds its own schema. However, if a Parquet column is typed as null with no logical type, you may still see this error.
