NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

ClickHouse DB::Exception: Sampling not supported

The "DB::Exception: Sampling not supported" error in ClickHouse occurs when you use a SAMPLE clause in a query against a table that was not created with a sampling key. ClickHouse requires a SAMPLE BY expression in the table definition to enable sampling. Without it, the SAMPLING_NOT_SUPPORTED error is raised.

Impact

The query fails immediately. No data is returned. The table itself is unaffected. This error is particularly common when users try to approximate query results on large datasets without first setting up the table for sampling.

Common Causes

  1. Table created without a SAMPLE BY clause -- the most common cause. The table definition does not include a sampling key.
  2. Using SAMPLE on a non-MergeTree engine -- SAMPLE is only supported by MergeTree family engines, and only when configured with a sampling key.
  3. Querying a view or subquery with SAMPLE -- views and subqueries do not directly support SAMPLE.
  4. Distributed table pointing to tables without sampling keys -- even through a Distributed table, the underlying local tables must have sampling keys.
  5. Confusing SAMPLE with LIMIT -- users sometimes try SAMPLE when they actually want LIMIT or a percentage-based approximation.

Troubleshooting and Resolution Steps

  1. Check if the table has a sampling key:

    SELECT sampling_key FROM system.tables
    WHERE database = 'default' AND name = 'your_table';
    

    An empty result means sampling is not configured.

  2. If you need sampling, create the table with a SAMPLE BY clause:

    CREATE TABLE sampled_table (
        id UInt64,
        event_date Date,
        user_id UInt64,
        data String
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(event_date)
    ORDER BY (user_id, id)
    SAMPLE BY id;  -- id must be part of ORDER BY and should have good distribution
    

    The sampling key must be included in the ORDER BY (sorting key) expression and should ideally be a hash or uniformly distributed value.

  3. Add a sampling key to an existing table (requires data migration):

    -- Create new table with sampling key
    CREATE TABLE your_table_sampled AS your_table
    ENGINE = MergeTree()
    ORDER BY (existing_key, intHash32(id))
    SAMPLE BY intHash32(id);
    
    -- Migrate data
    INSERT INTO your_table_sampled SELECT * FROM your_table;
    
    -- Swap tables
    RENAME TABLE your_table TO your_table_old, your_table_sampled TO your_table;
    
  4. Use alternatives to SAMPLE for approximate results:

    -- LIMIT for a fixed number of rows
    SELECT * FROM your_table LIMIT 1000;
    
    -- Approximate aggregate functions
    SELECT uniqHLL12(user_id) FROM your_table;
    SELECT countIf(status = 'active') FROM your_table;
    
    -- WHERE-based sampling using modular arithmetic
    SELECT * FROM your_table WHERE cityHash64(id) % 10 = 0;  -- ~10% sample
    
  5. For Distributed tables, ensure all local tables have the sampling key:

    SELECT database, name, sampling_key FROM system.tables
    WHERE engine LIKE '%MergeTree%' AND name = 'your_local_table';
    

Best Practices

  • Plan for sampling during table design. Adding a sampling key later requires data migration.
  • Choose a sampling key with good distribution. Using intHash32 or intHash64 on an ID column works well.
  • The sampling key must be part of the primary key (ORDER BY). Plan your sorting key to accommodate this.
  • Consider whether you truly need SAMPLE or whether approximate functions or LIMIT would suffice for your use case.
  • For analytical workloads on very large tables, sampling can dramatically speed up exploratory queries.

Frequently Asked Questions

Q: Can I add a SAMPLE BY to an existing table without recreating it?
A: No. The SAMPLE BY expression must be part of the ORDER BY, and changing the ORDER BY requires recreating the table with data migration. There is no ALTER statement to add a sampling key in place.

Q: What makes a good sampling key?
A: A good sampling key has uniform distribution across its value range. Using a hash function like intHash32(id) or cityHash64(user_id) ensures even distribution regardless of the original data distribution.

Q: What does SAMPLE 0.1 actually do?
A: SAMPLE 0.1 reads approximately 10% of the data. ClickHouse uses the sampling key to deterministically select which rows to include. Results are consistent across repeated queries with the same sample ratio.

Q: Can I use SAMPLE with any table engine?
A: No. SAMPLE is only supported by MergeTree family engines (MergeTree, ReplicatedMergeTree, etc.) that have a SAMPLE BY clause in their definition. Other engines like Log, Memory, or Kafka do not support it.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.