ClickHouse count Function

The count function in ClickHouse is an aggregate function that counts either rows or the non-NULL values of an expression. It is commonly used to measure the size of a dataset or to count how often a condition holds.

Syntax

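count(expr)
count(DISTINCT expr)
count()
count(*)

count() with no argument is ClickHouse-specific and is equivalent to count(*).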


Example Usage

-- Count total number of rows
SELECT count(*) FROM users;

-- Count non-null values in a specific column
SELECT count(email) FROM users;

-- Count distinct values
SELECT count(DISTINCT country) FROM users;

-- Count with condition
SELECT count(*) FROM orders WHERE status = 'completed';

Common Issues

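count(DISTINCT) can be slow and memory-hungry on high-cardinality columns, because by default ClickHouse executes it as uniqExact, which keeps every distinct value in memory.
Confusing count(*) with count(column_name): count(*) counts every row, while count(column_name) skips rows where that column is NULL.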
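Both issues can be reproduced on a tiny throwaway table. This is a minimal sketch that assumes a hypothetical Memory table named users_demo with a Nullable email column; the table name and rows are illustrative only. With one NULL email and one duplicated email, count(*) and count(1) see four rows, count(email) sees three non-NULL values, and count(DISTINCT email) sees two distinct values, since NULL is skipped.

```sql
-- Hypothetical demo table; Nullable(String) allows the email column to hold NULLs.
DROP TABLE IF EXISTS users_demo;
CREATE TABLE users_demo
(
    id UInt32,
    email Nullable(String)
)
ENGINE = Memory;

-- Four rows: one NULL email and one duplicated email.
INSERT INTO users_demo VALUES
    (1, 'a@example.com'),
    (2, 'b@example.com'),
    (3, NULL),
    (4, 'a@example.com');

SELECT
    count(*)              AS all_rows,        -- every row, NULL or not
    count(1)              AS all_rows_alt,    -- same result as count(*)
    count(email)          AS non_null_emails, -- skips the row whose email is NULL
    count(DISTINCT email) AS distinct_emails  -- NULL ignored, duplicates collapsed
FROM users_demo;
```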

Best Practices

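Use count(*) (or ClickHouse's count()) to count rows; rows that contain NULL values are included.
For distinct counts over high-cardinality columns in large datasets, consider an approximate function such as uniq or uniqCombined instead of count(DISTINCT). uniqHLL12 is also available, but the ClickHouse documentation recommends uniq or uniqCombined in most cases.
Filter rows in the WHERE clause whenever possible, so that less data is read and aggregated before counting.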
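The exact-versus-approximate trade-off can be tried without creating any table by using the built-in numbers table function. This sketch compares uniqExact with the approximate uniq, uniqCombined and uniqHLL12 functions on one million distinct values; the approximate results will differ slightly from the exact one, and the precise values are not shown here because they depend on the function and version. The last statement uses the count_distinct_implementation setting, which controls which uniq-family function count(DISTINCT) runs as (uniqExact by default).

```sql
-- numbers(1000000) yields the values 0..999999, so there are exactly 1,000,000 distinct values.
SELECT
    uniqExact(number)    AS exact_distinct,  -- exact; memory grows with cardinality
    uniq(number)         AS approx_uniq,     -- adaptive sampling, bounded memory
    uniqCombined(number) AS approx_combined, -- array / hash table / HyperLogLog hybrid
    uniqHLL12(number)    AS approx_hll12     -- HyperLogLog with 2^12 cells
FROM numbers(1000000);

-- Filtering first means fewer rows reach the aggregate function.
SELECT uniq(number) AS approx_even
FROM numbers(1000000)
WHERE number % 2 = 0;

-- Make count(DISTINCT) use an approximate implementation for this session.
SET count_distinct_implementation = 'uniqCombined';
SELECT count(DISTINCT number) AS approx_via_count_distinct
FROM numbers(1000000);
```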

Frequently Asked Questions

Q: What's the difference between count(*) and count(1)?
A: In ClickHouse, there is no practical difference between count(*) and count(1). Both count the number of rows, including those with NULL values in any column.

Q: How does count() handle NULL values?
A: count(column_name) excludes NULL values, while count(*) includes all rows regardless of NULL values in any column.

Q: Can count() be used with window functions in ClickHouse?
A: Yes, count() can be used as a window function in ClickHouse. For example: count(*) OVER (PARTITION BY department).
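As a sketch of the window form, assuming a hypothetical Memory table employees_demo with illustrative rows: unlike GROUP BY, the window keeps every row and attaches its department's headcount to it. Adding ORDER BY inside the window turns the same count into a running count within each department.

```sql
-- Hypothetical employees table used only for this example.
DROP TABLE IF EXISTS employees_demo;
CREATE TABLE employees_demo
(
    name String,
    department String,
    salary UInt32
)
ENGINE = Memory;

INSERT INTO employees_demo VALUES
    ('Ann', 'Sales', 5000),
    ('Bob', 'Sales', 4000),
    ('Cid', 'Engineering', 7000),
    ('Dee', 'Engineering', 6500),
    ('Eve', 'Engineering', 6000);

SELECT
    name,
    department,
    -- Total rows in this row's department (2 for Sales, 3 for Engineering).
    count(*) OVER (PARTITION BY department) AS dept_headcount,
    -- Running count within the department, ordered by descending salary.
    count(*) OVER (PARTITION BY department ORDER BY salary DESC) AS running_count
FROM employees_demo
ORDER BY department, salary DESC;
```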

Q: Is there a way to optimize count() for large tables?
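A: It depends on what is being counted. An unfiltered count() on a MergeTree table is usually answered from part metadata, so it is already cheap. For distinct counts, approximate functions such as uniq() or uniqCombined() are much cheaper than count(DISTINCT). If an estimate of a filtered row count is acceptable, ClickHouse's SAMPLE clause (on tables created with a SAMPLE BY key) reads only a fraction of the data; the result is not rescaled automatically, so it must be multiplied back up.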
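The sampling approach only works on MergeTree tables declared with a sampling key that is part of the primary key. This minimal sketch assumes a hypothetical table events_demo keyed on intHash32(user_id); with SAMPLE 0.1, roughly 10% of rows are read and count() is multiplied by 10 by hand. ClickHouse's own approximate distinct function is uniq, shown on the same table.

```sql
-- Hypothetical MergeTree table; the SAMPLE BY expression must be
-- contained in the primary key (here, the ORDER BY key).
DROP TABLE IF EXISTS events_demo;
CREATE TABLE events_demo
(
    user_id UInt64
)
ENGINE = MergeTree
ORDER BY intHash32(user_id)
SAMPLE BY intHash32(user_id);

INSERT INTO events_demo SELECT number FROM numbers(1000000);

-- Read about 10% of the rows; count() is not rescaled automatically,
-- so multiply by 10 to estimate the full row count.
SELECT count() * 10 AS approx_rows
FROM events_demo
SAMPLE 0.1;

-- Approximate distinct count with uniq instead of exact count(DISTINCT).
SELECT uniq(user_id) AS approx_users
FROM events_demo;
```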

Q: How can I combine count() with other aggregations efficiently?
A: You can combine count() with other aggregations in a single query to minimize data scans. For example: SELECT count(*), avg(salary), max(age) FROM employees.
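Conditional counts can share that same single scan through ClickHouse's -If combinator, instead of running one WHERE-filtered query per status as in the completed-orders example earlier. This sketch assumes a hypothetical Memory table orders_demo with illustrative rows; it returns the total, two per-status counts, and an average in one query.

```sql
-- Hypothetical orders table used only for this example.
DROP TABLE IF EXISTS orders_demo;
CREATE TABLE orders_demo
(
    id UInt32,
    status String,
    amount Float64
)
ENGINE = Memory;

INSERT INTO orders_demo VALUES
    (1, 'completed', 120.0),
    (2, 'completed', 80.0),
    (3, 'cancelled', 50.0),
    (4, 'pending', 30.0);

-- One pass over the data yields every aggregate below.
SELECT
    count()                       AS total_orders,     -- all rows
    countIf(status = 'completed') AS completed_orders, -- rows matching the condition
    countIf(status = 'cancelled') AS cancelled_orders,
    avg(amount)                   AS avg_amount
FROM orders_demo;
```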