What is ClickHouse? Architecture, Use Cases, and Key Concepts

ClickHouse is an open-source column-oriented database management system built specifically for online analytical processing (OLAP). It was developed internally at Yandex starting around 2009 to power Yandex.Metrica - at the time, the second-largest web analytics platform in the world, tracking billions of page views per day. Yandex open-sourced it in 2016 under the Apache 2.0 license, and it has since been adopted as a backend for observability platforms, data warehouses, and real-time analytics pipelines by companies including Cloudflare, Uber, and Shopify. A separate commercial entity, ClickHouse Inc., now maintains the project and offers ClickHouse Cloud as a managed service.

The core design premise is straightforward: most analytical queries read a handful of columns across enormous numbers of rows. Row-oriented databases like PostgreSQL or MySQL store all columns for a given row together on disk, which means a query scanning 3 out of 200 columns still reads the full row width from storage. ClickHouse stores each column in separate files and compresses each column independently. A query that touches 5 columns reads and decompresses only those 5 columns from disk and processes them in vectorized batches. That combination - reduced I/O, high compression ratios, and CPU-efficient batch processing - is why ClickHouse can return sub-second results over datasets with billions of rows.
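
To make that concrete, here is a minimal sketch of a ClickHouse table and a query that touches only two of its columns. The events table, its columns, and the ORDER BY choice are illustrative assumptions, not anything prescribed above.

```sql
-- Illustrative table: each column is stored (and compressed) as its own file on disk
CREATE TABLE events
(
    event_time  DateTime,
    user_id     UInt64,
    url         String,
    status_code UInt16,
    duration_ms UInt32
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);

-- Reads and decompresses only the user_id and duration_ms columns, not the full row width
SELECT user_id, avg(duration_ms) AS avg_latency
FROM events
GROUP BY user_id;
```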

Storage Architecture: MergeTree and How Data Lands on Disk

The MergeTree engine family is ClickHouse's primary storage engine, and understanding it is non-negotiable if you're going to run the database in production. When you insert data, ClickHouse writes it as an immutable part - a directory on disk containing column data (one file per column in wide format for larger parts, or all columns in a single file in compact format for smaller freshly-inserted parts), sorted according to the table's ORDER BY key. Parts are written in full and then closed; they're never updated in place. In the background, ClickHouse continuously merges smaller parts into larger ones, similar in spirit to LSM-tree compaction but tuned for analytical read patterns rather than write amplification minimization.
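One way to see parts in action is the system.parts table, which lists the immutable part directories behind a table; the query below assumes the illustrative events table from the earlier sketch.

```sql
-- Each row is one immutable part on disk; part_type distinguishes Wide from Compact storage,
-- and the count of active parts shrinks over time as background merges combine them
SELECT name, part_type, rows, bytes_on_disk
FROM system.parts
WHERE table = 'events' AND active
ORDER BY modification_time DESC;
```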

Each part contains a sparse primary index. Rather than indexing every row, ClickHouse groups rows into granules (8,192 rows by default) and stores one index entry per granule. This makes the index small enough to fit in memory and allows ClickHouse to binary search the index to find candidate granule ranges, then scan only those ranges from the column files. A query with a highly selective WHERE clause on the ORDER BY key columns can skip the vast majority of a table's data without reading it.
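A quick way to observe the sparse index at work is EXPLAIN with index analysis enabled; the query below, again against the illustrative events table, reports how many granule ranges the primary key filter keeps versus skips.

```sql
-- Shows the primary-key analysis step, including the number of granule ranges selected
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE user_id = 42;
```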

Beyond the primary index, MergeTree supports data skipping indexes - secondary indexes that store lightweight aggregations over granules, such as min/max values (minmax), bloom filters, or set membership (set). A minmax index on a timestamp column, for instance, lets ClickHouse skip entire granules where the min and max timestamps fall outside the query's time range, even if timestamp isn't the first column in the ORDER BY key. These are not B-tree indexes; they're statistical summaries and their effectiveness depends on data distribution and ordering.
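As a hedged example, a minmax skipping index on a timestamp column might be added like this; the index name and GRANULARITY value are arbitrary, and the events table is the illustrative one from above.

```sql
-- One minmax summary per 4 granules of event_time; granules whose [min, max] range
-- falls entirely outside the query's time filter can be skipped without being read
ALTER TABLE events
    ADD INDEX event_time_minmax event_time TYPE minmax GRANULARITY 4;

-- Build the index for parts that were written before the index existed
ALTER TABLE events MATERIALIZE INDEX event_time_minmax;
```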

The MergeTree family includes specialized variants. ReplacingMergeTree deduplicates rows with the same sorting key (defined by ORDER BY) during merges, useful for emulating upsert semantics. AggregatingMergeTree pre-aggregates rows with matching keys and stores partial aggregate states, which is the foundation for materialized views that maintain running counts, sums, or quantile sketches. CollapsingMergeTree and VersionedCollapsingMergeTree support row-level state machines, typically used to model mutable entity states by inserting sign-flipped records rather than issuing in-place updates.
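For instance, emulating upserts with ReplacingMergeTree might look like the sketch below. The user_profiles table is hypothetical, and FINAL forces deduplication at read time for rows that have not yet been merged, at a query-time cost.

```sql
-- Rows sharing the same ORDER BY key are collapsed during merges; with a version column,
-- the row with the highest updated_at survives
CREATE TABLE user_profiles
(
    user_id    UInt64,
    email      String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- Insert a newer version of the row instead of updating in place
INSERT INTO user_profiles VALUES (42, 'new@example.com', now());

-- FINAL merges duplicates at read time so the query sees only the latest version
SELECT * FROM user_profiles FINAL WHERE user_id = 42;
```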

Query Execution: Vectorized, Compiled, and Parallelized

ClickHouse processes queries using a vectorized execution model derived from the same ideas as MonetDB/X100. Instead of executing one row at a time through an operator tree (the Volcano model most relational databases use), ClickHouse passes chunks of column data between operators. Each operator receives a block of values for each relevant column, applies its operation across the entire block in a tight loop, and passes the result to the next operator. The loop structure keeps data in CPU cache between operations and allows the compiler to emit SIMD instructions that process 4, 8, or 16 values in a single CPU instruction depending on the data type and available ISA extensions.

On top of vectorization, ClickHouse uses LLVM-based just-in-time compilation for expression evaluation. An expression like a * b + c + 1 - which would naively be three separate operator passes - gets fused into a single compiled loop. GROUP BY pipelines that aggregate over complex combinations of expressions benefit from this substantially; the ClickHouse team has documented up to 2x throughput improvements from JIT compilation on grouping workloads. Runtime compilation happens transparently and is cached across queries with the same expression structure.
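JIT compilation is governed by session settings; a hedged sketch follows (the table t and its numeric columns are hypothetical, and default values for these settings can differ between versions).

```sql
-- compile_expressions enables LLVM JIT for fused expressions like a * b + c + 1;
-- min_count_to_compile_expression controls how many times an expression must be seen
-- before it is compiled and cached for reuse
SELECT a * b + c + 1 AS fused
FROM t
SETTINGS compile_expressions = 1, min_count_to_compile_expression = 0;
```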

Parallelism is applied aggressively at multiple levels. A single query is split across all available CPU cores, with each core working on a disjoint set of granules. On a distributed ClickHouse cluster, the initiating node fans out sub-queries to all shards, each of which scans its local data in parallel, then aggregates partial results back at the coordinator. This architecture means that throwing more hardware at ClickHouse - more cores or more shards - translates directly into proportional query throughput improvements for CPU-bound scans, with minimal coordination overhead.
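On the cluster side, the usual pattern is a Distributed table over a local table on every shard; the cluster name analytics_cluster, the database name, and the sharding key below are illustrative assumptions.

```sql
-- The Distributed engine stores no data itself; it fans queries out to the local 'events'
-- table on each shard of 'analytics_cluster' and merges partial results on the initiator
CREATE TABLE events_distributed AS events
ENGINE = Distributed('analytics_cluster', 'default', 'events', rand());

-- Each shard scans its own granules in parallel across all cores, then partial GROUP BY
-- states are combined on the coordinating node
SELECT status_code, count() AS requests
FROM events_distributed
GROUP BY status_code;
```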

Where ClickHouse Excels

Observability and logging is the most common production use case today. Logs, spans, and metrics share a shape that ClickHouse handles particularly well: append-only, time-ordered, high write throughput, and query patterns that aggregate over time windows. A table of HTTP access logs ordered by (service, timestamp) compresses aggressively because adjacent rows share similar service names, status codes, and URL paths. ClickHouse's codec support lets you apply delta encoding to monotonic timestamp columns and LZ4 or ZSTD on top, routinely achieving 10:1 or better compression ratios on log data. Tools like HyperDX (now ClickStack), SigNoz, and OpenTelemetry-based observability pipelines use ClickHouse as their storage layer for exactly these reasons.
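A hedged sketch of such a log table, with a Delta codec on the monotonic timestamp and ZSTD layered on top; the table, columns, and codec choices are illustrative, and actual compression ratios depend on the data.

```sql
CREATE TABLE http_logs
(
    service     LowCardinality(String),
    timestamp   DateTime CODEC(Delta, ZSTD),   -- delta-encode, then ZSTD-compress
    status_code UInt16,
    url         String CODEC(ZSTD),
    duration_ms UInt32
)
ENGINE = MergeTree
ORDER BY (service, timestamp);   -- adjacent rows share service values, so columns compress well
```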

Product analytics and clickstream analysis are where the database originated. Counting unique users with uniqCombined, computing funnel conversion rates, breaking down events by arbitrary dimension combinations - these are all multi-billion-row aggregation queries that ClickHouse can answer in seconds. The uniqCombined and uniqExact functions are worth calling out specifically: ClickHouse ships with HyperLogLog-based approximations (uniqCombined) and exact hash-set-based counting (uniqExact) as first-class aggregate functions, which matters for high-cardinality user ID counting without pulling raw data back to the application.
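For example, a daily unique-user count over the illustrative events table could use either the approximate or the exact aggregate; which one is affordable depends on cardinality and memory.

```sql
SELECT
    toDate(event_time)    AS day,
    uniqCombined(user_id) AS approx_users,  -- HyperLogLog-style sketch, bounded memory
    uniqExact(user_id)    AS exact_users    -- exact, but keeps a full hash set per group
FROM events
GROUP BY day
ORDER BY day;
```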

Time-series workloads benefit from ClickHouse's partitioning model, where you can partition tables by month or day and set a TTL that drops entire partitions when data ages out. A PARTITION BY toYYYYMM(timestamp) table can expire a month of data by dropping a single partition directory rather than running a DELETE that triggers a full scan and mutation.
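A hedged sketch of that pattern, with monthly partitions and a 90-day TTL; the table name, columns, and retention period are illustrative.

```sql
CREATE TABLE metrics
(
    timestamp DateTime,
    name      LowCardinality(String),
    value     Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)      -- one partition per month
ORDER BY (name, timestamp)
TTL timestamp + INTERVAL 90 DAY;      -- data older than 90 days is removed in the background

-- A month can also be removed explicitly as a cheap metadata operation
ALTER TABLE metrics DROP PARTITION 202401;
```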

Limitations and When Not to Use It

ClickHouse is not a general-purpose database, and treating it like one is the most common source of production pain. It does not support ACID transactions across multiple rows or tables in the way PostgreSQL does. There is no row-level locking, no multi-statement transaction that can be rolled back, and replication within a cluster is eventually consistent rather than providing strong isolation guarantees. If you're building a system that requires transactional integrity across multiple entity types - an order system that must atomically update inventory and create a payment record, for example - ClickHouse is the wrong tool.

Row-level mutations are expensive by design. UPDATE and DELETE in ClickHouse are implemented as mutations: asynchronous background operations that rewrite entire data parts containing affected rows. Lightweight deletes (DELETE FROM, available since v22.8) offer a faster alternative that marks rows as deleted without immediately rewriting parts. A mutation that touches many rows will saturate I/O for a sustained period and competes with background merges for the same resources; if merges fall behind inserts, parts accumulate, leading to the Too many parts error and throttled inserts. If your workload involves frequent per-row updates - user profiles, order statuses, inventory counts - you'll either need to engineer around this with ReplacingMergeTree collapse patterns, or use a different database.
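The difference in syntax is small but the execution model is very different; both statements below assume the illustrative events table.

```sql
-- Mutation: queued and executed asynchronously, rewriting every part that contains matching rows
ALTER TABLE events DELETE WHERE user_id = 42;

-- Lightweight delete (v22.8+): marks matching rows as deleted via a hidden column, so they
-- vanish from query results immediately and are physically removed by later merges
DELETE FROM events WHERE user_id = 42;
```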

Joins are a meaningful limitation in practice. ClickHouse's default hash join loads the right-hand table into memory as a hash table on each node, so the smaller table should go on the right-hand side. This works well when the right-hand side fits comfortably in RAM and when you're joining a large fact table against a small dimension table - the classic star schema join. Many-to-many joins and complex multi-hop joins remain performance challenges. For cases where the right-hand side exceeds available memory, ClickHouse offers a grace hash join algorithm (available since v22.12) that spills to disk, though at the cost of higher latency. ClickHouse is not designed to replace a relational database for join-heavy workloads.
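A hedged sketch of both points: a classic fact-to-dimension join, and opting in to the grace hash algorithm when the right-hand side is too large for memory. The dim_services table and its columns are hypothetical; http_logs is the illustrative log table from above.

```sql
-- The right-hand dimension table is built into an in-memory hash table on each node,
-- so keep the smaller table on the right
SELECT d.team, count() AS requests
FROM http_logs AS l
INNER JOIN dim_services AS d ON l.service = d.service
GROUP BY d.team;

-- Allow join buckets to spill to disk when the hash table would not fit in RAM (v22.12+)
SET join_algorithm = 'grace_hash';
```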

Where ClickHouse Sits in the Landscape

ClickHouse occupies a narrow but deep niche: high-throughput analytical queries over append-heavy, structured data. It is not competing with PostgreSQL for transactional workloads. It is a closer peer to systems like Apache Druid and Apache Pinot (both designed for real-time analytics on event data), but ClickHouse generally outperforms them on ad-hoc query flexibility and has a significantly lower operational footprint.

Compared to Elasticsearch, which many teams reach for with observability data, ClickHouse trades full-text search capability for dramatically better compression and aggregation throughput. Elasticsearch is optimized for inverted index lookups and text relevance scoring; ClickHouse is optimized for columnar scans and numeric aggregations. Teams that use ClickHouse for observability often maintain a separate search layer for log-text search, or rely on ClickHouse's ngrambf_v1 bloom filter index for approximate substring matching.
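If approximate substring filtering inside ClickHouse is enough, an n-gram bloom filter index can be added to the log text column. The parameters below (4-grams, a 1024-byte filter, 3 hash functions, seed 0) are illustrative, and the message column is an assumption on top of the earlier http_logs sketch.

```sql
-- Helps skip granules for LIKE '%...%' style filters; it narrows the scan rather than
-- providing relevance-ranked full-text search
ALTER TABLE http_logs
    ADD INDEX message_ngram message TYPE ngrambf_v1(4, 1024, 3, 0) GRANULARITY 4;
```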

The practical decision point comes down to query shape. If your queries are predominantly aggregations over narrow column sets on large row counts, with writes that are batched and append-dominant, ClickHouse is worth serious evaluation. If you need full transactional semantics, complex multi-table joins, or single-row point reads at low latency, you want a row-oriented OLTP database - and ClickHouse might still serve as a downstream analytical replica if you need to run aggregations on that same data.
