EmbeddedRocksDB Table Engine and Dictionary Source

Q: How do I make joins against an EmbeddedRocksDB table fast?

Set join_algorithm = 'direct,hash' and use LEFT ANY JOIN . The direct join issues key-value lookups against the RocksDB store and skips building a hash table; it falls back to hash join if the right table doesn't support direct lookups or if the join type is not LEFT ANY JOIN (the only join type the direct algorithm supports).

ClickHouse is built for sequential, column-oriented scans, but some workloads need the opposite: fast random reads and writes keyed by a single identifier. The EmbeddedRocksDB table engine embeds Facebook's RocksDB key-value store inside ClickHouse, giving you low-latency point lookups and per-row upserts/deletes that MergeTree handles poorly.

This guide covers the EmbeddedRocksDB engine itself, when to choose it over MergeTree, and how to use it as the backing store for a DIRECT-layout dictionary and for fast direct joins.

What EmbeddedRocksDB Is For

MergeTree is optimized for inserting large batches and scanning ranges of sorted data. It is a poor fit for:

Point lookups — WHERE id = 42 on MergeTree still reads granules and applies the primary index; RocksDB resolves a single key directly.
Frequent single-row updates and deletes — MergeTree mutations rewrite whole parts; RocksDB updates a key in place.
Upsert semantics — RocksDB overwrites the value for an existing key, which is exactly upsert behavior without ReplacingMergeTree deduplication or FINAL.

EmbeddedRocksDB stores data on the local node's filesystem and is not replicated by ClickHouse. It is a single-node, local key-value store. Use it for dimension/lookup tables, small mutable state, and dictionary backends — not for your main analytical fact tables.

Creating an EmbeddedRocksDB Table

The engine requires a PRIMARY KEY, and the primary key must be exactly one column:

CREATE TABLE customers
(
    customer_id UInt64,
    name        String,
    tier        String,
    updated_at  DateTime
)
ENGINE = EmbeddedRocksDB
PRIMARY KEY customer_id;

The primary key is serialized as the RocksDB key; all other columns are serialized together as the RocksDB value. Because the key space is a RocksDB store, inserting a row whose key already exists overwrites the previous value rather than appending a duplicate:

INSERT INTO customers VALUES (1, 'Acme', 'gold', now());
INSERT INTO customers VALUES (1, 'Acme Corp', 'platinum', now());
-- customer_id = 1 now holds the second row only

Engine Parameters

The engine accepts optional positional parameters: EmbeddedRocksDB([ttl, rocksdb_dir, read_only]).

CREATE TABLE sessions
(
    session_id String,
    user_id    UInt64,
    payload    String
)
ENGINE = EmbeddedRocksDB(86400, '/var/lib/clickhouse/rocksdb/sessions', 0)
PRIMARY KEY session_id;

ttl — time-to-live in seconds. 0 (the default) disables TTL.
rocksdb_dir — path to an existing RocksDB directory, or where a new one should be created. Useful for attaching an externally produced store.
read_only — when set, opens the store read-only and skips TTL compaction.

Supported Operations

EmbeddedRocksDB supports the operations you'd expect from a mutable key-value table:

-- Upsert (INSERT overwrites on existing key)
INSERT INTO customers VALUES (2, 'Globex', 'silver', now());

-- Point lookup (resolved directly by RocksDB key)
SELECT * FROM customers WHERE customer_id = 2;

-- Update non-key columns (the primary key cannot be modified)
ALTER TABLE customers UPDATE tier = 'gold' WHERE customer_id = 2;

-- Delete by key
DELETE FROM customers WHERE customer_id = 2;

-- Clear the whole store
TRUNCATE TABLE customers;

Lookups that filter by the primary key (= or IN) are served as efficient key fetches. A query without a primary-key filter falls back to a full iteration of the store.

Bulk Inserts

For loading large volumes, ClickHouse can build sorted SST files instead of inserting key-by-key. This is controlled by:

optimize_for_bulk_insert — enabled by default (1); writes SST files directly for faster bulk loads.
bulk_insert_block_size — the block size used when building SST files during bulk insert.

Approximate COUNT

A count() over an EmbeddedRocksDB table normally iterates the store. The setting optimize_trivial_approximate_count_query lets ClickHouse return RocksDB's fast approximate row estimate instead of an exact scan:

SELECT count() FROM customers
SETTINGS optimize_trivial_approximate_count_query = 1;

Use this when an approximate row count is acceptable and you want to avoid scanning the entire store.

EmbeddedRocksDB vs. MergeTree

Aspect	EmbeddedRocksDB	MergeTree
Access pattern	Random point lookups by single key	Sequential range scans, analytics
Primary key	Exactly one column, used as the storage key	One or more columns, used for sorting/skipping
Duplicate keys	Overwritten (upsert)	Kept; dedup needs ReplacingMergeTree + `FINAL`
Updates/deletes	In-place per key, cheap	Mutations rewrite parts, expensive
Replication	None (local node only)	ReplicatedMergeTree replicates via Keeper
Compression / scans	Not columnar; poor for wide aggregations	Columnar, highly compressed, fast aggregation
Best for	Dimension/lookup tables, mutable state, dictionary backends	Fact tables, time series, analytical workloads

Rule of thumb: if you mostly SELECT ... WHERE key = x or upsert single rows, use EmbeddedRocksDB. If you mostly scan and aggregate ranges, use MergeTree.

Direct Joins with EmbeddedRocksDB

ClickHouse's fastest join algorithm — the direct join — works only when the right-hand (dictionary-like) table supports low-latency key-value lookups. EmbeddedRocksDB is one of the engines that qualifies (alongside the Dictionary and Join engines). With a direct join, ClickHouse skips building a hash table and instead issues key lookups against RocksDB:

SELECT e.event_id, c.name, c.tier
FROM events AS e
LEFT ANY JOIN customers AS c ON e.customer_id = c.customer_id
SETTINGS join_algorithm = 'direct,hash';

Setting join_algorithm = 'direct,hash' tells ClickHouse to use the direct algorithm when the right table supports it, and to fall back to hash join otherwise. The direct algorithm only applies to LEFT ANY JOIN — for other join types (including plain INNER JOIN), ClickHouse falls back to hash join regardless. This makes an EmbeddedRocksDB table an efficient enrichment/dimension table for streaming joins where LEFT ANY semantics are acceptable.

Using EmbeddedRocksDB as a Dictionary Backend

A common point of confusion: there is no EMBEDDED_ROCKSDB dictionary SOURCE type. ClickHouse does not expose RocksDB as a direct dictionary source. Instead, you back a dictionary with an EmbeddedRocksDB table and point the dictionary at it via the CLICKHOUSE source, combined with a DIRECT layout.

The DIRECT layout stores nothing in memory and forwards every lookup to the source. Pairing it with an EmbeddedRocksDB table means each dictionary lookup becomes a fast RocksDB key fetch — you get dictionary ergonomics (dictGet, automatic join routing) without periodically reloading the whole dataset into RAM, and updates to the underlying table are visible immediately.

-- 1. The key-value source table
CREATE TABLE dim_customers
(
    customer_id UInt64,
    name        String,
    tier        String
)
ENGINE = EmbeddedRocksDB
PRIMARY KEY customer_id;

-- 2. A DIRECT-layout dictionary backed by it
CREATE DICTIONARY customers_dict
(
    customer_id UInt64,
    name        String,
    tier        String
)
PRIMARY KEY customer_id
SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 TABLE 'dim_customers' DB 'default' USER 'default'))
LAYOUT(DIRECT());

Then enrich queries with dictGet:

SELECT
    event_id,
    dictGet('customers_dict', 'name', customer_id) AS name,
    dictGet('customers_dict', 'tier', customer_id) AS tier
FROM events;

This pattern is ideal when the dimension data changes frequently (so a cached or flat-layout dictionary would be stale or constantly reloading) or is too large to hold a full in-memory copy, while still needing single-key lookups to be fast.

Best Practices

Single-key dimension data only. EmbeddedRocksDB's one-column primary key suits surrogate-keyed lookup tables. If your natural key is composite, build a single composite key column (for example a tuple hash or concatenation).
Don't use it for analytics. RocksDB is row-oriented; wide aggregations and large range scans belong in MergeTree.
Remember it's local and unreplicated. Plan backups around rocksdb_dir, and don't rely on it for data that must survive node loss without an external recovery path.
Prefer direct,hash for enrichment joins. It avoids building a hash table when the right side is an EmbeddedRocksDB (or Dictionary/Join) table and the join is LEFT ANY JOIN. For other join types, ClickHouse falls back to hash join automatically.
Pair with DIRECT layout for live dictionaries. Use the EmbeddedRocksDB-backed CLICKHOUSE source + DIRECT() when freshness matters more than caching.
Bulk-load large datasets. Insert in large blocks so optimize_for_bulk_insert can build SST files instead of writing keys one at a time.

Common Issues

"PRIMARY KEY must be specified" / multi-column key rejected. The engine requires exactly one primary key column. Collapse composite keys into a single column.
Unexpectedly missing rows after re-insert. Inserting an existing key overwrites the prior value — this is upsert behavior, not a bug. If you expected duplicates, EmbeddedRocksDB is the wrong engine.
Slow count(). A plain count() iterates the store. Use optimize_trivial_approximate_count_query = 1 if an approximate count is acceptable.
Trying to use a rocksdb dictionary source. There isn't one. Back the dictionary with an EmbeddedRocksDB table and a CLICKHOUSE source + DIRECT() layout, as shown above.
Expecting replication. EmbeddedRocksDB data lives on one node and is not replicated by ClickHouse. For replicated mutable lookups, reconsider the design (for example a ReplicatedMergeTree-backed dictionary).

How Pulse Helps

Pulse monitors ClickHouse clusters and surfaces the operational details that make engines like EmbeddedRocksDB safe to run in production — local disk usage under rocksdb_dir, dictionary load and lookup behavior, mutation activity, and join algorithm selection on hot queries. Because EmbeddedRocksDB is node-local and unreplicated, Pulse's per-node visibility helps you catch storage growth and backup gaps before they become incidents. If you're weighing EmbeddedRocksDB against MergeTree or a different dictionary layout for a given workload, the Pulse team can review the access pattern and recommend a fit. Learn more at pulse.support.

Frequently Asked Questions

Q: Can EmbeddedRocksDB be replicated like ReplicatedMergeTree?

No. EmbeddedRocksDB stores data on the local node's filesystem and is not replicated by ClickHouse. If you need replicated, mutable lookups, back a dictionary with a ReplicatedMergeTree table or replicate the source data through other means.

Q: Is there a dedicated EMBEDDED_ROCKSDB dictionary source?

No. Create an EmbeddedRocksDB table, then define the dictionary with SOURCE(CLICKHOUSE(... TABLE 'your_rocksdb_table' ...)) and LAYOUT(DIRECT()). The dictionary's lookups become RocksDB key fetches.

Q: Does inserting a row with an existing primary key create a duplicate?

No — it overwrites the existing value for that key. This gives you upsert semantics natively, without ReplacingMergeTree or FINAL.

Q: Why use EmbeddedRocksDB instead of MergeTree for a lookup table?

For single-key point lookups and frequent per-row updates/deletes, RocksDB resolves keys directly and updates in place, whereas MergeTree scans granules and rewrites parts during mutations. For range scans and aggregations, MergeTree remains the right choice.

Q: How do I make joins against an EmbeddedRocksDB table fast?

Set join_algorithm = 'direct,hash' and use LEFT ANY JOIN. The direct join issues key-value lookups against the RocksDB store and skips building a hash table; it falls back to hash join if the right table doesn't support direct lookups or if the join type is not LEFT ANY JOIN (the only join type the direct algorithm supports).

Q: Is EmbeddedRocksDB available on ClickHouse Cloud?

EmbeddedRocksDB is a local, single-node engine tied to node filesystem storage, which conflicts with Cloud's shared-storage architecture. Treat it as a self-hosted feature and verify availability against current Cloud documentation before relying on it there.