NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

Convert ClickHouse MergeTree to ReplicatedMergeTree

Converting a standalone MergeTree table to ReplicatedMergeTree is a common task when adding replication to a cluster after the fact. The naive approach, copying every row with INSERT INTO ... SELECT, works for small tables but doubles disk usage and is slow for production-sized data. ClickHouse parts are immutable files, so the most efficient route uses filesystem hardlinks via ATTACH PARTITION FROM. There are five practical conversion methods, ranging from a one-liner to the built-in embedded conversion shipped in recent versions.

Method 1: INSERT INTO SELECT

Good only for small tables. Create the replicated target with the same schema, then copy:

INSERT INTO foo_replicated SELECT * FROM foo;

This doubles storage during the migration and rewrites every part. Skip it for anything above a few gigabytes.

This is the workhorse for production migrations. ATTACH PARTITION FROM uses hardlinks, so no extra disk space is consumed and the operation is effectively free in terms of compute. Until new data is inserted into the replicated table, both tables share the same underlying part files.

Full example:

-- Original MergeTree table
CREATE TABLE foo
(
    A Int64,
    D Date,
    S String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(D)
ORDER BY A;

-- Sample data
INSERT INTO foo SELECT number, today(),     '' FROM numbers(1e8);
INSERT INTO foo SELECT number, today() - 60, '' FROM numbers(1e8);

SELECT count() FROM foo;

Create the replicated table with an identical schema:

CREATE TABLE foo_replicated AS foo
ENGINE = ReplicatedMergeTree(
    '/clickhouse/{cluster}/tables/{database}/{table}/{shard}',
    '{replica}'
)
PARTITION BY toYYYYMM(D)
ORDER BY A;

Stop merges so parts do not change mid-flight, then generate the attach statements directly from system.parts:

SYSTEM STOP MERGES;

SELECT DISTINCT
    'ALTER TABLE foo_replicated ATTACH PARTITION ID \'' ||
    partition_id || '\' FROM foo;'
FROM system.parts
WHERE table = 'foo' AND active;

Sample output:

ALTER TABLE foo_replicated ATTACH PARTITION ID '202111' FROM foo;
ALTER TABLE foo_replicated ATTACH PARTITION ID '202201' FROM foo;

You can pipe the generated SQL straight back into the client:

clickhouse-client -q "SELECT DISTINCT 'ALTER TABLE foo_replicated ATTACH PARTITION ID \'' || partition_id || '\' FROM foo;' FROM system.parts WHERE table='foo' AND active FORMAT TabSeparatedRaw" \
  | clickhouse-client -mn

After attaching, resume merges, verify counts, and swap names:

SYSTEM START MERGES;

SELECT count() FROM foo_replicated;

RENAME TABLE foo TO foo_old, foo_replicated TO foo;

foo_old keeps hardlinks until you drop it. The drop is cheap as long as no new data has diverged.

Method 3: In-place file manipulation

ClickHouse documents a manual procedure where you stop the server, move data directories, and recreate the table with the replicated engine. This is invasive and error-prone compared with ATTACH PARTITION, but it can be useful if you need the table name to stay the same throughout. See the official docs under engines/table-engines/mergetree-family/replication for the file moves required.

Method 4: clickhouse-backup

The clickhouse-backup tool can capture a MergeTree table and restore it into a ReplicatedMergeTree. This is convenient if you already use it for routine backups and want to migrate during a restore cycle.

Method 5: Embedded convert_to_replicated flag

Recent ClickHouse versions support an in-place conversion driven by a flag file. Drop an empty convert_to_replicated file into the table's flags/ directory and the next server restart loads the table as ReplicatedMergeTree using the values of default_replica_path and default_replica_name:

# Find the data path
clickhouse-client -q "SELECT data_paths FROM system.tables WHERE database='mydb' AND table='foo'"
# Create the flag (path/flag combinations vary by version; consult the docs)
sudo -u clickhouse touch /var/lib/clickhouse/data/mydb/foo/flags/convert_to_replicated
sudo systemctl restart clickhouse-server

Check the official documentation for your version before relying on this in production. To create the converted table on other replicas, you still need to issue an explicit CREATE TABLE ... ReplicatedMergeTree(...) with the matching ZooKeeper path.

Engine path conventions

The recommended replication path uses macros so every replica resolves to a unique znode:

/clickhouse/{cluster}/tables/{database}/{table}/{shard}

{replica} is the replica name inside that shard. Keep these macros consistent across nodes via the server config, otherwise replicas will attach themselves to the wrong znode.

Common Pitfalls

  • Forgetting SYSTEM STOP MERGES before attaching. New merges on the source can change part IDs while you iterate.
  • Dropping foo_old before confirming row counts match. Always run SELECT count() on both sides first.
  • Mixing schemas. CREATE TABLE foo_replicated AS foo only copies columns. Re-declare PARTITION BY and ORDER BY explicitly to avoid surprises.
  • Reusing a replication path that already has data in Keeper. Always start with a fresh znode or SYSTEM DROP REPLICA first.
  • Running the migration with applications still writing. Pause ingestion or use a dual-write path during the cutover.

Frequently Asked Questions

Q: Does ATTACH PARTITION FROM copy data? A: No. It creates hardlinks, so disk usage stays roughly the same until merges or new inserts diverge the underlying part files.

Q: Can I run the conversion while writes continue? A: Not safely with method 2. New parts arriving after you generate the attach list will be missed. Pause ingestion, or repeat the procedure for any partitions that received late data.

Q: What happens to the old MergeTree table? A: Until you drop it, it coexists harmlessly. Because of hardlinks it consumes little additional disk space. Drop it once the replicated table is validated.

Q: Do I need to recreate indexes or materialized views? A: Indexes live inside parts and are preserved on attach. Materialized views that reference the original table by name need to be recreated to point at the new replicated table after rename.

Q: Can I attach parts from a MergeTree to a ReplicatedMergeTree on a different replica? A: No. ATTACH PARTITION FROM must run on the same server where both tables exist. Other replicas pick the data up automatically via the replication protocol.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.