ClickHouse Data Migration: End-to-End Guide

"Data migration" in ClickHouse covers a wide range of tasks: moving a single table to a new server, consolidating or splitting shards, copying a self-hosted cluster into ClickHouse Cloud, or rebuilding a table with a different schema. There is no single command that fits all of them — the right tool depends on whether schemas match, how much data you are moving, and how much downtime you can tolerate.

This guide gives you a taxonomy of the available methods, the SQL to run them, and an end-to-end workflow for cutting over with minimal or zero downtime. For schema migrations (DDL versioning) rather than moving rows, see ClickHouse schema migration tools; for snapshot-based copies, see the ClickHouse backup guide.

Choosing a Migration Method

Method Schemas must match? Re-shard / re-partition? CPU/RAM cost Best for
remote / remoteSecure + INSERT … SELECT No Yes High (both sides) Ad-hoc moves, schema changes, subset copies
clickhouse-local as ETL No Yes Medium Pulling from other databases or files into ClickHouse
BACKUP / RESTORE to object storage Yes No Low Whole-table/database snapshots, identical topology
FREEZE + rsync + ATTACH PARTITION Yes Manual scripting Very low Large data, tight CPU budget, identical schema
FETCH PARTITION from Keeper Yes No Low Servers sharing the same ClickHouse Keeper/ZooKeeper
Add a new replica (replication protocol) Yes No Low Moving/growing a replicated cluster automatically

The first two methods rewrite data and can change schema or sharding on the way through. The rest move data parts as-is and require the destination schema to match the source.

Method 1: remote / remoteSecure + INSERT … SELECT

The remote and remoteSecure table functions let one server read from or write to another on the fly, without defining a Distributed table. They work in both SELECT and INSERT queries, which makes them the most flexible one-time migration tool. remoteSecure uses the native TLS port (9440 by default); remote uses the plaintext native port (9000).

Pull the data by running this on the destination server:

INSERT INTO target_db.events
SELECT *
FROM remoteSecure('source-host:9440', 'source_db.events', 'user', 'password');

Push from the source side instead (useful when only the source can reach the destination, e.g. migrating into ClickHouse Cloud):

INSERT INTO FUNCTION
    remoteSecure('dest.clickhouse.cloud:9440', 'target_db.events', 'user', 'password')
SELECT * FROM source_db.events;

Because this is a plain INSERT … SELECT, you can transform data in flight — change column types, drop columns, re-partition, or route rows to a different sort order — simply by editing the SELECT. The trade-off is cost: data is decoded and re-encoded, and merges run on the destination, so both servers spend CPU and RAM. For very large tables, chunk the copy by partition or a time/id range to keep memory bounded and make the job resumable:

INSERT INTO target_db.events
SELECT * FROM remoteSecure('source-host:9440', 'source_db.events', 'user', 'password')
WHERE toYYYYMM(event_date) = 202601;

Method 2: clickhouse-local as an ETL Bridge

clickhouse-local runs the full ClickHouse engine as a standalone process with no server. It is the recommended way to migrate from other systems: read with a source table function (mysql, postgresql, s3, file, …) and write straight to the destination with remoteSecure.

clickhouse-local --query "
INSERT INTO FUNCTION
    remoteSecure('dest.clickhouse.cloud:9440', 'target_db.orders', 'user', 'password')
SELECT * FROM mysql('mysql-host:3306', 'shop', 'orders', 'user', 'password')
"

Because the format encoding happens client-side, this avoids round-tripping through intermediate CSV/Parquet files. It is also the cleanest path for migrating files (Parquet/CSV/JSON) sitting in S3 or on local disk.

Method 3: BACKUP / RESTORE

For a whole table or database where the destination schema is (or will be) identical, the built-in BACKUP/RESTORE commands are the simplest snapshot-based move. You back up to object storage on the source and restore on the destination:

-- On the source
BACKUP TABLE events TO S3('https://bucket.s3.amazonaws.com/backups/events', 'KEY', 'SECRET');

-- On the destination
RESTORE TABLE events FROM S3('https://bucket.s3.amazonaws.com/backups/events', 'KEY', 'SECRET');

This captures schema and data together and is a good fit when topology is unchanged. It does not re-shard or re-partition. See the ClickHouse backup guide and incremental backups for the full command surface, retention, and base-backup options.

Method 4: FREEZE + rsync + ATTACH PARTITION

When you need to move a lot of data with minimal CPU/RAM and the schemas are identical, move the data parts directly. ALTER TABLE … FREEZE creates hardlinked copies of parts under shadow/, costing no extra disk space at freeze time:

ALTER TABLE events FREEZE PARTITION '202601';

Copy the frozen parts to the destination host with rsync (into the table's detached/ directory), then attach them:

ALTER TABLE events ATTACH PARTITION '202601';

This is the lowest-overhead method because ClickHouse does not decode, re-sort, or re-merge the data — it just adopts existing parts. The cost is that it requires identical table definitions and some scripting, and any re-sharding has to be handled by deciding which partitions land on which destination node.

Method 5: FETCH PARTITION via ClickHouse Keeper

If source and destination share the same ClickHouse Keeper (or ZooKeeper) ensemble, the destination replica can pull a partition straight over the replication protocol:

ALTER TABLE events FETCH PARTITION '202601' FROM '/clickhouse/tables/{shard}/events';
ALTER TABLE events ATTACH PARTITION '202601';

FETCH downloads the part into detached/; ATTACH makes it live. This is efficient and avoids re-encoding, but only applies when both servers participate in the same coordination cluster.

Method 6: Add a New Replica

To move or grow a ReplicatedMergeTree cluster, the simplest approach is often to let replication do the work: stand up the new node, create the same replicated table on it pointing at the same Keeper path, and ClickHouse streams the data to it automatically. Once it has caught up, decommission the old replica. This requires the table to already be replicated; if you are on a plain MergeTree, convert it first — see convert MergeTree to ReplicatedMergeTree.

Rebalancing Shards (Re-Sharding)

Changing the number of shards is the hardest class of migration, because data has to be redistributed by the sharding key rather than copied 1:1. The general pattern is to read from the old layout and INSERT … SELECT into the new one, letting the destination's Distributed table (or your WHERE/cityHash64 routing) place each row on the correct shard:

-- Run on the new cluster; sources rows from the old cluster's distributed table
INSERT INTO new_db.events_distributed
SELECT * FROM remoteSecure('old-cluster-host:9440', 'old_db.events_distributed', 'user', 'password');

Practical notes:

  • clickhouse-copier is obsolete. It historically automated cross-cluster, cross-shard copies, but it was moved out of the main distribution in v24.2 and is no longer maintained. Prefer INSERT … SELECT over remote/Distributed, or an external orchestration job that chunks by partition. (Altinity.Cloud's Data Copy wizard is built on the old copier internals.)
  • Merging shards (reducing shard count) is generally easier than splitting them: you re-insert each source shard into the consolidated cluster.
  • For one-off layout changes on a single table — different partitioning or sort key — the same INSERT … SELECT rewrite applies; see the repartition table strategy.

Zero-Downtime and Low-Downtime Switchover

Most production migrations follow a multi-stage pattern so the application never sees a hard cutover:

  1. Schema first. Create the destination tables (and Distributed/replicated wrappers) ahead of time. Version this with your schema migration tooling.

  2. Bulk backfill. Copy historical data by partition or time range using one of the methods above. This is the long-running step; run it while the source stays live.

  3. Dual write or catch-up. Either start writing new inserts to both old and new clusters, or, after the bulk copy, run incremental INSERT … SELECT for the rows that arrived during the backfill (filtering on a monotonic timestamp/id).

  4. Verify. Compare row counts and checksums per partition before trusting the new cluster:

    SELECT count(), sum(cityHash64(*)) FROM source_db.events WHERE toYYYYMM(event_date) = 202601;
    SELECT count(), sum(cityHash64(*)) FROM target_db.events WHERE toYYYYMM(event_date) = 202601;
    
  5. Cut over reads, then writes. Point read traffic at the new cluster first (it is the safest to roll back), then move writes. Keep the old cluster intact until you are confident.

A short read-only window during step 3–5 eliminates most consistency risk; a true zero-downtime move requires dual writes or a deduplicating engine such as ReplacingMergeTree on the destination to absorb any overlap.

Best Practices

  1. Migrate by partition, not by full table. Partition-sized chunks bound memory, make the job resumable, and let you verify incrementally.
  2. Match the destination engine and sort key deliberately. A migration is a rare chance to fix a bad ORDER BY or partitioning scheme — but if you change it, you must use a rewrite method (Method 1/2), not part-moving (Method 4/5).
  3. Throttle large copies. INSERT … SELECT over remote competes with live traffic; run it off-peak and cap parallelism.
  4. Always verify before cutting over. Row counts alone miss silent corruption — compare a content hash per partition (see the switchover section).
  5. Keep the source until cutover is proven. Do not drop the old cluster until reads and writes have run cleanly on the new one for a full business cycle.
  6. Use a maintenance-tested cluster config on the destination. Review production cluster configuration before pointing traffic at it.

Common Issues

  • Memory limit exceeded during INSERT … SELECT. The query is buffering too much. Chunk by partition/range and lower max_insert_threads / max_threads, or reduce max_block_size.
  • Duplicate rows after a dual-write or catch-up phase. Overlapping windows re-insert the same rows. Use ReplacingMergeTree on the destination, or make catch-up filters strictly exclusive on a monotonic column.
  • ATTACH PARTITION fails after rsync. The destination table definition differs from the source, or parts were copied into the wrong directory. Part-moving requires byte-identical schemas; fall back to INSERT … SELECT if the schema must change.
  • Re-sharding produces uneven shards. The sharding key's distribution is skewed. Verify your sharding expression (commonly cityHash64(key)) and check per-shard row counts after the copy.
  • Replica fails to fetch parts when adding a node. See too many retries to fetch parts and replication/distributed connection errors.

How Pulse Helps

Data migrations are where small mistakes become expensive: a skipped partition, a duplicated catch-up window, or a destination cluster that is silently under-replicated. Pulse monitors both sides of a migration — replication lag, part counts per partition, merge backlog, and Keeper health — so you can confirm the destination has actually converged before you cut over. It flags the diagnostic signals that usually surface only after switchover (stuck merges, growing replication queues, mismatched part counts), and the team behind it has run these migrations on production ClickHouse clusters, so you get a second set of eyes on the cutover plan rather than just a dashboard.

Frequently Asked Questions

Q: What is the simplest way to copy one table to another ClickHouse server?

For a one-off move, run INSERT INTO target SELECT * FROM remoteSecure('host:9440', 'db.table', 'user', 'password'). It needs no shared coordination and lets you transform or re-partition data on the way through. For identical schemas where you want minimal CPU cost, use BACKUP/RESTORE or FREEZE + rsync + ATTACH PARTITION instead.

Q: Should I still use clickhouse-copier?

No. clickhouse-copier was moved out of the main ClickHouse distribution in version 24.2 and is no longer maintained. Use INSERT … SELECT over the remote/Distributed tables, or an external job that chunks the copy by partition, for cross-cluster and re-sharding migrations.

Q: How do I migrate without downtime?

Backfill historical data first while the source stays live, then either dual-write new inserts to both clusters or run an incremental catch-up copy filtered on a monotonic timestamp. Verify per-partition counts and hashes, cut over reads, then cut over writes. A deduplicating engine like ReplacingMergeTree on the destination absorbs any overlap from the catch-up window.

Q: Can I change the schema or partitioning during a migration?

Yes — but only with a rewrite method. INSERT … SELECT (via remote or clickhouse-local) decodes and re-encodes every row, so you can change types, drop columns, or re-partition freely. Part-moving methods (ATTACH PARTITION, FETCH PARTITION) require byte-identical schemas because they adopt existing data parts unchanged.

Q: How do I verify the migration copied everything?

Compare row counts and a content hash per partition between source and destination, e.g. SELECT count(), sum(cityHash64(*)) FROM tbl WHERE toYYYYMM(event_date) = 202601. Row counts alone can hide silent corruption, so always include a hash and check each partition individually.

Q: How do I move data into ClickHouse Cloud from a self-hosted instance?

Push from the self-hosted side with INSERT INTO FUNCTION remoteSecure('your-instance.clickhouse.cloud:9440', 'db.table', 'user', 'password') SELECT * FROM local_db.table. This works well because outbound connectivity from your own server to Cloud is usually simpler than the reverse, and it lets you migrate table by table.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.