ClickHouse Copier: Obsolete Migration Tool and Modern Alternatives

What is clickhouse-copier?

clickhouse-copier is a legacy utility that copied data between ClickHouse clusters or resharded data within a cluster, coordinating workers through ZooKeeper. It read an XML task description, fanned out per-partition copy tasks across worker processes, and committed progress checkpoints so it could resume after interruption. The tool is now obsolete: it was moved out of the main ClickHouse repository into a separate, unmaintained repository at github.com/ClickHouse/copier, which the upstream README labels "no longer supported." New migrations should use one of the modern alternatives below.

Why clickhouse-copier Was Deprecated

The ClickHouse team flagged clickhouse-copier as fragile and hard to maintain, and the tool was removed from the official ClickHouse bundle in version 24.2 (the change is tracked in issue #60734). The deprecation reflected three real problems: the XML task language was error-prone, the ZooKeeper coordination layer broke under cluster contention, and the project did not keep pace with newer engines and features (the S3 table engine, lazy materialization, refreshable materialized views). For any cluster running 24.2 or later, copier binaries are not shipped by default.

The community still occasionally compiles copier from the archived repo for one-off resharding jobs against very old clusters, but it should be treated as legacy code: no bug fixes, no compatibility guarantees with current server versions, and no support for new engine features.

Modern Alternatives to clickhouse-copier

Choose a replacement based on the migration shape - cross-cluster copy, full backup/restore, or resharding:

| Scenario | Recommended tool | Notes |
| --- | --- | --- |
| Copy a few tables between two clusters | INSERT INTO ... SELECT FROM remote() / remoteSecure() | Simple SQL; no resume on failure. Best for small tables. |
| Full cluster backup and restore | Native BACKUP / RESTORE to S3 or disk | Built into the server since 22.8. Supports databases, tables, and partial restores. |
| Production-grade backup/restore with retention | clickhouse-backup (Altinity) | OS-level file copy with metadata; handles passwords and access entities. |
| Reshard a Replicated table family | INSERT SELECT remote() plus ALTER TABLE FREEZE PARTITION per partition | Manual orchestration; pair with Pulse for progress tracking. |
| Migrate from on-prem to ClickHouse Cloud | ClickHouse Cloud's clickhouse-copier replacement (ClickPipes for streaming, MIGRATION jobs) | Managed service path. |
| Read-only access to another cluster's data | remote() / remoteSecure() table functions inline in queries | No copy; query passes through. |

For a typical small-to-medium cross-cluster table copy:

-- Copy a table from the source cluster into the local cluster.
-- A plain INSERT ... SELECT does not deduplicate rows on the primary key.
INSERT INTO target_db.events
SELECT * FROM remoteSecure(
    'source-host:9440',
    'source_db.events',
    'migration_user',
    'PASSWORD_HERE'
);

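For partition-by-partition transfers of large tables, wrap the INSERT SELECT in a loop driven by system.parts on the source, and use INSERT INTO ... SETTINGS insert_deduplicate=1 so a retry does not double-count. Note that insert_deduplicate works at the block level on Replicated tables: a retry is only deduplicated if it produces the same blocks as the first attempt, so verify row counts after any retry rather than relying on deduplication alone.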
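The partition-level loop described above can be sketched as two small statement builders. This is a minimal sketch under stated assumptions, not a tested migration script: the function names are hypothetical, the host, credentials, and table names are placeholders, and it assumes the source table is partitioned by toYYYYMM(event_date) so that each partition value in system.parts maps to one WHERE clause.

```python
def list_partitions_sql(host: str, db: str, table: str, user: str, password: str) -> str:
    """Build a query listing the active partitions of the source table.

    Values are interpolated without escaping, so only pass trusted input.
    """
    return (
        "SELECT DISTINCT partition "
        f"FROM remoteSecure('{host}', 'system.parts', '{user}', '{password}') "
        f"WHERE database = '{db}' AND table = '{table}' AND active "
        "ORDER BY partition"
    )


def copy_partition_sql(host: str, source: str, target: str,
                       user: str, password: str, partition: str) -> str:
    """Build the INSERT ... SELECT that copies a single monthly partition.

    Assumes partition values look like '202401' (PARTITION BY toYYYYMM(event_date)).
    """
    if not partition.isdigit():
        raise ValueError(f"unexpected partition value: {partition!r}")
    return (
        f"INSERT INTO {target} "
        f"SELECT * FROM remoteSecure('{host}', '{source}', '{user}', '{password}') "
        f"WHERE toYYYYMM(event_date) = {partition} "
        "SETTINGS insert_deduplicate = 1"
    )
```

Each generated statement would then be sent through whatever client you already use; keeping statement construction separate from execution makes the per-partition window easy to log and retry.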

Common Pitfalls When Replacing Copier

  1. Treating INSERT SELECT remote() as resumable - it is not. If the network drops, the in-flight block is discarded, but blocks already written to the target remain, so the table is left partially copied. Track progress yourself in a control table or use partition-level loops.
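  2. Assuming a BACKUP / RESTORE started with ASYNC has finished when the statement returns. Without ASYNC the statement blocks until completion; with it, the statement returns immediately, so poll system.backups for status before assuming success.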
  3. Skipping access-entity export when migrating with BACKUP. Use BACKUP ALL (22.8+) or export users and grants separately via SHOW CREATE USER / SHOW GRANTS.
  4. Sharding key mismatches between source and destination distributed tables, which silently route rows to the wrong shard. Verify with SELECT _shard_num, count() FROM dist_table GROUP BY 1.
  5. Re-running an INSERT SELECT without insert_deduplicate=1 and without idempotent partition windows, causing duplicate rows.
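For the system.backups check in the list above, a polling helper might look like the following. It is a sketch under assumptions: fetch_status is a hypothetical callable that you would implement with your own client, for example by running SELECT status, error FROM system.backups WHERE id = '...', and the status strings should be checked against your server version.

```python
import time
from typing import Callable, Optional, Tuple

# Terminal statuses as reported in system.backups (assumed set; newer
# server versions may report additional values).
SUCCESS_STATUSES = {"BACKUP_CREATED", "RESTORED"}
FAILURE_STATUSES = {"BACKUP_FAILED", "RESTORE_FAILED"}


def wait_for_backup(
    fetch_status: Callable[[str], Optional[Tuple[str, str]]],
    backup_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the backup or restore reaches a terminal status.

    fetch_status returns (status, error) for the given id, or None if the
    row is not visible yet. Returns the final status on success.
    """
    deadline = clock() + timeout_seconds
    while True:
        row = fetch_status(backup_id)
        if row is not None:
            status, error = row
            if status in SUCCESS_STATUSES:
                return status
            if status in FAILURE_STATUSES:
                raise RuntimeError(f"{backup_id} ended with {status}: {error}")
        if clock() >= deadline:
            raise TimeoutError(f"{backup_id} did not finish in {timeout_seconds}s")
        sleep(poll_seconds)
```

Injecting sleep and clock keeps the helper testable without waiting in real time.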

Monitoring Migrations

A long-running migration with INSERT SELECT remote() or BACKUP is a good moment to watch part counts, memory usage, and merge backlogs on both source and destination - if either side falls behind on merges you can hit "Too many parts" errors. Track:

  • Active part count per table and partition on the target (rows in system.parts where active = 1)
  • system.merges to see active background merges
  • system.backups for BACKUP / RESTORE job status
  • system.metrics for BackgroundMergesAndMutationsPoolTask, MemoryTracking
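The part-count check can be sketched as a query plus a small filter. The database and table names are placeholders, and warn_at is a hypothetical threshold you would set somewhat below the table's parts_to_throw_insert setting, which is the limit behind "Too many parts".

```python
from typing import Iterable, List, Tuple

# Query to run on the target cluster; 'target_db' and 'events' are placeholders.
PARTS_PER_PARTITION_SQL = """
SELECT partition, count() AS parts
FROM system.parts
WHERE database = 'target_db' AND table = 'events' AND active
GROUP BY partition
ORDER BY parts DESC
"""


def partitions_over_threshold(
    rows: Iterable[Tuple[str, int]], warn_at: int
) -> List[Tuple[str, int]]:
    """Return the (partition, parts) rows whose active part count is at or above warn_at."""
    return [(partition, parts) for partition, parts in rows if parts >= warn_at]
```

A migration loop could pause between partitions whenever this list is non-empty, giving background merges time to catch up.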

Pulse connects to both source and destination clusters and automatically detects part backlogs, replication lag, and memory pressure during migrations; its agentic SRE engine surfaces the root cause when a migration stalls (slow disk on target, throttled merges, ZooKeeper session loss) and can suggest or apply remediation, replacing a lot of the manual log diving that copier users were used to.

Frequently Asked Questions

Q: Is clickhouse-copier still maintained?
A: No. clickhouse-copier was moved to a separate repository at github.com/ClickHouse/copier and marked obsolete; the upstream README states "this tool is no longer supported." It was removed from the main ClickHouse bundle in version 24.2.

Q: What should I use instead of clickhouse-copier for migrating data between ClickHouse clusters?
A: For small tables, INSERT INTO target SELECT * FROM remoteSecure('host', 'db.table', 'user', 'pw'). For full clusters, native BACKUP ... TO S3('...') and RESTORE (available since 22.8). For production backup/restore with retention policies, the Altinity clickhouse-backup tool.

Q: Does BACKUP / RESTORE handle replicated tables?
A: Yes. Native BACKUP captures both data and metadata for ReplicatedMergeTree tables. On RESTORE to a new cluster, you typically restore on one replica and let ClickHouse Keeper coordinate replication to the rest, or restore the same backup independently on each replica with SETTINGS allow_non_empty_tables=true.

Q: Can I reshard data without clickhouse-copier?
A: Yes. Use a per-partition INSERT INTO target_distributed SELECT * FROM source_table with the new sharding key, or, for direct file moves, run ALTER TABLE FREEZE PARTITION, copy the frozen parts into the destination table's detached directory, and then run ATTACH PARTITION. Resharding is more operationally involved than a same-shape cluster copy, so plan idempotent steps and verify row counts per partition.
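The per-partition row-count check could be sketched like this. It assumes monthly partitions on a column named event_date (a placeholder taken from the examples in this article) and that you run the query once against the source and once against the destination, loading each result into a dict. The helper name is hypothetical.

```python
from typing import Dict, Tuple

# Run once per side, substituting the table name; event_date is a placeholder column.
COUNT_PER_PARTITION_SQL = (
    "SELECT toYYYYMM(event_date) AS p, count() AS c "
    "FROM {table} GROUP BY p ORDER BY p"
)


def diff_partition_counts(
    source: Dict[str, int], target: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {partition: (source_rows, target_rows)} for every partition that differs.

    A partition missing on one side is reported with a count of 0 for that side.
    """
    mismatches: Dict[str, Tuple[int, int]] = {}
    for partition in sorted(set(source) | set(target)):
        src = source.get(partition, 0)
        dst = target.get(partition, 0)
        if src != dst:
            mismatches[partition] = (src, dst)
    return mismatches
```

An empty result means every partition matches by row count; it does not prove the rows landed on the intended shard, so the _shard_num check is still worth running.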

Q: What is the version where clickhouse-copier was removed?
A: It was removed from the main ClickHouse server bundle in 24.2. Earlier 23.x versions shipped a still-functional but deprecated binary. The standalone repository at github.com/ClickHouse/copier remains available for compilation if needed.

Q: How do I make INSERT SELECT remote() resumable like clickhouse-copier was?
A: Drive it from a control loop: iterate over partitions (e.g., SELECT DISTINCT toYYYYMM(event_date) FROM source), copy one partition per iteration with INSERT INTO target SELECT ... WHERE toYYYYMM(event_date) = :p SETTINGS insert_deduplicate=1, and record successful partitions in a tracking table. On retry, skip already-copied partitions.
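A control loop along these lines is sketched below. It is illustrative only: copy_partition and mark_done are hypothetical callables that you would back with your own client and tracking table, and done is the set of partitions already recorded there.

```python
from typing import Callable, Iterable, List, Set


def run_resumable_copy(
    partitions: Iterable[str],
    copy_partition: Callable[[str], None],
    done: Set[str],
    mark_done: Callable[[str], None],
    max_attempts: int = 3,
) -> List[str]:
    """Copy each partition not yet recorded as done, retrying failures.

    copy_partition runs the per-partition INSERT ... SELECT and raises on error.
    mark_done records a success in the tracking table. Returns the partitions
    copied during this run, in order.
    """
    copied: List[str] = []
    for partition in partitions:
        if partition in done:
            continue  # already copied by an earlier run
        for attempt in range(1, max_attempts + 1):
            try:
                copy_partition(partition)
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; a later run resumes from this partition
            else:
                mark_done(partition)
                copied.append(partition)
                break
    return copied
```

A failed attempt may leave some blocks in the target, so the copy step itself should be idempotent, for example by relying on insert deduplication or by dropping the target partition before a retry.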
