What is clickhouse-copier?
clickhouse-copier is a legacy utility that copied data between ClickHouse clusters or resharded data within a cluster, coordinating workers through ZooKeeper. It read an XML task description, fanned out per-partition copy tasks across worker processes, and committed progress checkpoints so it could resume after interruption. The tool is now obsolete: it was moved out of the main ClickHouse repository into a separate, unmaintained repository at github.com/ClickHouse/copier, which the upstream README labels "no longer supported." New migrations should use one of the modern alternatives below.
Why clickhouse-copier Was Deprecated
The ClickHouse team flagged clickhouse-copier as fragile and hard to maintain, and the tool was removed from the official ClickHouse bundle in version 24.2 (the change is tracked in issue #60734). The deprecation reflected three real problems: the XML task language was error-prone, the ZooKeeper coordination layer broke under cluster contention, and the project did not keep pace with newer engines and features (S3, lazy materialization, refreshable materialized views). For any cluster running 24.2 or later, copier binaries are not shipped by default.
The community still occasionally compiles copier from the archived repo for one-off resharding jobs against very old clusters, but it should be treated as legacy code: no bug fixes, no compatibility guarantees with current server versions, and no support for new engine features.
Modern Alternatives to clickhouse-copier
Choose a replacement based on the migration shape - cross-cluster copy, full backup/restore, or resharding:
| Scenario | Recommended tool | Notes |
|---|---|---|
| Copy a few tables between two clusters | INSERT INTO ... SELECT FROM remote() / remoteSecure() | Simple SQL; no resume on failure. Best for small tables. |
| Full cluster backup and restore | Native BACKUP / RESTORE to S3 or disk | Built into the server since 22.8. Supports databases, tables, and partial restores. |
| Production-grade backup/restore with retention | clickhouse-backup (Altinity) | OS-level file copy with metadata; handles passwords and access entities. |
| Reshard a Replicated table family | INSERT SELECT remote() plus ALTER TABLE FREEZE PARTITION per partition | Manual orchestration; pair with Pulse for progress tracking. |
| Migrate from on-prem to ClickHouse Cloud | ClickHouse Cloud's clickhouse-copier replacement (ClickPipes for streaming, MIGRATION jobs) | Managed service path. |
| Read-only access to another cluster's data | remote() / remoteSecure() table functions inline in queries | No copy; query passes through. |
For a typical small-to-medium cross-cluster table copy:
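```sql
-- Copy a table from the source cluster into the local cluster.
-- Note: INSERT ... SELECT does not deduplicate rows on the primary key.
INSERT INTO target_db.events
SELECT *
FROM remoteSecure(
    'source-host:9440',   -- source host and secure native port
    'source_db.events',   -- source database.table
    'migration_user',
    'PASSWORD_HERE'
);
```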
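For partition-by-partition transfers of large tables, wrap the INSERT SELECT in a loop driven by system.parts on the source, and set insert_deduplicate=1 on the INSERT so that a retried block is not inserted twice. Block deduplication applies to Replicated* tables and only recognizes a retry whose blocks are identical to the original, so keep each partition's SELECT deterministic and still track which partitions have completed.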
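The partition loop can be sketched as a small statement planner. This is a minimal illustration, not a definitive implementation: it assumes the table is partitioned by toYYYYMM(event_date), reuses the placeholder host, user, and table names from the example above, and the helper names `build_partition_copy_sql` and `plan_copy` are hypothetical.

```python
from typing import Iterable, List

# Run this on the source to list partitions that currently hold active parts.
LIST_PARTITIONS_SQL = (
    "SELECT DISTINCT partition FROM system.parts "
    "WHERE database = 'source_db' AND table = 'events' AND active"
)


def build_partition_copy_sql(partition: int) -> str:
    """Build the INSERT ... SELECT that copies one monthly partition.

    Assumption: events is partitioned by toYYYYMM(event_date), so the
    partition value is an integer such as 202401.
    """
    if not (100001 <= partition <= 999912 and 1 <= partition % 100 <= 12):
        raise ValueError(f"not a YYYYMM partition value: {partition}")
    return (
        "INSERT INTO target_db.events "
        "SELECT * FROM remoteSecure('source-host:9440', 'source_db.events', "
        "'migration_user', 'PASSWORD_HERE') "
        f"WHERE toYYYYMM(event_date) = {partition} "
        "SETTINGS insert_deduplicate = 1"
    )


def plan_copy(partitions: Iterable[str]) -> List[str]:
    """Turn partition values read from system.parts into one statement each."""
    return [build_partition_copy_sql(int(p)) for p in sorted(set(partitions))]


if __name__ == "__main__":
    for statement in plan_copy(["202402", "202401", "202401"]):
        print(statement)
```

Generating one statement per partition keeps each unit of work small enough to retry on its own; executing the statements is left to whichever client you already use.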
Common Pitfalls When Replacing Copier
- Treating `INSERT SELECT remote()` as resumable - it is not. If the network drops, the partially inserted block is rolled back, but partitions already committed remain. Track progress yourself in a control table or use partition-level loops.
- Assuming an asynchronous `BACKUP`/`RESTORE` has finished. The statements run synchronously by default, but with the `ASYNC` keyword they return immediately. Poll `system.backups` for status before assuming success.
- Skipping access-entity export when migrating with `BACKUP`. Use `BACKUP ALL` (22.8+) or export users and grants separately via `SHOW CREATE USER` / `SHOW GRANTS`.
- Sharding key mismatches between source and destination distributed tables, which silently route rows to the wrong shard. Verify with `SELECT _shard_num, count() FROM dist_table GROUP BY 1`.
- Re-running an `INSERT SELECT` without `insert_deduplicate=1` and without idempotent partition windows, causing duplicate rows.
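Polling system.backups can be sketched as below. This is an illustration under stated assumptions: the status strings are the ones documented for system.backups and should be checked against your server version, and `fetch_status` is a hypothetical callable that runs the status query through whatever client you use and returns the status column.

```python
import time
from typing import Callable

# Status values as documented for system.backups; verify for your version.
TERMINAL_OK = {"BACKUP_CREATED", "RESTORED"}
TERMINAL_FAILED = {"BACKUP_FAILED", "RESTORE_FAILED"}


def status_query(backup_id: str) -> str:
    """Build the query that reads the state of one BACKUP or RESTORE job."""
    escaped = backup_id.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT status, error FROM system.backups WHERE id = '{escaped}'"


def wait_for_backup(
    fetch_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state.

    The timeout is approximated by summing poll intervals rather than
    reading a wall clock, which keeps the function easy to test.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_OK:
            return status
        if status in TERMINAL_FAILED:
            raise RuntimeError(f"backup job ended with status {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular ClickHouse client library.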
Monitoring Migrations
A long-running migration with INSERT SELECT remote() or BACKUP is a good moment to watch part counts, memory usage, and merge backlogs on both source and destination - if either side falls behind on merges you can hit "Too many parts" errors. Track:
- `system.parts` row count per table on the target
- `system.merges` to see active background merges
- `system.backups` for `BACKUP`/`RESTORE` job status
- `system.metrics` for `BackgroundMergesAndMutationsPoolTask`, `MemoryTracking`
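As a rough sketch of the part-count check, the snippet below flags partitions whose active part count has reached a threshold you choose. The threshold and the helper name `partitions_over_threshold` are assumptions for illustration; the rows are expected to come from the query shown, run on the target.

```python
from typing import Iterable, List, Tuple

# Active part count per partition, to be run on the target cluster.
ACTIVE_PARTS_SQL = (
    "SELECT database, table, partition, count() AS active_parts "
    "FROM system.parts WHERE active "
    "GROUP BY database, table, partition"
)

Row = Tuple[str, str, str, int]


def partitions_over_threshold(rows: Iterable[Row], threshold: int) -> List[Row]:
    """Return rows at or above the threshold, busiest partition first."""
    flagged = [row for row in rows if row[3] >= threshold]
    return sorted(flagged, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    sample = [
        ("target_db", "events", "202401", 40),
        ("target_db", "events", "202402", 310),
        ("target_db", "events", "202403", 150),
    ]
    for row in partitions_over_threshold(sample, threshold=100):
        print(row)
```

Pausing the copy loop while any partition is flagged gives background merges time to catch up before the server starts rejecting inserts.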
Pulse connects to both source and destination clusters and automatically detects part backlogs, replication lag, and memory pressure during migrations. Its agentic SRE engine surfaces the root cause when a migration stalls (slow disk on target, throttled merges, ZooKeeper session loss) and can suggest or apply remediation, replacing much of the manual log diving that copier users were used to.
Frequently Asked Questions
Q: Is clickhouse-copier still maintained?
A: No. clickhouse-copier was moved to a separate repository at github.com/ClickHouse/copier and marked obsolete; the upstream README states "this tool is no longer supported." It was removed from the main ClickHouse bundle in version 24.2.
Q: What should I use instead of clickhouse-copier for migrating data between ClickHouse clusters?
A: For small tables, INSERT INTO target SELECT * FROM remoteSecure('host', 'db.table', 'user', 'pw'). For full clusters, native BACKUP ... TO S3('...') and RESTORE (available since 22.8). For production backup/restore with retention policies, the Altinity clickhouse-backup tool.
Q: Does BACKUP / RESTORE handle replicated tables?
A: Yes. Native BACKUP captures both data and metadata for ReplicatedMergeTree tables. On RESTORE to a new cluster, you typically restore on one replica and let ClickHouse Keeper coordinate replication to the rest, or restore the same backup independently on each replica with the allow_non_empty_tables setting.
Q: Can I reshard data without clickhouse-copier?
A: Yes. Use a per-partition INSERT INTO target_distributed SELECT * FROM source_table with the new sharding key, or ALTER TABLE FREEZE PARTITION followed by ATTACH PARTITION for direct file moves. Resharding is more operationally involved than a same-shape cluster copy, so plan idempotent steps and verify row counts per partition.
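Verifying row counts per partition can be sketched as a comparison of two count maps. This is a minimal illustration: the count query assumes partitioning by toYYYYMM(event_date), and `diff_partition_counts` is a hypothetical helper that expects the query results already loaded into dictionaries.

```python
from typing import Dict, Tuple

# Run against the source table and against the target distributed table.
COUNT_SQL_TEMPLATE = (
    "SELECT toString(toYYYYMM(event_date)) AS p, count() AS c "
    "FROM {table} GROUP BY p"
)


def diff_partition_counts(
    source: Dict[str, int], target: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {partition: (source_rows, target_rows)} for every mismatch.

    A partition missing on one side is reported with a count of 0.
    """
    mismatches: Dict[str, Tuple[int, int]] = {}
    for partition in sorted(set(source) | set(target)):
        src_rows = source.get(partition, 0)
        dst_rows = target.get(partition, 0)
        if src_rows != dst_rows:
            mismatches[partition] = (src_rows, dst_rows)
    return mismatches


if __name__ == "__main__":
    print(COUNT_SQL_TEMPLATE.format(table="source_db.events"))
    print(diff_partition_counts({"202401": 10, "202402": 7}, {"202401": 10}))
```

An empty result means every partition matches; a target count above the source count usually points to a step that was re-run without being idempotent.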
Q: What is the version where clickhouse-copier was removed?
A: It was removed from the main ClickHouse server bundle in 24.2. Earlier 23.x versions shipped a still-functional but deprecated binary. The standalone repository at github.com/ClickHouse/copier remains available for compilation if needed.
Q: How do I make INSERT SELECT remote() resumable like clickhouse-copier was?
A: Drive it from a control loop: iterate over partitions (e.g., SELECT DISTINCT toYYYYMM(event_date) FROM source), copy one partition per iteration with INSERT INTO target SELECT ... WHERE toYYYYMM(event_date) = :p SETTINGS insert_deduplicate=1, and record successful partitions in a tracking table. On retry, skip already-copied partitions.
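The control loop can be sketched as follows. It is an illustration only: `copy_partition` and `mark_done` are hypothetical callables, standing in for the per-partition INSERT SELECT and for a write to whatever tracking table you choose.

```python
from typing import Callable, Iterable, List, Set


def copy_remaining(
    partitions: Iterable[str],
    already_done: Set[str],
    copy_partition: Callable[[str], None],
    mark_done: Callable[[str], None],
) -> List[str]:
    """Copy every partition that is not yet recorded as done.

    A partition is recorded only after its copy returns without raising,
    so an exception stops the loop and the next run resumes at the
    partition that failed.
    """
    copied: List[str] = []
    for partition in sorted(partitions):
        if partition in already_done:
            continue  # finished in an earlier run
        copy_partition(partition)  # would run the per-partition INSERT SELECT
        mark_done(partition)  # would insert a row into the tracking table
        copied.append(partition)
    return copied


if __name__ == "__main__":
    done: Set[str] = {"202401"}
    newly_copied = copy_remaining(
        ["202401", "202402", "202403"],
        done,
        copy_partition=lambda p: print(f"copying {p}"),
        mark_done=done.add,
    )
    print(newly_copied)
```

Recording progress after the copy rather than before means a crash between the two steps repeats one partition on the next run, which is why the per-partition insert itself needs to be safe to retry.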
Related Reading
- ClickHouse Backup and Restore - native BACKUP/RESTORE syntax and patterns
- ClickHouse MergeTree Engine - underlying engine being copied
- ClickHouse ReplicatedMergeTree - replicated tables and their coordination
- ClickHouse Documentation Hub - index of all ClickHouse KB pages
- ClickHouse Client - native CLI used to drive INSERT SELECT migrations
- Memory Limit Exceeded - common failure mode during large migrations