remote() Table Function: Cross-Server Queries and Data Movement

Q: What is the difference between `remote()` and `remoteSecure()`?

They are the same function with different transport. remote() uses the plain native TCP protocol on default port 9000; remoteSecure() uses the native protocol over TLS on default port 9440. Use remoteSecure() whenever credentials or data travel over an untrusted network, and make sure the port matches the function.

Q: When should I use `remote()` instead of a `Distributed` table?

Use remote() for one-off or infrequent operations — ad-hoc cross-cluster queries, debugging a remote node, or a bounded data move — especially when the two servers don't share a remote_servers configuration. For recurring production read/write fan-out, a Distributed table is better because it pools long-lived connections instead of reconnecting per request.

Q: Why do I get "All connection tries failed"?

Most often you pointed remote() at a secure port (9440) — switch to remoteSecure() — or the link is slower than the failover connect timeout. Raise connect_timeout_with_failover_ms / connect_timeout_with_failover_secure_ms for cross-region connections. If the address is a hostname, confirm it resolves on the initiating server; see Cannot Resolve Host in Replication .

Q: Can `remote()` query multiple independent clusters at once?

Yes. Because addresses are explicit in the query, remote() / remoteSecure() can reach servers that share no common configuration. A common pattern is to create local table references with AS remoteSecure(...) for each cluster and combine them with a Merge table engine to query all of them as one. When the clusters are configured in remote_servers , use cluster('{cluster}', db.table) with macro substitution instead.

The remote() table function lets you query a table on any other ClickHouse server using an ad-hoc connection — no cluster configuration, no Distributed table DDL, no entry in remote_servers. You give it an address, a database and table, and credentials, and ClickHouse opens a connection on the fly. Its TLS sibling, remoteSecure(), does the same over an encrypted connection.

This makes remote() ideal for one-off jobs: federating a quick query across two clusters, debugging a remote node, or moving a bounded amount of data between servers. It is not a substitute for a properly configured Distributed table in steady-state production, because it re-establishes the connection on every call.

Syntax

The function accepts a few overlapping forms:

remote(addresses_expr, [db, table, user [, password], sharding_key])
remote(addresses_expr, [db.table, user [, password], sharding_key])
remote(named_collection[, option=value [, ...]])

remoteSecure() uses the identical signatures but connects with TLS.

Parameter	Meaning
`addresses_expr`	One or more server addresses as `host` or `host:port`. IPv4, bracketed IPv6, and hostnames are all accepted.
`db` / `table`	Database and table to read (or write). May be passed as `db.table`.
`user`	Username on the remote server. Defaults to `default`.
`password`	Password. Defaults to empty.
`sharding_key`	Optional `UInt32` expression used to distribute rows when writing across multiple shard addresses.

Default ports differ by function: remote() uses 9000 (native TCP protocol) and remoteSecure() uses 9440 (native protocol with TLS). If you supply only addresses_expr and nothing else, the function reads from system.one on the remote — useful for a connectivity smoke test.

A minimal read looks like this:

SELECT count()
FROM remote('10.0.0.7:9000', 'analytics', 'events', 'default', 'secret');

Over TLS, against ClickHouse Cloud or any node with secure TCP enabled:

SELECT count()
FROM remoteSecure('host.region.clickhouse.cloud:9440', 'analytics.events', 'default', 'Password123!');

Specifying Multiple Addresses

The addresses_expr argument is more than a single host. The separator you use changes the semantics:

Comma means shards — the query is spread across the listed addresses and results are merged:
```
SELECT count()
FROM remote('shard1:9000,shard2:9000,shard3:9000', 'db', 'events');
```
Pipe (|) means replicas — ClickHouse picks the first healthy address for that shard:
```
SELECT count()
FROM remote('shard1-a:9000|shard1-b:9000', 'db', 'events');
```

Glob-style patterns expand into multiple addresses, which keeps long host lists compact:

Pattern	Expands to
`{a,b,c}`	the literal alternatives `a`, `b`, `c`
`{1..3}`	`1`, `2`, `3` (numeric range)
`{01..03}`	`01`, `02`, `03` (leading zeros preserved)
`{a\|b}`	replica alternatives `a` or `b`

-- Three shards, named chnode1..chnode3, each on port 9000
SELECT count()
FROM remote('chnode{1..3}:9000', 'db', 'events');

remote() vs. cluster() vs. Distributed Tables

remote() is one of three ways to run a query across servers, and choosing the wrong one is a common source of pain. The key distinction is where the topology lives.

Approach	Topology source	Connection lifecycle	Best for
`remote()` / `remoteSecure()`	Explicit address string in the query	Re-established per request	One-off queries, ad-hoc debugging, bounded data moves
`cluster()` / `clusterAllReplicas()`	Named cluster from `remote_servers` config	Per request, but reuses configured hosts/creds	Ad-hoc queries against an already-configured cluster
`Distributed` table engine	Named cluster from `remote_servers` config	Pooled, long-lived connections	Steady-state production read/write fan-out

cluster('cluster_name', db.table) accesses all shards configured in the remote_servers section without you hand-typing addresses, and it supports macro substitution so the same query works on every node:

SELECT count()
FROM cluster('{cluster}', default.events);

Here {cluster} is replaced from each server's <macros> configuration. clusterAllReplicas() is the same but contacts every replica, not just one per shard.

Rule of thumb: if the cluster is defined in config, prefer cluster() or a Distributed table. Reach for remote() precisely when there is no shared configuration linking the two servers — for example, querying across two independent clusters, or against a server you do not control the config of. For the underlying config that cluster() and Distributed rely on, see Production Cluster Configuration and the ClickHouse Architecture Guide.

Moving Data with remote()

remote() can appear on either side of an INSERT ... SELECT, which makes it a practical tool for migrations and backfills.

Pull data into the local server from a remote table:

INSERT INTO local_events
SELECT * FROM remote('source-host:9000', 'analytics', 'events', 'default', 'secret')
WHERE event_date = '2026-05-01';

Push data out to a remote server with INSERT INTO FUNCTION:

INSERT INTO FUNCTION remoteSecure('dest-host:9440', 'analytics.events', 'default', 'secret')
SELECT * FROM local_events
WHERE event_date = '2026-05-01';

Slice large transfers

The remote table function is well suited to moving up to a few hundred gigabytes. For anything larger, do not stream the whole table in one statement. Slice it by a WHERE condition — ideally on the partitioning key — so each insert lands in a small set of partitions instead of writing to many partitions at once:

INSERT INTO staging SELECT * FROM remote(...) WHERE date = '2026-04-13';
INSERT INTO staging SELECT * FROM remote(...) WHERE date = '2026-04-12';
INSERT INTO staging SELECT * FROM remote(...) WHERE date = '2026-04-11';

This keeps memory bounded, makes failures resumable (you know exactly which slice to retry), and avoids the "too many parts" pressure that a single monolithic insert can create. See Too Many Parts / Merge Backlog if you hit that during a bulk load.

Tune throughput vs. memory

For large moves you can trade memory for speed by raising parallelism and block sizes on the receiving side:

SET max_insert_threads = 20,
    max_threads = 20,
    min_insert_block_size_rows = 16777216,
    min_insert_block_size_bytes = 536870912;

Larger blocks accelerate the insert but increase peak memory, which scales roughly with max_insert_threads multiplied by the block-size limits (min_insert_block_size_rows / min_insert_block_size_bytes, defaults ~1,048,449 rows and ~268 MB). Raise these gradually and watch memory on the receiver.

Common Issues

All connection tries failed. The two usual causes:

You used remote() against a secure port (9440). The plain function speaks unencrypted native protocol and cannot complete a TLS handshake — switch to remoteSecure().
Latency to the remote exceeds the failover connect timeout. Both connect_timeout_with_failover_ms and connect_timeout_with_failover_secure_ms default to 1000 ms (raised from tighter legacy values in ClickHouse PR #49009). Even so, cross-region or public-internet links can exceed this; raise them if needed:
```
SELECT count()
FROM remoteSecure('far-host:9440', 'db', 'events')
SETTINGS connect_timeout_with_failover_secure_ms = 1000;
```

Hostname cannot be resolved. If the remote address is a hostname rather than an IP, DNS must resolve on the initiating server. This is the same class of failure described in Cannot Resolve Host in Replication — the fix (working DNS or an explicit IP) is identical.

Connection overhead on hot paths. Because remote() and remoteSecure() reconnect for every request, a dashboard or service hammering them will pay TCP/TLS setup cost repeatedly. If a query pattern is recurring, promote it to a Distributed table or cluster() call so connections are pooled.

Credentials in plaintext. Passwords passed inline appear in system.query_log and client history. Prefer a named collection to keep secrets out of the SQL text, and always use remoteSecure() when credentials cross a network.

Best Practices

Use remoteSecure() whenever credentials leave the host. Plaintext native protocol exposes both data and password on the wire.
Prefer named collections over inline passwords so secrets stay out of query logs and shell history.
Don't use remote() for steady-state fan-out. It is an ad-hoc tool; recurring cross-server reads belong in a Distributed table.
Slice big migrations by partition key and load one slice at a time for bounded memory and resumability.
Match the function to the port. remote() → 9000, remoteSecure() → 9440. Mismatches surface as connection failures, not clear errors.
Raise failover timeouts for cross-region links. The 1000 ms defaults are reasonable for a LAN but can still be too short for cross-region or public-internet connections.

How Pulse Helps

Cross-server remote() jobs tend to fail in unglamorous ways: a backfill that quietly stalls because the receiver is under merge pressure, a transfer that exhausts memory from over-aggressive parallelism, or repeated All connection tries failed errors that turn out to be a port or timeout mismatch. Pulse monitors ClickHouse clusters and surfaces exactly these conditions — connection and timeout failures, part-count and merge backlogs building up during bulk loads, and memory pressure on insert paths — with concrete remediation guidance rather than raw metrics. When you are moving data between servers or federating queries across clusters, Pulse helps you see whether the bottleneck is the network, the receiver, or your settings, so a one-off migration doesn't turn into an outage.

Frequently Asked Questions

Q: What is the difference between remote() and remoteSecure()?

They are the same function with different transport. remote() uses the plain native TCP protocol on default port 9000; remoteSecure() uses the native protocol over TLS on default port 9440. Use remoteSecure() whenever credentials or data travel over an untrusted network, and make sure the port matches the function.

Q: When should I use remote() instead of a Distributed table?

Use remote() for one-off or infrequent operations — ad-hoc cross-cluster queries, debugging a remote node, or a bounded data move — especially when the two servers don't share a remote_servers configuration. For recurring production read/write fan-out, a Distributed table is better because it pools long-lived connections instead of reconnecting per request.

Q: How do I specify shards versus replicas in the address string?

Separate shard addresses with commas (the query fans out and merges across them) and separate replica addresses with a pipe | (ClickHouse uses the first healthy one). Glob patterns like chnode{1..3}:9000 and {a|b} expand to keep long lists compact.

Q: How much data can I move through remote()?

It comfortably handles up to a few hundred gigabytes. For larger volumes, slice the source by a WHERE condition on the partitioning key and load one slice per INSERT ... SELECT, which keeps memory bounded and makes the job resumable. Tune max_insert_threads and the min_insert_block_size_* settings to trade memory for throughput.

Q: Why do I get "All connection tries failed"?

Most often you pointed remote() at a secure port (9440) — switch to remoteSecure() — or the link is slower than the failover connect timeout. Raise connect_timeout_with_failover_ms / connect_timeout_with_failover_secure_ms for cross-region connections. If the address is a hostname, confirm it resolves on the initiating server; see Cannot Resolve Host in Replication.

Q: Can remote() query multiple independent clusters at once?

Yes. Because addresses are explicit in the query, remote()/remoteSecure() can reach servers that share no common configuration. A common pattern is to create local table references with AS remoteSecure(...) for each cluster and combine them with a Merge table engine to query all of them as one. When the clusters are configured in remote_servers, use cluster('{cluster}', db.table) with macro substitution instead.