The "DB::Exception: DNS resolution failed" error in ClickHouse occurs when the server cannot resolve a hostname to an IP address. The DNS_ERROR error code surfaces in several contexts: connecting to cluster nodes defined by hostname, accessing remote tables, resolving dictionary source addresses, or connecting to ZooKeeper/ClickHouse Keeper. DNS resolution is a prerequisite for any network communication that uses hostnames rather than IP addresses.
Impact
DNS failures affect ClickHouse operations broadly:
- Distributed queries across shards will fail if shard hostnames cannot be resolved
- Replication stops if the ZooKeeper or ClickHouse Keeper hostname is unresolvable
- Remote table functions and dictionaries sourced from external systems will fail
- New connections from clients using hostnames will be rejected
- Cluster health checks may report nodes as down even when they are running
Common Causes
- DNS server is unreachable or unresponsive
- The hostname is misspelled in the ClickHouse configuration
- DNS records for the target host have been deleted or not yet propagated
- /etc/resolv.conf is misconfigured or points to a dead resolver
- Network partition isolating the ClickHouse server from the DNS infrastructure
- DNS cache poisoning or corruption returning incorrect results
- Container or pod DNS configuration issues in Kubernetes environments
- Firewall blocking UDP/TCP port 53 to the DNS server
Troubleshooting and Resolution Steps
Test DNS resolution from the ClickHouse server:
    nslookup problematic-hostname
    dig problematic-hostname
    host problematic-hostname
If all three fail, the problem is with DNS, not ClickHouse.
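The three tools above query DNS servers directly and do not consult /etc/hosts or /etc/nsswitch.conf. As a complementary check, the sketch below resolves names through the system resolver with `getent ahosts`, which is closer to what most applications see. The function name `check_hosts` and the example hostnames are placeholders, and the sketch assumes `getent` is installed.

```shell
# check_hosts NAME...: report whether each hostname resolves through the
# system resolver (/etc/hosts, DNS, ...). Hypothetical helper, not a
# ClickHouse tool.
check_hosts() {
    rc=0
    for name in "$@"; do
        # getent ahosts prints one line per address; keep the first address.
        addr=$(getent ahosts "$name" 2>/dev/null | awk 'NR == 1 { print $1 }')
        if [ -n "$addr" ]; then
            printf 'OK   %s -> %s\n' "$name" "$addr"
        else
            printf 'FAIL %s\n' "$name"
            rc=1
        fi
    done
    return "$rc"
}

# Example (placeholder hostnames):
# check_hosts problematic-hostname another-node
```

If `getent` succeeds while `dig` fails, the name is probably coming from /etc/hosts rather than DNS.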
Check the DNS configuration:
    cat /etc/resolv.conf
Verify that nameserver entries point to valid, reachable DNS servers:
    ping -c 2 $(awk '/^nameserver/{print $2; exit}' /etc/resolv.conf)
Verify the hostname is correct in ClickHouse config:
    grep -r "problematic-hostname" /etc/clickhouse-server/
Fix any typos in cluster definitions, remote server configurations, or dictionary sources.
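To review every configured hostname at once rather than searching for one name at a time, something like the following could list the distinct `<host>` values found under the config directory. This is a rough sketch: `list_config_hosts` is a made-up helper name, and the pattern is a plain text match that only finds `<host>` elements written on a single line, so hosts defined through includes or substitutions will be missed.

```shell
# list_config_hosts [DIR]: print the distinct values of <host>...</host>
# elements found in files under DIR (default: /etc/clickhouse-server).
# Hypothetical helper; it does a text match, not real XML parsing.
list_config_hosts() {
    dir=${1:-/etc/clickhouse-server}
    grep -rhoE '<host>[^<]+</host>' "$dir" 2>/dev/null \
        | sed -E 's#</?host>##g' \
        | LC_ALL=C sort -u
}

# Example: flag configured hosts that the system resolver cannot resolve.
# list_config_hosts | while read -r h; do
#     getent ahosts "$h" >/dev/null || echo "unresolvable: $h"
# done
```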
Check if the DNS record exists:
    dig problematic-hostname @8.8.8.8
Using a public DNS server helps determine if the record exists globally or only in your private DNS.
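The same `dig name @server` form can be pointed at each resolver the host is configured to use, which checks whether a nameserver actually answers DNS queries rather than only responding to ICMP pings. A minimal sketch, assuming `dig` is installed; `list_nameservers` and `query_each_nameserver` are illustrative names, and the 2-second timeout is an arbitrary choice.

```shell
# list_nameservers [FILE]: print every nameserver address in a
# resolv.conf-style file (default: /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# query_each_nameserver HOST [FILE]: ask each configured nameserver for HOST
# and print the first answer, or "no answer". Illustrative helper.
query_each_nameserver() {
    host_name=$1
    list_nameservers "${2:-/etc/resolv.conf}" | while read -r ns; do
        # +short prints only the answer; +time/+tries keep dead servers from
        # stalling the loop. Lines starting with ';' are dig diagnostics.
        answer=$(dig +short +time=2 +tries=1 "$host_name" "@$ns" 2>/dev/null \
            | grep -v '^;' | head -n 1)
        printf '%s: %s\n' "$ns" "${answer:-no answer}"
    done
}

# Example (placeholder hostname):
# query_each_nameserver problematic-hostname
```

A resolver that reports "no answer" while others return an address is a likely source of intermittent failures.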
Flush the ClickHouse DNS cache:
    SYSTEM DROP DNS CACHE;
ClickHouse caches DNS results internally. If a record recently changed, the cache may hold stale data.
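After flushing, it can help to see which addresses ClickHouse currently associates with its cluster hosts. The sketch below wraps both steps. It assumes `clickhouse-client` is on the PATH and can connect with default credentials, and it relies on the `host_name` and `host_address` columns of `system.clusters`. The `CH_CLIENT` variable and the function name are conveniences invented for this example.

```shell
# Command used to reach the server; override it to add --host/--user flags.
CH_CLIENT=${CH_CLIENT:-clickhouse-client}

# flush_and_show_cluster_hosts: drop ClickHouse's internal DNS cache, then
# list the hostname -> address mapping it reports for configured clusters.
flush_and_show_cluster_hosts() {
    $CH_CLIENT --query "SYSTEM DROP DNS CACHE" || return 1
    $CH_CLIENT --query "SELECT cluster, host_name, host_address FROM system.clusters ORDER BY cluster, host_name"
}

# Example:
# flush_and_show_cluster_hosts
```

Comparing the reported `host_address` values with what `dig` returns from the same machine can show whether ClickHouse and the operating system agree on the address.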
For Kubernetes environments, verify CoreDNS/kube-dns is running:
    kubectl get pods -n kube-system | grep dns
    kubectl logs -n kube-system <dns-pod-name>
Check that the ClickHouse pod's DNS policy allows it to resolve both cluster-internal and external names.
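To inspect the DNS policy and the resolver configuration a pod actually received, a sketch along these lines may help. The namespace and pod names in the example are placeholders, `pod_dns_info` and the `KUBECTL` variable are invented for illustration, and the second command assumes the container image ships `cat`.

```shell
# Command used to talk to the cluster; override it to add --context flags.
KUBECTL=${KUBECTL:-kubectl}

# pod_dns_info NAMESPACE POD: print the pod's dnsPolicy and the
# /etc/resolv.conf it sees from inside the container. Hypothetical helper.
pod_dns_info() {
    namespace=$1
    pod=$2
    # ClusterFirst is the default policy; pods with hostNetwork: true need
    # ClusterFirstWithHostNet to resolve cluster-internal names.
    $KUBECTL get pod "$pod" -n "$namespace" -o jsonpath='{.spec.dnsPolicy}'
    echo
    $KUBECTL exec -n "$namespace" "$pod" -- cat /etc/resolv.conf
}

# Example (placeholder names):
# pod_dns_info clickhouse clickhouse-0
```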
Use IP addresses as a workaround while debugging DNS issues:
Replace hostnames with IP addresses in the cluster configuration temporarily to restore functionality:
    <shard>
        <replica>
            <host>10.0.1.5</host> <!-- Use IP instead of hostname -->
            <port>9000</port>
        </replica>
    </shard>
Best Practices
- Use stable, reliable DNS infrastructure with redundant nameservers
- Consider using IP addresses in cluster configurations for critical infrastructure to eliminate DNS as a failure point
- Set dns_cache_update_period in ClickHouse config to control how often cached DNS entries are refreshed
- In Kubernetes, ensure DNS policies are correctly set and CoreDNS has sufficient resources
- Monitor DNS resolution latency and failures as part of your infrastructure monitoring
- Maintain /etc/hosts entries as a fallback for critical cluster nodes
- Test DNS resolution during infrastructure changes before they reach production
Frequently Asked Questions
Q: Does ClickHouse cache DNS lookups?
A: Yes. ClickHouse maintains an internal DNS cache to avoid repeated lookups. You can clear it with SYSTEM DROP DNS CACHE and control the refresh interval with the dns_cache_update_period setting.
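As an illustration of the refresh setting, the following sketch writes a small override file into a `config.d` directory. The file name `dns_cache.xml`, the helper name, and the 30-second value are arbitrary choices for the example. The target directory assumes a default package install, older releases used `<yandex>` instead of `<clickhouse>` as the root element, and depending on the version a server restart may be needed for the change to take effect.

```shell
# write_dns_cache_override DIR SECONDS: create DIR/dns_cache.xml setting
# dns_cache_update_period. File and function names are illustrative.
write_dns_cache_override() {
    dir=$1
    seconds=$2
    mkdir -p "$dir" || return 1
    cat > "$dir/dns_cache.xml" <<EOF
<clickhouse>
    <!-- How often (in seconds) cached DNS entries are refreshed -->
    <dns_cache_update_period>$seconds</dns_cache_update_period>
</clickhouse>
EOF
}

# Example (assumes the default config location):
# write_dns_cache_override /etc/clickhouse-server/config.d 30
```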
Q: Can I use /etc/hosts instead of DNS?
A: Yes. Entries in /etc/hosts are resolved locally without DNS. This can serve as a reliable fallback for cluster nodes, though it requires manual updates across all servers when IP addresses change.
Q: Why does this error appear intermittently?
A: Intermittent DNS failures often point to an overloaded DNS server, network packet loss, or DNS round-robin returning unreachable IPs. Check DNS server health and network stability.
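One way to confirm intermittent behaviour is to repeat the lookup many times and count the failures. A minimal sketch, assuming `getent` is available; `probe_dns` is a hypothetical helper, and the default of ten attempts one second apart is arbitrary. Local caches such as nscd or systemd-resolved may mask upstream failures, so treat a clean result with some caution.

```shell
# probe_dns HOST [COUNT] [PAUSE]: resolve HOST COUNT times through the system
# resolver, pausing PAUSE seconds between attempts, and print a tally.
probe_dns() {
    host_name=$1
    count=${2:-10}
    pause=${3:-1}
    ok=0
    fail=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if getent ahosts "$host_name" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
        # Skip the pause after the final attempt.
        if [ "$i" -lt "$count" ]; then
            sleep "$pause"
        fi
    done
    printf 'ok=%d fail=%d\n' "$ok" "$fail"
}

# Example (placeholder hostname): 60 attempts, one second apart.
# probe_dns problematic-hostname 60 1
```

A non-zero `fail` count alongside successful lookups points at the resolver or the network path rather than at a missing record.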
Q: Does this error affect ClickHouse Keeper connections?
A: Yes. If ClickHouse Keeper or ZooKeeper hostnames cannot be resolved, replicated tables lose their coordination layer. Replication, DDL operations, and distributed DDL will all be affected until DNS is restored.