Using MySQL as a Dictionary Source in ClickHouse

ClickHouse dictionaries can pull reference data directly from a MySQL database, giving you fast in-memory lookups that stay in sync with an operational system. The most common stumbling block is not the dictionary syntax itself but MySQL 8's default authentication plugin, which ClickHouse's bundled MySQL client cannot always negotiate.

This guide covers the practical setup: configuring a compatible MySQL user, writing the dictionary definition, choosing a refresh strategy, and avoiding the authentication and schema-cache pitfalls that break MySQL-sourced dictionaries. For background on dictionaries in general, see What is a ClickHouse Dictionary and ClickHouse External Dictionary.

The MySQL 8 Authentication Change

MySQL 5.7 used mysql_native_password as its default authentication plugin. Starting with MySQL 8.0, the default changed to caching_sha2_password. This is the single most common reason a MySQL dictionary source fails to load: the MySQL client library ClickHouse uses does not negotiate caching_sha2_password over a non-TLS connection the way MySQL 8 expects.

The symptom is a dictionary that refuses to load, typically with an authentication or "plugin not loaded" style error referencing the source. The fix is to give the dictionary a dedicated MySQL user that uses mysql_native_password, rather than trying to make the application's default user work.

Creating a Compatible MySQL User

Create a dedicated, least-privilege MySQL user for ClickHouse and pin it to mysql_native_password:

MySQL version caveat: mysql_native_password was deprecated in MySQL 8.0.34, disabled by default in MySQL 8.4 (it must be re-enabled with --mysql-native-password=ON), and removed entirely in MySQL 9.0. On MySQL 8.4+ or 9.0+, you must use TLS with caching_sha2_password instead (see the section below).

-- Run this in MySQL 8.0 (8.0.0–8.0.33 without extra config; 8.0.34+ emits a deprecation warning)
CREATE USER IF NOT EXISTS 'clickhouse'@'%'
    IDENTIFIED WITH mysql_native_password BY 'clickhouse_user_password';

CREATE DATABASE IF NOT EXISTS dictionaries;

-- Grant only what the dictionary needs (read on the source table/db)
GRANT SELECT ON dictionaries.* TO 'clickhouse'@'%';

FLUSH PRIVILEGES;

If the user already exists with the wrong plugin, switch it instead of recreating it:

ALTER USER 'clickhouse'@'%'
    IDENTIFIED WITH mysql_native_password BY 'clickhouse_user_password';

Use a separate account for the dictionary even if you have an existing application user. It keeps the authentication plugin decoupled from your application's credentials and lets you grant read-only access to just the source tables.

When you must use caching_sha2_password

caching_sha2_password only works securely over an encrypted connection (or with a configured server public key). If your security policy forbids mysql_native_password, you must connect the dictionary over TLS. ClickHouse's MySQL dictionary source supports SSL, and the official docs note that when establishing an SSL connection, security is mandatory — the connection is encrypted end to end. Configuring TLS for the MySQL source is more involved than the native-password route, so reach for it only when policy requires it.

Defining the Dictionary

With a compatible user in place, define the dictionary. The DDL form is the most portable and works the same in self-hosted ClickHouse and ClickHouse Cloud:

CREATE DICTIONARY customers_dict
(
    id          UInt64,
    name        String,
    tier        String DEFAULT 'standard',
    updated_at  DateTime
)
PRIMARY KEY id
SOURCE(MYSQL(
    port 3306
    user 'clickhouse'
    password 'clickhouse_user_password'
    replica(host 'mysql-primary.internal' priority 1)
    replica(host 'mysql-replica.internal'  priority 2)
    db 'dictionaries'
    table 'customers'
))
LIFETIME(MIN 300 MAX 360)
LAYOUT(HASHED());

Key points:

  • replica(host ... priority ...) lets you list multiple MySQL hosts. ClickHouse tries lower-priority numbers first (priority 1 before priority 2), so point it at a replica when possible to keep load off the primary.
  • LAYOUT controls how the dictionary is stored in memory. HASHED() loads the whole table into a hash map; FLAT() is faster for small contiguous integer keys; CACHE() keeps only a hot subset in memory and fetches misses on demand. See Dictionaries vs LowCardinality for when a dictionary is the right tool.
  • LIFETIME controls the refresh interval (covered below).

Once created, query it with dictGet:

SELECT
    order_id,
    dictGet('customers_dict', 'tier', toUInt64(customer_id)) AS customer_tier
FROM orders;

Available Source Parameters

The MySQL dictionary source accepts these parameters inside SOURCE(MYSQL(...)):

Parameter Purpose
port MySQL port (commonly 3306). Can also be set per-replica.
user / password Credentials for the dedicated dictionary user.
host Single MySQL host (use replica(...) for multiple).
replica(host, priority) One or more hosts with failover priority; lower number tried first.
db / table Source database and table.
where Optional filter applied to the source query, e.g. where 'is_active = 1'.
query A custom SELECT to use instead of db/table/where. Cannot be combined with table/where.
invalidate_query A query whose result signals whether data changed (see below).
fail_on_connection_loss If true, throw immediately on connection loss; if false (the default), retry the query three times before failing.

A minimal definition with an explicit query and a filter:

SOURCE(MYSQL(
    port 3306
    user 'clickhouse'
    password 'clickhouse_user_password'
    host 'mysql-replica.internal'
    db 'dictionaries'
    query 'SELECT id, name, tier, updated_at FROM customers WHERE is_active = 1'
))

Refreshing the Dictionary Efficiently

LIFETIME(MIN m MAX n) tells ClickHouse to refresh the dictionary at a random interval between m and n seconds. The randomization spreads reload load across many dictionaries rather than refreshing them all at once. A fixed LIFETIME(300) is also valid.

By default, every refresh re-reads the entire source table. On a large or busy MySQL table this is wasteful. Use invalidate_query to skip refreshes when nothing changed:

CREATE DICTIONARY customers_dict
(
    id          UInt64,
    name        String,
    tier        String,
    updated_at  DateTime
)
PRIMARY KEY id
SOURCE(MYSQL(
    port 3306
    user 'clickhouse'
    password 'clickhouse_user_password'
    host 'mysql-replica.internal'
    db 'dictionaries'
    table 'customers'
    invalidate_query 'SELECT max(updated_at) FROM dictionaries.customers'
))
LIFETIME(MIN 300 MAX 360)
LAYOUT(HASHED());

ClickHouse runs the invalidate_query each cycle. If the single returned value is unchanged from the previous cycle, ClickHouse skips the full reload entirely. A max(updated_at) or a row count plus checksum are common choices. This pattern is essential for large lookup tables — without it, you pay a full table scan on MySQL every LIFETIME interval.

For incremental enrichment patterns built on dictionaries, see the dictionary enrichment update pattern.

Common Issues

Authentication / plugin errors on load. Almost always the MySQL 8 caching_sha2_password default. On MySQL 8.0 (up to 8.0.33), switch the dictionary user to mysql_native_password (see above). On MySQL 8.4, mysql_native_password is disabled by default and must be re-enabled with --mysql-native-password=ON; on MySQL 9.0 it is removed entirely. For those versions, connect over TLS instead.

Stale schema after MySQL DDL changes. MySQL 8 caches information_schema statistics, which can cause ClickHouse to see outdated metadata after the source table changes. Disable the cache on the MySQL side:

SET GLOBAL information_schema_stats_expiry = 0;

Or persist it in my.cnf:

[mysqld]
information_schema_stats_expiry = 0

Dictionary loads empty. Usually a where/query filter that matches no rows, a wrong db/table, or the dictionary user lacking SELECT on the source. See ClickHouse dictionary is empty.

Cache dictionary update failures. With LAYOUT(CACHE()), transient MySQL connection issues can surface as update failures rather than a clean retry. See cache dictionary update fail for diagnosis, and consider fail_on_connection_loss 'false' (the default) so ClickHouse retries before erroring.

Privilege errors. If ClickHouse reports access denied creating or reading the dictionary, that is a ClickHouse-side grant problem — see dictionary access denied. A MySQL-side access error instead points at the MySQL grants; check MySQL access denied for user.

Best Practices

  1. Use a dedicated, read-only MySQL user scoped to just the source database. Do not reuse the application user. Pin it to mysql_native_password on MySQL 8.0 (up to 8.0.33 without caveats; 8.0.34 deprecated it; 8.4 disables it by default; 9.0 removes it). On MySQL 8.4+ use TLS with caching_sha2_password instead.
  2. Point at a replica, not the primary. Dictionary reloads are full table reads; use replica(...) priorities to prefer a read replica.
  3. Always set an invalidate_query for tables of meaningful size, so refreshes skip when data is unchanged.
  4. Match the layout to the data. FLAT for small dense integer keys, HASHED for general key/value sets, CACHE only when the table is too large to fit in memory and access is sparse.
  5. Rotate the dictionary password independently of application credentials since it lives in the dictionary definition.
  6. Prefer the MySQL engine for tables you also want to query directly rather than just look up — see What is the ClickHouse MySQL engine.

How Pulse Helps

Pulse monitors ClickHouse dictionaries as part of overall cluster health: it surfaces dictionaries that fail to load, that are stuck on stale data because a MySQL source is unreachable, or that are reloading far more often than their invalidate_query should allow. Because a broken MySQL-sourced dictionary often shows up first as failing dictGet calls deep inside queries, having load status and refresh history tracked centrally turns a confusing query error into an obvious "the MySQL source changed its auth plugin" signal. Learn more at pulse.support.

Frequently Asked Questions

Q: Why does my MySQL 8 dictionary fail to authenticate when the same credentials work in the MySQL CLI?

MySQL 8 defaults to the caching_sha2_password plugin, which ClickHouse's MySQL client cannot reliably negotiate over a plain connection. Create the dictionary user with IDENTIFIED WITH mysql_native_password, or connect the source over TLS.

Q: Do I have to downgrade MySQL security to use mysql_native_password?

No — on MySQL versions up to 8.0.33 you can scope a single dedicated dictionary user to mysql_native_password while the rest of your MySQL users keep caching_sha2_password. Note that mysql_native_password was deprecated in MySQL 8.0.34, is disabled by default in MySQL 8.4, and was removed in MySQL 9.0. On MySQL 8.4 it can be re-enabled with --mysql-native-password=ON, but the preferred long-term path is to connect the dictionary source over TLS with caching_sha2_password.

Q: How do I avoid a full table scan on MySQL every refresh?

Add an invalidate_query such as SELECT max(updated_at) FROM .... ClickHouse only performs a full reload when that value changes between LIFETIME cycles, so unchanged data costs just one cheap query.

Q: Can I read from a MySQL replica with failover?

Yes. List multiple replica(host '...' priority N) entries. ClickHouse tries lower priority numbers first, so set your read replica to priority 1 and the primary to a higher number as a fallback.

Q: What does fail_on_connection_loss control?

If true, ClickHouse throws immediately when the connection to MySQL drops. If false — the default — it retries the query three times before throwing. Leaving it at the default makes dictionaries more resilient to transient MySQL hiccups.

Q: My dictionary sees an old MySQL schema after I altered the source table. Why?

MySQL 8 caches information_schema statistics. Set information_schema_stats_expiry = 0 (globally or in my.cnf) so ClickHouse sees current metadata after DDL changes on the source.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.