ClickHouse Executable Dictionary as a Cron-Style Scheduler

Q: How precise is the schedule?

The script runs at most once per . Actual timing depends on when the dictionary was first loaded and how long the script takes. Treat the schedule as approximate, not real-time.

Q: Can I use this to schedule SQL-only work?

Yes. Have the script execute one or more clickhouse-client --query commands. Anything you can do over the client protocol is available.

ClickHouse does not ship a built-in cron scheduler, but it does have a mechanism that runs an external command on a regular interval: executable dictionaries. A dictionary configured with a small lifetime reloads itself at that interval, and reload means running the source command. With a bit of logic in the command itself, you can implement scheduled tasks that look and feel like cron jobs but live inside the ClickHouse configuration tree.

This pattern is useful for low-frequency, self-contained operational tasks (housekeeping queries, snapshot uploads, conditional restarts) where standing up a separate scheduler is overkill.

For background on executable dictionaries in general, see what is a ClickHouse executable dictionary.

How the Trigger Works

A dictionary's <lifetime> setting tells ClickHouse how often to refresh the dictionary data. For an executable source, refresh means running the configured command and reading its stdout. If the script's stdout is constant (or empty in a way the dictionary tolerates), the dictionary value never changes but the script still executes on every refresh.

That side-effect-driven design is the whole pattern. The dictionary's actual data is incidental; the script's work is what matters.

Complete Example: Weekly Scheduled Restart

Goal: restart the ClickHouse server on Saturdays between 10:00 and 11:00, at most once per hour. The script encodes the time window logic; the dictionary just polls.

1. Log table

Track when the restart has happened so the script can rate-limit itself:

CREATE TABLE restart_table
(
    restart_datetime DateTime
)
ENGINE = TinyLog;

2. Dictionary configuration

Put this in /etc/clickhouse-server/config.d/restart_dict.xml:

<clickhouse>
    <dictionaries_config>/etc/clickhouse-server/config.d/*_dict.xml</dictionaries_config>
    <dictionaries_lazy_load>false</dictionaries_lazy_load>

    <dictionary>
        <name>restart_dict</name>
        <structure>
            <id>
                <name>restart_id</name>
                <type>UInt64</type>
            </id>
        </structure>
        <source>
            <executable>
                <command>restart_dict.sh</command>
                <execute_direct>true</execute_direct>
                <format>CSV</format>
            </executable>
        </source>
        <layout>
            <flat/>
        </layout>
        <lifetime>300</lifetime>
    </dictionary>
</clickhouse>

Key elements:

<lifetime>300</lifetime>: refresh every 5 minutes (300 seconds). Pick the granularity your task needs; finer intervals run the script more often.
<dictionaries_lazy_load>false</dictionaries_lazy_load>: load (and start polling) immediately at server startup.
<execute_direct>true</execute_direct>: execute the binary directly without going through a shell. The script must live in the user_scripts directory.
<format>CSV</format>: tells ClickHouse how to parse the script's stdout.

3. The script

The script lives in the user_scripts directory (typically /var/lib/clickhouse/user_scripts/) and is chmod +x:

#!/bin/bash

CLICKHOUSE_USER="admin"
CLICKHOUSE_PASSWORD="xxxxxxxxx"

current_day=$(date +%u)     # 1 (Mon) ... 7 (Sun)
current_time=$(date +%H%M)  # HHMM

if [[ $current_day -eq 6 && $current_time -ge 1000 && $current_time -lt 1100 ]]; then
    current_timestamp=$(date +%s)

    last_restart_timestamp=$(clickhouse-client \
        --user "$CLICKHOUSE_USER" --password "$CLICKHOUSE_PASSWORD" \
        --query "SELECT max(toUnixTimestamp(restart_datetime)) FROM restart_table")

    if [[ $(( current_timestamp - last_restart_timestamp )) -ge 3600 ]]; then
        echo "$current_timestamp" | clickhouse-client \
            --user "$CLICKHOUSE_USER" --password "$CLICKHOUSE_PASSWORD" \
            --query "INSERT INTO restart_table FORMAT TSVRaw"

        clickhouse-client \
            --user "$CLICKHOUSE_USER" --password "$CLICKHOUSE_PASSWORD" \
            --query "SYSTEM SHUTDOWN"
    fi
fi

Two layers of guarding:

Window check: only act between Saturday 10:00 and 11:00.
Cooldown check: only act if more than an hour has elapsed since the last logged restart.

Without the cooldown, the script would shut the server down every refresh inside the window. With it, the server restarts at most once and then the logged timestamp prevents re-entry until the next window.

4. What ClickHouse expects from stdout

The dictionary still needs valid CSV output. A safe minimal output is empty (no rows). For the example above, the script prints nothing to stdout (the echo and clickhouse-client INSERT are pipelined; their output does not reach the dictionary). The dictionary sees zero rows, the flat layout accepts that, and ClickHouse waits for the next refresh.

If you want to track invocations through the dictionary itself, have the script emit a CSV row (e.g. 1,"ok") and define a value String attribute in the dictionary structure.

Using executable_pool for Frequent Refreshes

If your <lifetime> is short (seconds), spawning a new process on every refresh is wasteful. Switch to executable_pool:

<source>
    <executable_pool>
        <command>restart_dict.sh</command>
        <execute_direct>true</execute_direct>
        <format>CSV</format>
        <pool_size>1</pool_size>
        <max_command_execution_time>30</max_command_execution_time>
    </executable_pool>
</source>

ClickHouse keeps a pool of script processes alive and reuses them across refreshes. For a cron-style use case where the script may not handle being held open, stick with executable and use a moderate lifetime (60 seconds or more). See ClickHouse executable pool for the design tradeoffs.

Other Cron-Like Use Cases

Once the pattern is set up, it generalizes:

Daily backup uploads. Check day and hour; trigger BACKUP if not yet uploaded today.
Periodic OPTIMIZE on specific tables. Run a script that issues OPTIMIZE TABLE ... PARTITION against partitions matching some staleness criterion.
External health probes. Push a heartbeat to an external service from inside the cluster.
Cleanup of stale detached/ parts. Walk system.detached_parts and ALTER TABLE ... DROP DETACHED PART for entries older than a threshold.

Anything that fits "every N minutes, decide whether to act, optionally act" works.

Common Pitfalls

Script not in the user_scripts directory. With execute_direct=true, ClickHouse looks in the user_scripts directory only. Putting the script elsewhere produces "file not found" errors at refresh time.
No timeout protection. A hung script blocks dictionary refresh. Set <command_termination_timeout> and design the script to exit quickly on the no-op path.
Credentials in plain text. Embedding clickhouse-client --password in a script is fine for a lab but not for production. Use a config file or an environment variable injected by your deployment system.
Long lifetime + short window. If the window is 60 minutes and the lifetime is 60 minutes, you may miss the window depending on the refresh phase. Make the lifetime substantially shorter than the smallest window you want to react to.
The script's stdout breaking the dictionary parse. Anything clickhouse-client writes to stdout (warnings, progress, query results) leaks into the dictionary's CSV stream. Redirect those streams away in the script: clickhouse-client ... >/dev/null 2>&1 for commands whose output you do not want.
Server can't reach itself. The script invokes clickhouse-client which connects back to the local server. If networking or authentication is misconfigured, the script silently fails and the scheduled action never happens. Test the script standalone first.

How Pulse Helps Operate ClickHouse Dictionaries

Executable dictionaries are powerful and easy to misuse. A misbehaving script can hammer the server, exhaust file descriptors, or silently stop doing the thing it was supposed to do. Pulse tracks dictionary refresh failures, execution time outliers, and stale last_successful_update_time across every dictionary in your cluster. Connect your ClickHouse cluster to Pulse and stop discovering broken cron-dictionaries by accident.

Frequently Asked Questions

Q: Is this a supported pattern or a hack?

The executable dictionary feature is supported. Using it as a scheduler is a side effect of its refresh semantics. It works reliably as long as the script is well-behaved. For complex scheduling needs, an external scheduler (cron, systemd timers, Airflow) is still the better answer.

Q: How precise is the schedule?

The script runs at most once per <lifetime>. Actual timing depends on when the dictionary was first loaded and how long the script takes. Treat the schedule as approximate, not real-time.

Q: Can I have multiple cron-dictionaries?

Yes. Define one dictionary per task, each with its own script and lifetime. ClickHouse runs them independently.

Q: Does the dictionary need to be queried for the script to run?

No. The refresh happens on the configured lifetime regardless of whether anyone is querying the dictionary. dictionaries_lazy_load=false ensures it starts polling immediately at server startup.

Q: What happens if the script fails?

The dictionary refresh fails for that interval. ClickHouse logs the error and tries again on the next refresh. Your task simply does not run that cycle. Monitor system.dictionaries for last_exception.

Q: Can I use this to schedule SQL-only work?

Yes. Have the script execute one or more clickhouse-client --query commands. Anything you can do over the client protocol is available.