Capabilities
Full reference for the agent's 26+ tools, 18 expert skills, plan-and-verify mode, and worked example questions.
The agent exposes a small set of powerful primitives (26 tools, plus 3 env-gated control tools). Anything without a dedicated tool is done by writing SQL with the query tool, guided by a skill recipe. You never call tools directly — the agent picks and chains them automatically.
Tools
| Category | What it covers | Tools |
|---|---|---|
| Schema & exploration | Databases, tables, columns, ad-hoc SQL | query, list_databases, list_tables, get_table_schema, explore_table_schema |
| Query analysis | Running, slow, failed queries; normalized slow patterns; EXPLAIN; pre-flight cost estimate | get_running_queries, get_slow_queries, list_slow_query_patterns, get_failed_queries, explain_query, estimate_query_cost |
| Health, storage, replication, merges | Metrics, disks, parts, replication, merges | get_metrics, get_disk_usage, get_table_parts, get_replication_status, get_merge_status |
| Capacity planning | Disk-full forecast, TTL/retention advice (recommend-only) | forecast_disk_capacity, suggest_ttl_adjustment |
| Aggregation advisor | Design MV/projection DDL from frequent aggregation queries, with a size estimate (recommend-only) | recommend_materialized_view |
| Plan & verify | A visible, adaptive step-by-step plan | update_plan |
| Knowledge & interaction | Load expert guides; ask the user | load_skill, ask_user |
| Charts & visualization | Run SQL and return an interactive chart | query_and_visualize |
| Insights | Explain a statistical anomaly baseline and z-score | explain_anomaly_score |
| Advisor | Ranked skip-index/projection/partition/PREWHERE recommendations for a slow query (recommend-only) | get_optimization_recommendations |
| Dashboards | Suggest a dashboard layout from registry charts for a natural-language request (recommend-only) | suggest_dashboard |
| Control actions (env-gated) | Kill query/mutation, optimize table | kill_query, kill_mutation, optimize_table |
Everything else (expensive-query rankings, query patterns, anomalies, table-design advice, settings, logs, replication queue, ZooKeeper, users…) is done with query plus the relevant skill recipe — so the agent keeps full reach with a much smaller, more reliable tool surface.
Control tools are off by default
Control tools (kill_query, kill_mutation, optimize_table) are disabled unless AGENT_ENABLE_CONTROL_TOOLS=true. The agent always confirms before using them.
Tool reference
| Tool | Description | Notes |
|---|---|---|
query | Execute a read-only SQL query on ClickHouse. | SELECT / WITH / DESCRIBE / EXPLAIN only. Write statements are rejected. |
list_databases | List all databases with engine and comment metadata. | |
list_tables | List tables in a database with row counts and size. | |
get_table_schema | Get column definitions for a specific table. | |
explore_table_schema | Three-mode exploration: no args → databases; database only → tables; database + table → full schema with indexes and keys. | |
get_running_queries | Currently running queries ordered by elapsed time. | Reads system.processes. |
get_slow_queries | Slowest completed queries from the query log. | Reads system.query_log. |
list_slow_query_patterns | Normalized slow query patterns — system.query_log grouped by normalized_query_hash, with calls, total/avg/p50/p95/p99/max duration, CPU time, peak memory, read/write bytes, errors, and cache-hit ratio per pattern. | Read-only. Use for "which query shape is expensive overall" — unlike get_slow_queries, which ranks individual executions. First step of the query-optimization diagnose loop. |
get_failed_queries | Recent failed queries with error details. | Reads system.query_log. |
explain_query | EXPLAIN PLAN / PIPELINE / PLAN with indexes for a query. | |
estimate_query_cost | Pre-flight cost estimate (rows scanned, bytes read, peak memory, wall time, confidence) from EXPLAIN alone. | Read-only and recommend-only — runs EXPLAIN only, never executes the analyzed query. |
get_metrics | Server health: version, uptime, active connections, memory. | Reads system.metrics. |
get_disk_usage | Per-disk free and total space. | Reads system.disks. |
get_table_parts | Part-level info for a table: rows, size, compression ratio. | Reads system.parts. |
forecast_disk_capacity | Forecast when disks will run out of free space from recent write-growth trend, plus top contributing tables. | Reads system.part_log (NewPart events) + system.disks. Reports a clear message instead of a forecast when part_log isn't enabled. |
suggest_ttl_adjustment | Recommend a TTL/retention change for a table to keep projected disk utilization ≤80%, never below a stated retention floor. | Returns a suggested ALTER TABLE ... MODIFY TTL ... string + risk note — recommend-only, never executed. Reports a clear message instead of a suggestion when part_log isn't enabled. |
recommend_materialized_view | Mine frequent GROUP BY/aggregate query shapes and design a Summing/AggregatingMergeTree MV or projection to pre-aggregate them. | Returns DDL text + size estimate + impact + risk (added write-path/storage cost) — recommend-only, never executed. Reads system.query_log, system.parts, system.tables. |
get_replication_status | Per-table replication delay, queue size, and replica counts. | Reads system.replicas. |
get_merge_status | Currently running merges with progress and elapsed time. | Reads system.merges. |
update_plan | Create or update a step-by-step workflow plan for the current investigation. | Rendered as a live checklist in the UI. |
load_skill | Load an expert guide (skill) with SQL recipes for a specific domain. | |
find_reference_query | Search the dashboard's built-in library of 100+ vetted, version-aware monitoring queries and return the closest matches (name, description, SQL). | Read-only, deterministic keyword-overlap lookup over the built-in QueryConfig catalog — executes nothing. Meant to be used before hand-writing system.* SQL. |
ask_user | Ask the user a question to gather information before proceeding. | |
query_and_visualize | Run a SQL query and return an interactive chart config. | Chart type is auto-detected from result columns. |
explain_anomaly_score | Explain a per-host/per-metric statistical anomaly baseline (mean, stddev, median, MAD, sample count) and, given a value, its z-score and anomaly verdict. | Read-only. Reports "no baseline yet" during cold start (falls back to a static threshold). |
get_optimization_recommendations | Analyze a slow query (by queryId from system.query_log, or raw sql) and return ranked skip-index, projection, partition-key, and PREWHERE recommendations with DDL/rewrite text, rationale, risk, effort, and an estimated granules/bytes saved. | Read-only and recommend-only — reads EXPLAIN + system.tables/system.columns/system.data_skipping_indexes/system.parts; never executes or applies any DDL or rewrite. Every impact figure is explicitly labeled an estimate. |
suggest_dashboard | Map a natural-language request to a dashboard layout built only from charts in the chart registry, auto-placed on the plan-57 12-column grid. | Recommend-only — never persists anything. The chat UI shows an "Apply to dashboard" action that loads the layout into the dashboard builder's unsaved working grid; saving still requires the existing save action. Rejects any chart name not present in both the client and API chart registries. |
kill_query | Kill a running query by query_id. | DESTRUCTIVE. Env-gated: requires AGENT_ENABLE_CONTROL_TOOLS=true. Agent always confirms first. |
kill_mutation | Cancel a running mutation on a table. | DESTRUCTIVE. Env-gated: requires AGENT_ENABLE_CONTROL_TOOLS=true. Agent always confirms first. |
optimize_table | Trigger an OPTIMIZE on a table to force merges. | DESTRUCTIVE. Env-gated: requires AGENT_ENABLE_CONTROL_TOOLS=true. Agent always confirms first. |
Skills
Skills are expert guides — with copy-pasteable SQL recipes against system.* — that the agent loads on demand. Because the toolset is lean, skills are how the agent stays powerful. Eighteen skills are included:
| Skill | Covers |
|---|---|
system-tables-reference | Exact columns of key system tables; recipes; tools vs raw SQL |
data-analysis | Aggregation & time-series recipes (largest scan, expensive queries, patterns, period comparison) |
anomaly-detection | Recent-vs-baseline comparisons (error spikes, p95 regressions, part explosions) |
query-tuning-advisor | Diagnose a slow query and propose concrete rewrites & better joins |
query-optimization | PREWHERE, JOIN patterns, materialized views, EXPLAIN, indexes |
schema-design-advisor | ORDER BY/partition keys, codecs, skip indexes, column type right-sizing |
storage-optimization | Compression codecs, TTL, tiered storage, part management |
version-upgrade-advisor | Whether/how to upgrade ClickHouse and what is gained |
hardware-tuning | Size settings to the box's cores/RAM/disk |
concept-explainer | Teach core ClickHouse concepts |
replication-guide | ReplicatedMergeTree, failover, lag diagnosis, Keeper |
cluster-operations | Distributed tables, resharding, node management, topology |
migration-patterns | ALTER patterns, zero-downtime schema changes |
security-hardening | RBAC, row policies, quotas, audit logging |
clickhouse-best-practices | Schema design, query tuning, operational guidelines |
troubleshooting | OOM, slow merges, stuck mutations, error-code diagnosis |
incident-response | Structured triage recipes (disk full, errors, replication lag, health sweep) |
plan-and-verify | Decompose with update_plan and verify each result before concluding |
List available skills with GET /api/v1/agent/skills.
Plan and verify
For multi-step tasks the agent authors a live checklist with update_plan, keeps exactly one step in progress, and adapts the plan as results come in. Crucially, it verifies each result before stating it — re-querying or cross-checking a second system table for a finding, running explain_query on both versions before claiming a rewrite is faster, and separating what is verified from what is a hypothesis. The plan-and-verify and incident-response skills encode the recipes. Simple one-step questions skip the plan entirely.
Example questions
Ask in plain English:
- "Which queries are running right now and how long have they been executing?" — lists query id, user, elapsed time, and memory, sorted by duration.
- "What were the 10 slowest queries in the last 24 hours?" — fetches from the query log and offers to EXPLAIN any of them.
- "What's the largest data scan ever performed on this cluster?" — loads
data-analysisand runs thesystem.query_logrecipe. - "Anything abnormal in the last hour versus baseline?" — loads
anomaly-detectionand compares recent activity to the preceding window. - "Show me query volume per hour over the last day as a chart." — runs the aggregation and renders a line chart inline.
- "Suggest a better ORDER BY and which columns should be LowCardinality for
analytics.events." — loadsschema-design-advisor, inspects the schema and parts, and recommends changes.