What Is Proactive Analyzer?
Static and Runtime Analysis both require a deliberate trigger: you submit a query and ask for it to be analyzed. Proactive Analyzer removes that requirement. It runs on a schedule (once a day by default), scans the queries that actually ran on your cluster, ranks them by cost, and automatically runs the same analysis engine against the worst offenders. It does this without re-running any query; it reads execution statistics that Pinot already recorded. The goal is to catch expensive query patterns as they occur in production (the queries that quietly burn cluster resources without anyone flagging them) rather than waiting for a user to notice a slowdown and submit the query for analysis.
How It Works
Pinot records every query it runs in a system table, system_query_log, including the SQL, a stable hash identifying the query shape, execution time, CPU counters, scan volume, and per-stage execution statistics. Proactive Analyzer’s scheduled job reads that log and, on each run:
- Checks the log is available. If system_query_log is not queryable on the cluster, the run skips cleanly.
- Ranks query shapes by cost. Shapes are grouped by query hash and ranked by mean execution time over the lookback window (24 hours by default), so a shape that is consistently slow ranks above one that is merely popular. A minimum latency floor filters out trivially fast queries and one-off spikes.
- Selects the worst run of each shape. For each of the top-ranked shapes (5 by default), the single slowest run is pulled, carrying its SQL and recorded execution statistics.
- Filters. Queries against the log table itself are excluded, as are rows with no execution statistics.
- Analyzes each one, reusing the existing analysis engine (see below), up to a hard per-run cap (3 by default).
- Stores the results, keyed by workspace and query shape.
- Delivers a digest summarizing what it found.
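
The ranking and selection steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the product's implementation: the row field names (query_hash, sql, time_ms, stage_stats), the function name, and the 100 ms floor are assumptions made for the example, and the rows are assumed to be already restricted to the lookback window. Only the defaults of 5 shapes and 3 analyses come from the text. Applying the floor to the per-shape mean and detecting log-table queries by substring match are also simplifying assumptions.

```python
from collections import defaultdict
from statistics import mean

LOG_TABLE = "system_query_log"


def select_candidates(rows, min_mean_ms=100.0, top_shapes=5, max_analyses=3):
    """Pick the logged runs to analyze; field names are hypothetical."""
    # Group logged runs by query hash: one group per query shape.
    by_shape = defaultdict(list)
    for row in rows:
        by_shape[row["query_hash"]].append(row)

    # Rank shapes by mean execution time, dropping those under the floor
    # (assumption: the floor is applied to the per-shape mean).
    means = {h: mean(r["time_ms"] for r in runs) for h, runs in by_shape.items()}
    ranked = sorted(
        (h for h in means if means[h] >= min_mean_ms),
        key=lambda h: means[h],
        reverse=True,
    )

    # Pull the single slowest run of each top-ranked shape.
    worst = [max(by_shape[h], key=lambda r: r["time_ms"]) for h in ranked[:top_shapes]]

    # Exclude queries against the log table itself and rows without stats
    # (assumption: a crude substring match stands in for real SQL parsing).
    kept = [
        r for r in worst
        if LOG_TABLE not in r["sql"].lower() and r.get("stage_stats")
    ]

    # Apply the hard per-run cap on analyses.
    return kept[:max_analyses]
```

Because shapes are ranked by mean rather than by run count or total time, a shape that runs thousands of times quickly never outranks one that is slow on every run, which matches the "consistently slow over merely popular" behavior described above.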
It never re-runs a slow query
Re-running the exact queries just flagged as expensive would place the heaviest possible load back on the cluster, the opposite of what a tuning tool should do. Proactive Analyzer avoids this by reusing the execution statistics already captured in the query log. The only live data it fetches during analysis is cheap and current: table config, schema, table stats, and the explain plan. Recommendations therefore reflect how the table is set up today, without replaying any heavy query.
It reuses the same analysis engine
Proactive Analyzer does not introduce a second recommendation engine. It selects which queries to analyze, then hands each one to the same engine used by Runtime Analysis (same prompt, same evidence gathering, same validation and guardrail checks) with one difference: instead of executing the query to capture statistics, it injects the statistics already recorded in the query log. This matters for quality: the analyzer’s most useful checks (for example, “this query is already fast enough” or “this ran out of memory”) depend on seeing real runtime numbers. Because Proactive Analyzer feeds in the real numbers from the logged slow run, those checks behave exactly as they do for a manually submitted query; the output isn’t a lower-quality approximation.
Skips work that hasn’t changed
Before analyzing a query shape, the job checks whether it has already analyzed it. It fingerprints the current table configuration and compares it to the fingerprint from the last analysis. If nothing has changed, there is nothing new to recommend: the job refreshes the cost numbers and skips the analysis call for that shape entirely, keeping run cost predictable.
Delivery
Every run stores its recommendations and, on top of that, sends a digest:

| Channel | Behavior |
|---|---|
| Log | Always on. A formatted digest is written to the service log. |
| Slack | Optional. When enabled with an incoming webhook URL, the run posts a summary of newly analyzed recommendations. |
| Not yet available. |
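
To make the Slack channel concrete, here is a hedged sketch of posting a digest to a Slack incoming webhook, which accepts a JSON body with a text field. The digest layout, the function names, and the recommendation fields (query_hash, mean_ms, summary) are assumptions for illustration; the actual digest format Proactive Analyzer sends is not described here.

```python
import json
import urllib.request


def build_digest(recommendations):
    """Format newly analyzed recommendations as a plain-text digest.

    The fields used here are hypothetical, not the product's schema.
    """
    lines = [f"Proactive Analyzer: {len(recommendations)} new recommendation(s)"]
    for rec in recommendations:
        lines.append(
            f"- {rec['query_hash']} ({rec['mean_ms']:.0f} ms mean): {rec['summary']}"
        )
    return "\n".join(lines)


def post_to_slack(webhook_url, text):
    """POST the digest to a Slack incoming webhook as a JSON payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the formatting separate from the HTTP call means the same digest text could be written to the service log and posted to Slack without being built twice.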
Available Today
While you evaluate Proactive Analyzer, both on-demand analysis modes are fully available:
Static Analysis
Run before executing your query. Catches structural issues using table metadata, index configuration, and the explain plan.
Runtime Analysis
Run after executing your query. Uses real execution statistics to pinpoint actual bottlenecks with measured evidence.
Configuration and Limitations
Config keys, the manual trigger endpoint, and current known limitations.

