Skip to main content
Minimum Startree version needed for this feature is 0.15

Overview

Query Logger captures a record of every query your Pinot cluster serves and stores it in a queryable Pinot table called system_query_log. Instead of hunting through broker logs, you run ordinary SQL against your own query history to answer questions like:
  • Which queries are the slowest or the most expensive right now?
  • Which query patterns (templates) dominate load?
  • Which queries are scanning too much data, hitting limits, or erroring?
  • How has a specific query’s latency or resource usage changed over time?
Each row = one query executed by a broker, enriched with timing, data-scan counts, CPU/memory usage, and the query text itself.

How it’s enabled

It is not on by default. Query Logger is a managed feature — contact StarTree Support to enable it for your cluster. StarTree configures the capture pipeline, storage location, and retention on your behalf. Once enabled, the system_query_log table appears and you can query it immediately like any other table (via the Query Console or the query API). No client-side setup is required.

The data: key fields

The table has ~70 columns. The ones you’ll use most, grouped by purpose: Identity & context
FieldTypeMeaning
logTimestampMsLONG (epoch ms)When the query was logged — your time filter column
requestIdLONGUnique broker request ID
querySTRINGThe SQL text (truncated if very long)
queryHashSTRINGStable hash of the query template (literals normalized) — group by this to bucket parameterized queries*
tableName / tablesQueriedSTRINGTable(s) the query targeted
brokerIdSTRINGBroker that served it
queryEngineSTRINGsingleStage or multiStage
Timing & result
FieldMeaning
timeUsedMsTotal end-to-end latency (ms) — the headline “how slow” number
brokerReduceTimeMsTime spent merging results on the broker
numRowsResultSetRows returned
numExceptions / exceptionsError count and error text
partialResultQuery returned incomplete results
Data scanned (why a query is expensive)
FieldMeaning
numDocsScannedRows actually read
numEntriesScannedInFilterEntries scanned evaluating filters — high = missing/weak index
numEntriesScannedPostFilterEntries scanned after filtering (projection cost)
numSegmentsQueried / numSegmentsProcessed / numSegmentsMatchedSegment fan-out
totalDocsTotal rows in the table at query time
Resource usage
FieldMeaning
offlineTotalCpuTimeNs / realtimeTotalCpuTimeNsCPU time (ns) on offline / realtime servers
offlineTotalMemAllocatedBytes / realtimeTotalMemAllocatedBytesMemory allocated
Limit / safety flags (booleans): numGroupsLimitReached, groupsTrimmed, maxRowsInJoinReached, maxRowsInWindowReached, maxRowsInDistinctReached — these tell you a query hit a guardrail and may have returned trimmed/partial data. Deep diagnostics: stageStats (JSON, multi-stage per-operator stats), serverStats, traceInfo, responseMetadataJson — for detailed drill-down.
* queryHash is populated only when query fingerprinting is enabled for your cluster; otherwise it may be empty. Ask StarTree Ops if you plan to rely on it.

Sample queries

All examples filter to the last 24 hours using logTimestampMs. Adjust the window as needed. Always keep a time filter on these queries (see Limitations & caveats).
Top 10 slowest queries (last 24h)
SELECT logTimestampMs, tableName, timeUsedMs, numDocsScanned, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
ORDER BY timeUsedMs DESC
LIMIT 10
Top 10 most CPU-expensive queries
SELECT logTimestampMs, tableName, timeUsedMs,
       offlineTotalCpuTimeNs + realtimeTotalCpuTimeNs AS totalCpuNs, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
ORDER BY totalCpuNs DESC
LIMIT 10
Worst query patterns — aggregate by template (find the query shapes driving load)
SELECT queryHash,
       COUNT(*) AS executions,
       AVG(timeUsedMs) AS avgLatencyMs,
       MAX(timeUsedMs) AS maxLatencyMs,
       AVG(numEntriesScannedInFilter) AS avgFilterScan,
       ANY_VALUE(query) AS sampleQuery
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
GROUP BY queryHash
ORDER BY avgLatencyMs DESC
LIMIT 20
Heavy scanners (candidates for a better index)
SELECT logTimestampMs, tableName, numEntriesScannedInFilter, numDocsScanned, timeUsedMs, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
  AND numEntriesScannedInFilter > 1000000
ORDER BY numEntriesScannedInFilter DESC
LIMIT 25
Failed / errored queries
SELECT logTimestampMs, tableName, numExceptions, exceptions, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
  AND numExceptions > 0
ORDER BY logTimestampMs DESC
LIMIT 50
Queries that hit a limit guardrail (possibly returning trimmed results)
SELECT logTimestampMs, tableName, numGroupsLimitReached, groupsTrimmed,
       maxRowsInJoinReached, timeUsedMs, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
  AND (numGroupsLimitReached = true OR groupsTrimmed = true OR maxRowsInJoinReached = true)
ORDER BY logTimestampMs DESC
LIMIT 50
Query volume & latency by table
SELECT tableName, COUNT(*) AS queries,
       AVG(timeUsedMs) AS avgMs, PERCENTILETDIGEST(timeUsedMs, 99) AS p99Ms
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
GROUP BY tableName
ORDER BY queries DESC

How the data is kept tidy

You don’t manage any of this — it’s handled automatically — but it’s useful to know:
  • Automatic segment compaction. The pipeline lands data as many small segments (one every few minutes per broker). A background merge/rollup job runs on a schedule and compacts those small segments into fewer, larger ones, keeping the total segment count bounded as the table grows. No action is required on your part.
  • Automatic retention. Data older than the configured retention window is dropped automatically, so the table stays within its storage budget.

Limitations & caveats

⚠️ It consumes storage on your cluster. system_query_log is a real Pinot table — its segments occupy disk on your servers. StarTree keeps it bounded with an automatic retention window (older data is dropped, typically ~30 days) and a storage quota, and the background merge job (above) keeps the segment count in check — so the footprint stays finite. High-QPS clusters generate more log data; talk to StarTree Ops if you need a longer window or expect large volume.
⚠️ Your analysis queries run on the same cluster. Queries against system_query_log execute on the same brokers and servers serving your production traffic and compete for the same resources. Always filter by logTimestampMs to a bounded window and avoid unbounded SELECT * scans, so debugging doesn’t add load to an already-busy cluster.
  • Near-real-time, not instant. Logged queries appear after background segment generation runs — expect a short delay (roughly 10–15 minutes) before a just-executed query is queryable. It’s for analysis and debugging, not live monitoring.
  • Long fields are truncated. query, exceptions, serverStats, and traceInfo have maximum lengths; extremely large values are cut off.
  • One row per broker-served query. Internal/system queries and the query-logger’s own background activity are not counted as user traffic.
  • Best-effort capture. Under extreme broker load a small fraction of records may be dropped rather than slow down query serving.