Query Logger - StarTree Docs

Minimum Startree version needed for this feature is 0.15

Overview

Query Logger captures a record of every query your Pinot cluster serves and stores it in a queryable Pinot table called system_query_log. Instead of hunting through broker logs, you run ordinary SQL against your own query history to answer questions like:

Which queries are the slowest or the most expensive right now?
Which query patterns (templates) dominate load?
Which queries are scanning too much data, hitting limits, or erroring?
How has a specific query’s latency or resource usage changed over time?

Each row = one query executed by a broker, enriched with timing, data-scan counts, CPU/memory usage, and the query text itself.

How it’s enabled

It is not on by default. Query Logger is a managed feature — contact StarTree Support to enable it for your cluster. StarTree configures the capture pipeline, storage location, and retention on your behalf. Once enabled, the system_query_log table appears and you can query it immediately like any other table (via the Query Console or the query API). No client-side setup is required.

The data: key fields

The table has ~70 columns. The ones you’ll use most, grouped by purpose: Identity & context

Field	Type	Meaning
`logTimestampMs`	LONG (epoch ms)	When the query was logged — your time filter column
`requestId`	LONG	Unique broker request ID
`query`	STRING	The SQL text (truncated if very long)
`queryHash`	STRING	Stable hash of the query template (literals normalized) — group by this to bucket parameterized queries*
`tableName` / `tablesQueried`	STRING	Table(s) the query targeted
`brokerId`	STRING	Broker that served it
`queryEngine`	STRING	`singleStage` or `multiStage`

Timing & result

Field	Meaning
`timeUsedMs`	Total end-to-end latency (ms) — the headline “how slow” number
`brokerReduceTimeMs`	Time spent merging results on the broker
`numRowsResultSet`	Rows returned
`numExceptions` / `exceptions`	Error count and error text
`partialResult`	Query returned incomplete results

Data scanned (why a query is expensive)

Field	Meaning
`numDocsScanned`	Rows actually read
`numEntriesScannedInFilter`	Entries scanned evaluating filters — high = missing/weak index
`numEntriesScannedPostFilter`	Entries scanned after filtering (projection cost)
`numSegmentsQueried` / `numSegmentsProcessed` / `numSegmentsMatched`	Segment fan-out
`totalDocs`	Total rows in the table at query time

Resource usage

Field	Meaning
`offlineTotalCpuTimeNs` / `realtimeTotalCpuTimeNs`	CPU time (ns) on offline / realtime servers
`offlineTotalMemAllocatedBytes` / `realtimeTotalMemAllocatedBytes`	Memory allocated

Limit / safety flags (booleans): numGroupsLimitReached, groupsTrimmed, maxRowsInJoinReached, maxRowsInWindowReached, maxRowsInDistinctReached — these tell you a query hit a guardrail and may have returned trimmed/partial data. Deep diagnostics: stageStats (JSON, multi-stage per-operator stats), serverStats, traceInfo, responseMetadataJson — for detailed drill-down.

* queryHash is populated only when query fingerprinting is enabled for your cluster; otherwise it may be empty. Ask StarTree Ops if you plan to rely on it.

Sample queries

All examples filter to the last 24 hours using logTimestampMs. Adjust the window as needed. Always keep a time filter on these queries (see Limitations & caveats).

Top 10 slowest queries (last 24h)

SELECT logTimestampMs, tableName, timeUsedMs, numDocsScanned, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
ORDER BY timeUsedMs DESC
LIMIT 10

Top 10 most CPU-expensive queries

SELECT logTimestampMs, tableName, timeUsedMs,
       offlineTotalCpuTimeNs + realtimeTotalCpuTimeNs AS totalCpuNs, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
ORDER BY totalCpuNs DESC
LIMIT 10

Worst query patterns — aggregate by template (find the query shapes driving load)

SELECT queryHash,
       COUNT(*) AS executions,
       AVG(timeUsedMs) AS avgLatencyMs,
       MAX(timeUsedMs) AS maxLatencyMs,
       AVG(numEntriesScannedInFilter) AS avgFilterScan,
       ANY_VALUE(query) AS sampleQuery
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
GROUP BY queryHash
ORDER BY avgLatencyMs DESC
LIMIT 20

Heavy scanners (candidates for a better index)

SELECT logTimestampMs, tableName, numEntriesScannedInFilter, numDocsScanned, timeUsedMs, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
  AND numEntriesScannedInFilter > 1000000
ORDER BY numEntriesScannedInFilter DESC
LIMIT 25

Failed / errored queries

SELECT logTimestampMs, tableName, numExceptions, exceptions, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
  AND numExceptions > 0
ORDER BY logTimestampMs DESC
LIMIT 50

Queries that hit a limit guardrail (possibly returning trimmed results)

SELECT logTimestampMs, tableName, numGroupsLimitReached, groupsTrimmed,
       maxRowsInJoinReached, timeUsedMs, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
  AND (numGroupsLimitReached = true OR groupsTrimmed = true OR maxRowsInJoinReached = true)
ORDER BY logTimestampMs DESC
LIMIT 50

Query volume & latency by table

SELECT tableName, COUNT(*) AS queries,
       AVG(timeUsedMs) AS avgMs, PERCENTILETDIGEST(timeUsedMs, 99) AS p99Ms
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
GROUP BY tableName
ORDER BY queries DESC

How the data is kept tidy

You don’t manage any of this — it’s handled automatically — but it’s useful to know:

Automatic segment compaction. The pipeline lands data as many small segments (one every few minutes per broker). A background merge/rollup job runs on a schedule and compacts those small segments into fewer, larger ones, keeping the total segment count bounded as the table grows. No action is required on your part.
Automatic retention. Data older than the configured retention window is dropped automatically, so the table stays within its storage budget.

Limitations & caveats

⚠️ It consumes storage on your cluster. system_query_log is a real Pinot table — its segments occupy disk on your servers. StarTree keeps it bounded with an automatic retention window (older data is dropped, typically ~30 days) and a storage quota, and the background merge job (above) keeps the segment count in check — so the footprint stays finite. High-QPS clusters generate more log data; talk to StarTree Ops if you need a longer window or expect large volume.

⚠️ Your analysis queries run on the same cluster. Queries against system_query_log execute on the same brokers and servers serving your production traffic and compete for the same resources. Always filter by logTimestampMs to a bounded window and avoid unbounded SELECT * scans, so debugging doesn’t add load to an already-busy cluster.

Near-real-time, not instant. Logged queries appear after background segment generation runs — expect a short delay (roughly 10–15 minutes) before a just-executed query is queryable. It’s for analysis and debugging, not live monitoring.
Long fields are truncated. query, exceptions, serverStats, and traceInfo have maximum lengths; extremely large values are cut off.
One row per broker-served query. Internal/system queries and the query-logger’s own background activity are not counted as user traffic.
Best-effort capture. Under extreme broker load a small fraction of records may be dropped rather than slow down query serving.

​Overview

​How it’s enabled

​The data: key fields

​Sample queries

​How the data is kept tidy

​Limitations & caveats

Overview

How it’s enabled

The data: key fields

Sample queries

How the data is kept tidy

Limitations & caveats