> ## Documentation Index
> Fetch the complete documentation index at: https://docs.startree.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Query Logger

<Info>
  Minimum Startree version needed for this feature is 0.15
</Info>

## **Overview**

Query Logger captures a **record of every query your Pinot cluster serves** and stores it in a queryable Pinot table called `system_query_log`. Instead of hunting through broker logs, you run ordinary SQL against your own query history to answer questions like:

* Which queries are the **slowest** or the **most expensive** right now?
* Which query **patterns** (templates) dominate load?
* Which queries are **scanning too much data**, hitting **limits**, or **erroring**?
* How has a specific query's latency or resource usage **changed over time**?

Each row = one query executed by a broker, enriched with timing, data-scan counts, CPU/memory usage, and the query text itself.

## **How it's enabled**

**It is not on by default.** Query Logger is a managed feature — **contact StarTree Support to enable it for your cluster.** StarTree configures the capture pipeline, storage location, and retention on your behalf. Once enabled, the `system_query_log` table appears and you can query it immediately like any other table (via the Query Console or the query API). No client-side setup is required.

## **The data: key fields**

The table has \~70 columns. The ones you'll use most, grouped by purpose:

**Identity & context**

| Field                         | Type            | Meaning                                                                                                     |
| ----------------------------- | --------------- | ----------------------------------------------------------------------------------------------------------- |
| `logTimestampMs`              | LONG (epoch ms) | When the query was logged — your **time filter** column                                                     |
| `requestId`                   | LONG            | Unique broker request ID                                                                                    |
| `query`                       | STRING          | The SQL text (truncated if very long)                                                                       |
| `queryHash`                   | STRING          | Stable hash of the query *template* (literals normalized) — group by this to bucket parameterized queries\* |
| `tableName` / `tablesQueried` | STRING          | Table(s) the query targeted                                                                                 |
| `brokerId`                    | STRING          | Broker that served it                                                                                       |
| `queryEngine`                 | STRING          | `singleStage` or `multiStage`                                                                               |

**Timing & result**

| Field                          | Meaning                                                        |
| ------------------------------ | -------------------------------------------------------------- |
| `timeUsedMs`                   | Total end-to-end latency (ms) — the headline "how slow" number |
| `brokerReduceTimeMs`           | Time spent merging results on the broker                       |
| `numRowsResultSet`             | Rows returned                                                  |
| `numExceptions` / `exceptions` | Error count and error text                                     |
| `partialResult`                | Query returned incomplete results                              |

**Data scanned (why a query is expensive)**

| Field                                                                | Meaning                                                        |
| -------------------------------------------------------------------- | -------------------------------------------------------------- |
| `numDocsScanned`                                                     | Rows actually read                                             |
| `numEntriesScannedInFilter`                                          | Entries scanned evaluating filters — high = missing/weak index |
| `numEntriesScannedPostFilter`                                        | Entries scanned after filtering (projection cost)              |
| `numSegmentsQueried` / `numSegmentsProcessed` / `numSegmentsMatched` | Segment fan-out                                                |
| `totalDocs`                                                          | Total rows in the table at query time                          |

**Resource usage**

| Field                                                              | Meaning                                     |
| ------------------------------------------------------------------ | ------------------------------------------- |
| `offlineTotalCpuTimeNs` / `realtimeTotalCpuTimeNs`                 | CPU time (ns) on offline / realtime servers |
| `offlineTotalMemAllocatedBytes` / `realtimeTotalMemAllocatedBytes` | Memory allocated                            |

**Limit / safety flags** (booleans): `numGroupsLimitReached`, `groupsTrimmed`, `maxRowsInJoinReached`, `maxRowsInWindowReached`, `maxRowsInDistinctReached` — these tell you a query hit a guardrail and may have returned trimmed/partial data.

**Deep diagnostics**: `stageStats` (JSON, multi-stage per-operator stats), `serverStats`, `traceInfo`, `responseMetadataJson` — for detailed drill-down.

> \* `queryHash` is populated only when query fingerprinting is enabled for your cluster; otherwise it may be empty. Ask StarTree Ops if you plan to rely on it.

## **Sample queries**

> All examples filter to the **last 24 hours** using `logTimestampMs`. Adjust the window as needed. Always keep a time filter on these queries (see Limitations & caveats).

**Top 10 slowest queries (last 24h)**

```sql theme={null}
SELECT logTimestampMs, tableName, timeUsedMs, numDocsScanned, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
ORDER BY timeUsedMs DESC
LIMIT 10
```

**Top 10 most CPU-expensive queries**

```sql theme={null}
SELECT logTimestampMs, tableName, timeUsedMs,
       offlineTotalCpuTimeNs + realtimeTotalCpuTimeNs AS totalCpuNs, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
ORDER BY totalCpuNs DESC
LIMIT 10
```

**Worst query *patterns* — aggregate by template** (find the query shapes driving load)

```sql theme={null}
SELECT queryHash,
       COUNT(*) AS executions,
       AVG(timeUsedMs) AS avgLatencyMs,
       MAX(timeUsedMs) AS maxLatencyMs,
       AVG(numEntriesScannedInFilter) AS avgFilterScan,
       ANY_VALUE(query) AS sampleQuery
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
GROUP BY queryHash
ORDER BY avgLatencyMs DESC
LIMIT 20
```

**Heavy scanners (candidates for a better index)**

```sql theme={null}
SELECT logTimestampMs, tableName, numEntriesScannedInFilter, numDocsScanned, timeUsedMs, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
  AND numEntriesScannedInFilter > 1000000
ORDER BY numEntriesScannedInFilter DESC
LIMIT 25
```

**Failed / errored queries**

```sql theme={null}
SELECT logTimestampMs, tableName, numExceptions, exceptions, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
  AND numExceptions > 0
ORDER BY logTimestampMs DESC
LIMIT 50
```

**Queries that hit a limit guardrail** (possibly returning trimmed results)

```sql theme={null}
SELECT logTimestampMs, tableName, numGroupsLimitReached, groupsTrimmed,
       maxRowsInJoinReached, timeUsedMs, query
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
  AND (numGroupsLimitReached = true OR groupsTrimmed = true OR maxRowsInJoinReached = true)
ORDER BY logTimestampMs DESC
LIMIT 50
```

**Query volume & latency by table**

```sql theme={null}
SELECT tableName, COUNT(*) AS queries,
       AVG(timeUsedMs) AS avgMs, PERCENTILETDIGEST(timeUsedMs, 99) AS p99Ms
FROM system_query_log
WHERE logTimestampMs > now() - 86400000
GROUP BY tableName
ORDER BY queries DESC
```

## **How the data is kept tidy**

You don't manage any of this — it's handled automatically — but it's useful to know:

* **Automatic segment compaction.** The pipeline lands data as many small segments (one every few minutes per broker). A **background merge/rollup job runs on a schedule and compacts those small segments into fewer, larger ones**, keeping the **total segment count bounded** as the table grows. No action is required on your part.
* **Automatic retention.** Data older than the configured retention window is dropped automatically, so the table stays within its storage budget.

## **Limitations & caveats**

> ⚠️ **It consumes storage on your cluster.** `system_query_log` is a real Pinot table — its segments occupy disk on your servers. StarTree keeps it bounded with an automatic **retention window** (older data is dropped, typically \~30 days) and a **storage quota**, and the background merge job (above) keeps the segment count in check — so the footprint stays finite. High-QPS clusters generate more log data; talk to StarTree Ops if you need a longer window or expect large volume.

> ⚠️ **Your analysis queries run on the same cluster.** Queries against `system_query_log` execute on the **same brokers and servers serving your production traffic** and compete for the same resources. **Always filter by** `logTimestampMs` to a bounded window and avoid unbounded `SELECT *` scans, so debugging doesn't add load to an already-busy cluster.

* **Near-real-time, not instant.** Logged queries appear after background segment generation runs — expect a **short delay (roughly 10–15 minutes)** before a just-executed query is queryable. It's for analysis and debugging, not live monitoring.
* **Long fields are truncated.** `query`, `exceptions`, `serverStats`, and `traceInfo` have maximum lengths; extremely large values are cut off.
* **One row per broker-served query.** Internal/system queries and the query-logger's own background activity are not counted as user traffic.
* **Best-effort capture.** Under extreme broker load a small fraction of records may be dropped rather than slow down query serving.
