Beta feature — Query Analyzer is disabled by default and available on demand. Contact your StarTree account team to have it enabled for your environment.
This page is a reference for everything Query Analyzer returns. Use it to understand what each field means, what triggered a specific finding, and what the different recommendation and risk flag categories represent.
Output Sections
Query Summary
Risk Flags
The main output section. Each risk flag represents one detected issue and bundles everything related to that issue in one place: what was found, why it matters, the measured evidence behind it, and — for runtime analysis — the operator metrics and any correctness concerns.
| Field | Description |
|---|
| Type | The category of issue detected (see Risk Flag Types below) |
| Severity | How serious the issue is — HIGH, MEDIUM, or LOW |
| Description | A plain-English explanation of what was found and why it matters |
| Evidence | The specific data that triggered the flag — for example, a column name, cardinality value, index configuration, or measured execution metric |
| Confidence | How confident the analyzer is in the finding — HIGH, MEDIUM, or LOW |
| Operator (runtime) | The execution operator this flag relates to — for example, Hash Join or Table Scan (see Operator Types) |
| Metrics (runtime) | Measured execution values for the operator — time, memory, rows, bytes shuffled |
Only HIGH and MEDIUM confidence findings are shown. Low-confidence results are suppressed to keep output focused.
When a risk flag carries a correctness alert, address it before acting on performance recommendations. A correctness issue — such as the group limit being reached — means query results may already be incomplete, which affects how you should interpret the rest of the analysis.
Recommendations
Ranked, actionable suggestions. Each recommendation includes:
| Field | Description |
|---|
| Rank | Priority order — rank 1 has the highest expected impact |
| Title | Short description of the action to take |
| Type | The category of change recommended (see Recommendation Types below) |
| Description | Detailed explanation of why this change helps |
| Suggested Change | A before/after view of the specific change — a query snippet, index config, or hint |
| Expected Impact | How much improvement is expected — HIGH, MEDIUM, or LOW |
| Confidence | How confident the analyzer is in this recommendation — HIGH, MEDIUM, or LOW |
| Evidence | References to the supporting evidence items that justify this recommendation |
| Tradeoffs | Side effects or caveats to consider before applying the change |
| Correctness Notes | Semantic considerations — for example, whether a join reorder affects NULL handling |
Recommendations are ranked by expected impact first, then confidence. A query rewrite that eliminates an expensive operation ranks above an index addition at the same impact level.
Risk Flag Types
Risk flags are detected during both static and runtime analysis. There are two categories.
Index Gaps
These flags indicate that a column used in a filter, join, or aggregation is missing an index that would significantly speed up the query.
| Risk Flag | What It Means |
|---|
| Missing Inverted Index | A column used in an equality or IN predicate has no inverted index, requiring a full scan of each segment |
| Missing Range Index | A column used in a range comparison (>, <, BETWEEN) has no range index |
| Missing Sorted Index | A high-selectivity column could reduce segment pruning with a sorted index |
| Missing Bloom Filter | A low-cardinality column with equality predicates could benefit from a bloom filter to skip segments quickly |
| Missing Text Index | A column used with LIKE or text search has no full-text index |
| Missing FST Index | A column used with regex predicates has no FST (Finite State Transducer) index |
| Missing JSON Index | A column with JSON path access has no JSON index |
| Missing Star-Tree Index | An aggregation pattern could be pre-computed with a star-tree index, potentially eliminating the scan entirely |
| Missing Timestamp Index | A time column used in time-bucketing functions has no timestamp index |
| Missing Geospatial Index | A column used in spatial predicates has no geospatial index |
| Missing Sparse Index | A column stored in cold/tiered storage could benefit from a sparse index to reduce remote reads |
| Missing Composite JSON Index | Multiple JSON path accesses could be accelerated with a composite JSON index |
| Missing Vector Index | A column used in vector similarity search has no vector index |
| Full Scan Risk | The query structure is likely to cause a full segment scan regardless of indexes |
Query Pattern Issues
These flags indicate structural patterns in the SQL or query plan that are likely to cause poor performance.
| Risk Flag | What It Means |
|---|
| Wrong Join Order | The larger table appears on the right (hash build) side of a join. Pinot builds the in-memory hash table from the right input — putting the larger table there wastes memory and slows the build |
| Filter Not Pushed Down | A filter appears above a join in the execution plan instead of below it. Pushing filters down reduces the number of rows that reach the join |
| Broadcast Join | The right-side input of a join is broadcast to every worker, which is expensive for large tables |
| Self Join | A table is joined with itself, which may indicate a query design issue |
| Cross Join Risk | A join condition is non-equality, which can produce a Cartesian product |
| Offset Pagination | OFFSET is used for pagination. For large offsets this requires scanning and discarding thousands of rows — use keyset (cursor) pagination instead |
| Select Star | SELECT * fetches all columns including potentially large or unused ones |
| Leading Wildcard | A LIKE '%...' pattern cannot use index prefix optimization and requires scanning every value |
| High Result Limit | A very large LIMIT value may produce excessive result sets |
| Unnecessary Sort | An ORDER BY clause has no corresponding LIMIT, meaning all rows are sorted |
| High Cardinality Group By | GROUP BY on a high-cardinality column produces a large intermediate state that requires significant memory |
| Group Limit Risk | The estimated number of groups may approach or exceed the default limit of 100,000 groups |
| Heavy Aggregation | A memory-intensive aggregate function such as DISTINCTCOUNT is used |
| Suboptimal Encoding | A column uses raw encoding where dictionary encoding would improve index effectiveness |
Filter Propagation Issues
These flags identify missed opportunities to reduce data early by pushing filters across the query plan.
| Risk Flag | What It Means |
|---|
| Missing Cross-Table Filter | A filter applied to one side of a join could also be applied to the other side, but isn’t — the engine ends up joining more rows than necessary |
| Large Table Without Pre-filter | A large table is scanned with no pre-filter, causing a full scan before the join or aggregation that follows |
| Missing Date Range Propagation | A date or time range filter on one side of a join is not propagated to the other side, missing an opportunity to prune data before the join executes |
Operator Types
Operator types appear within risk flags during runtime analysis.
| Operator | What It Does | Key Metrics to Watch |
|---|
| Table Scan | Scans segments and applies filters | Entries scanned in-filter (high = indexes not being used); entries scanned post-filter (high = star-tree opportunity) |
| Hash Join | Builds a hash table from the right input and probes it with the left | Hash table build time; memory allocated; rows emitted (high rows = join explosion) |
| Data Send | Sends rows to downstream workers across stages | Bytes serialized; fan-out (number of workers data is sent to) |
| Data Receive | Receives rows from upstream workers | Bytes deserialized; upstream wait time (high = upstream stage is slow) |
| Aggregation | Groups rows and computes aggregate functions | Memory allocated; groups emitted; whether the group limit was hit |
| Window Function | Computes window functions over a partition | Execution time; memory allocated (high = large partitions or wide frames) |
| Sort | Sorts rows, typically in combination with a LIMIT | Execution time and memory for sort buffer |
Evidence Types
Evidence is embedded within each risk flag to show what triggered it. The type tells you where the finding came from.
Found in Both Modes
| Evidence Type | What It Means |
|---|
| Full Scan | A full segment scan was detected in the explain plan |
| Index Used | A specific index was used — for example, an inverted index for an equality predicate |
| Join Order | Join order analysis based on table row counts |
| Missing Index | A column is missing an index that would help — covers inverted, range, text, FST, JSON, bloom filter, star-tree, timestamp, geospatial, sparse, composite JSON, and vector index types |
| Filter-Only Join | The right side of a join is used only to filter rows, making it a candidate for rewriting as WHERE col IN (SELECT ...) |
| Star-Tree Blocked | A star-tree index cannot be used because the aggregation depends on a cross-table join |
| Column Cardinality | Cardinality data from table statistics, used to assess index recommendations |
| Predicate Pattern | Analysis of the query’s filter structure |
| Suboptimal Encoding | Column encoding is suboptimal for the query’s access pattern |
Found During Static Analysis Only
| Evidence Type | What It Means |
|---|
| Plan Structure | A structural observation from the explain plan |
| Exchange Type | The data distribution pattern between stages — hash, broadcast, or random |
Found During Runtime Analysis Only
| Evidence Type | What It Means |
|---|
| High Post-Filter Scan | Many entries were scanned after the filter, suggesting a star-tree index opportunity |
| Join Build Time | High hash table build time indicates the right-side table is too large |
| Join Explosion | Large number of output rows from the join, indicating a many-to-many relationship |
| Group Limit Hit | The 100,000 group limit was reached — results may be incomplete |
| High Memory | An operator allocated significant heap memory |
| Heavy Shuffle | Large volumes of data were sent between workers |
| Upstream Wait | A stage spent significant time waiting for its upstream stage to produce data |
| Downstream Wait | A stage was blocked waiting for the downstream consumer to process rows |
| Broadcast Detected | A broadcast exchange was detected in the execution tree |
Recommendation Types
Recommendations appear in both static and runtime analysis.
Query Changes
| Type | What Gets Changed |
|---|
| Query Rewrite | The SQL itself — for example, reordering joins, replacing OFFSET with keyset pagination, or rewriting a join as a subquery |
| Hint Addition | A query hint added inline to control join strategy, aggregation behavior, or partition handling |
Index Additions
| Type | Index Added |
|---|
| Add Inverted Index | Speeds up equality and IN predicates on the target column |
| Add Range Index | Speeds up range comparisons on the target column |
| Add Bloom Filter | Allows fast segment skipping for equality predicates on the target column |
| Add Text Index | Enables full-text search and speeds up LIKE predicates |
| Add FST Index | Enables fast regex matching on the target column |
| Add JSON Index | Speeds up JSON path access on the target column |
| Add Star-Tree Index | Pre-computes aggregations to potentially eliminate scanning entirely |
| Add Timestamp Index | Speeds up time-bucketing functions on datetime columns |
| Add Geospatial Index | Speeds up spatial predicate evaluation |
| Add Sparse Index | Reduces remote reads for columns in cold/tiered storage |
| Add Composite JSON Index | Optimizes multiple JSON path accesses together |
| Add Vector Index | Enables efficient vector similarity search |
Configuration Changes
| Type | What Gets Changed |
|---|
| Encoding Change | Column encoding type — for example, switching from raw to dictionary encoding |
| Join Strategy | How the join is executed — for example, using a lookup join for dimension tables or a colocated join to eliminate data shuffling |
| Aggregation Hint | Aggregation behavior — for example, enabling group trimming for ORDER BY + LIMIT patterns, or increasing the group count limit |