Understanding Recommendations

Beta feature — Query Analyzer is disabled by default and available on demand. Contact your StarTree account team to have it enabled for your environment.

This page is a reference for everything Query Analyzer returns. Use it to understand what each field means, what triggered a specific finding, and what the different recommendation and risk flag categories represent.

Output Sections

Risk Flags

The main output section. Each risk flag represents one detected issue and bundles everything related to that issue in one place: what was found, why it matters, the measured evidence behind it, and — for runtime analysis — the operator metrics and any correctness concerns.

Field	Description
Type	The category of issue detected (see Risk Flag Types below)
Severity	How serious the issue is — `HIGH`, `MEDIUM`, or `LOW`
Description	A plain-English explanation of what was found and why it matters
Evidence	The specific data that triggered the flag — for example, a column name, cardinality value, index configuration, or measured execution metric
Confidence	How confident the analyzer is in the finding — `HIGH`, `MEDIUM`, or `LOW`
Operator (runtime)	The execution operator this flag relates to — for example, Hash Join or Table Scan (see Operator Types)
Metrics (runtime)	Measured execution values for the operator — time, memory, rows, bytes shuffled

Only HIGH and MEDIUM confidence findings are shown. Low-confidence results are suppressed to keep output focused.

When a risk flag carries a correctness alert, address it before acting on performance recommendations. A correctness issue — such as the group limit being reached — means query results may already be incomplete, which affects how you should interpret the rest of the analysis.

Recommendations

Ranked, actionable suggestions. Each recommendation includes:

Field	Description
Rank	Priority order — rank 1 has the highest expected impact
Title	Short description of the action to take
Type	The category of change recommended (see Recommendation Types below)
Description	Detailed explanation of why this change helps
Suggested Change	A before/after view of the specific change — a query snippet, index config, or hint
Expected Impact	How much improvement is expected — `HIGH`, `MEDIUM`, or `LOW`
Confidence	How confident the analyzer is in this recommendation — `HIGH`, `MEDIUM`, or `LOW`
Evidence	References to the supporting evidence items that justify this recommendation
Tradeoffs	Side effects or caveats to consider before applying the change
Correctness Notes	Semantic considerations — for example, whether a join reorder affects NULL handling

Recommendations are ranked by expected impact first, then confidence. A query rewrite that eliminates an expensive operation ranks above an index addition at the same impact level.

Risk Flag Types

Risk flags are detected during both static and runtime analysis. There are two categories.

Index Gaps

These flags indicate that a column used in a filter, join, or aggregation is missing an index that would significantly speed up the query.

Risk Flag	What It Means
Missing Inverted Index	A column used in an equality or IN predicate has no inverted index, requiring a full scan of each segment
Missing Range Index	A column used in a range comparison (`>`, `<`, `BETWEEN`) has no range index
Missing Sorted Index	A high-selectivity column could reduce segment pruning with a sorted index
Missing Bloom Filter	A low-cardinality column with equality predicates could benefit from a bloom filter to skip segments quickly
Missing Text Index	A column used with `LIKE` or text search has no full-text index
Missing FST Index	A column used with regex predicates has no FST (Finite State Transducer) index
Missing JSON Index	A column with JSON path access has no JSON index
Missing Star-Tree Index	An aggregation pattern could be pre-computed with a star-tree index, potentially eliminating the scan entirely
Missing Timestamp Index	A time column used in time-bucketing functions has no timestamp index
Missing Geospatial Index	A column used in spatial predicates has no geospatial index
Missing Sparse Index	A column stored in cold/tiered storage could benefit from a sparse index to reduce remote reads
Missing Composite JSON Index	Multiple JSON path accesses could be accelerated with a composite JSON index
Missing Vector Index	A column used in vector similarity search has no vector index
Full Scan Risk	The query structure is likely to cause a full segment scan regardless of indexes

Query Pattern Issues

These flags indicate structural patterns in the SQL or query plan that are likely to cause poor performance.

Risk Flag	What It Means
Wrong Join Order	The larger table appears on the right (hash build) side of a join. Pinot builds the in-memory hash table from the right input — putting the larger table there wastes memory and slows the build
Filter Not Pushed Down	A filter appears above a join in the execution plan instead of below it. Pushing filters down reduces the number of rows that reach the join
Broadcast Join	The right-side input of a join is broadcast to every worker, which is expensive for large tables
Self Join	A table is joined with itself, which may indicate a query design issue
Cross Join Risk	A join condition is non-equality, which can produce a Cartesian product
Offset Pagination	`OFFSET` is used for pagination. For large offsets this requires scanning and discarding thousands of rows — use keyset (cursor) pagination instead
Select Star	`SELECT *` fetches all columns including potentially large or unused ones
Leading Wildcard	A `LIKE '%...'` pattern cannot use index prefix optimization and requires scanning every value
High Result Limit	A very large `LIMIT` value may produce excessive result sets
Unnecessary Sort	An `ORDER BY` clause has no corresponding `LIMIT`, meaning all rows are sorted
High Cardinality Group By	`GROUP BY` on a high-cardinality column produces a large intermediate state that requires significant memory
Group Limit Risk	The estimated number of groups may approach or exceed the default limit of 100,000 groups
Heavy Aggregation	A memory-intensive aggregate function such as `DISTINCTCOUNT` is used
Suboptimal Encoding	A column uses raw encoding where dictionary encoding would improve index effectiveness

Filter Propagation Issues

These flags identify missed opportunities to reduce data early by pushing filters across the query plan.

Risk Flag	What It Means
Missing Cross-Table Filter	A filter applied to one side of a join could also be applied to the other side, but isn’t — the engine ends up joining more rows than necessary
Large Table Without Pre-filter	A large table is scanned with no pre-filter, causing a full scan before the join or aggregation that follows
Missing Date Range Propagation	A date or time range filter on one side of a join is not propagated to the other side, missing an opportunity to prune data before the join executes

Operator Types

Operator types appear within risk flags during runtime analysis.

Operator	What It Does	Key Metrics to Watch
Table Scan	Scans segments and applies filters	Entries scanned in-filter (high = indexes not being used); entries scanned post-filter (high = star-tree opportunity)
Hash Join	Builds a hash table from the right input and probes it with the left	Hash table build time; memory allocated; rows emitted (high rows = join explosion)
Data Send	Sends rows to downstream workers across stages	Bytes serialized; fan-out (number of workers data is sent to)
Data Receive	Receives rows from upstream workers	Bytes deserialized; upstream wait time (high = upstream stage is slow)
Aggregation	Groups rows and computes aggregate functions	Memory allocated; groups emitted; whether the group limit was hit
Window Function	Computes window functions over a partition	Execution time; memory allocated (high = large partitions or wide frames)
Sort	Sorts rows, typically in combination with a LIMIT	Execution time and memory for sort buffer
Filter	Applies a predicate to filter rows mid-plan	Execution time; rows emitted (high = filter is not selective enough)
Transform	Applies column transformations or expression evaluation	Execution time; memory allocated (high = expensive expressions or many columns)
Union	Combines row sets from multiple inputs (`UNION ALL`)	Execution time; total rows emitted across all inputs
Intersect	Returns rows present in all inputs (`INTERSECT`)	Execution time; memory allocated for set tracking
Minus	Returns rows from the first input not present in subsequent inputs (`EXCEPT`)	Execution time; memory allocated for set tracking

Evidence Types

Evidence is embedded within each risk flag to show what triggered it. The type tells you where the finding came from.

Found in Both Modes

Evidence Type	What It Means
Full Scan	A full segment scan was detected in the explain plan
Index Used	A specific index was used — for example, an inverted index for an equality predicate
Join Order	Join order analysis based on table row counts
Missing Index	A column is missing an index that would help — covers inverted, range, text, FST, JSON, bloom filter, star-tree, timestamp, geospatial, sparse, composite JSON, and vector index types
Filter-Only Join	The right side of a join is used only to filter rows, making it a candidate for rewriting as `WHERE col IN (SELECT ...)`
Star-Tree Blocked	A star-tree index cannot be used because the aggregation depends on a cross-table join
Column Cardinality	Cardinality data from table statistics, used to assess index recommendations
Predicate Pattern	Analysis of the query’s filter structure
Suboptimal Encoding	Column encoding is suboptimal for the query’s access pattern

Found During Static Analysis Only

Evidence Type	What It Means
Plan Structure	A structural observation from the explain plan
Exchange Type	The data distribution pattern between stages — hash, broadcast, or random

Found During Runtime Analysis Only

Evidence Type	What It Means
High Post-Filter Scan	Many entries were scanned after the filter, suggesting a star-tree index opportunity
Join Build Time	High hash table build time indicates the right-side table is too large
Join Explosion	Large number of output rows from the join, indicating a many-to-many relationship
Group Limit Hit	The 100,000 group limit was reached — results may be incomplete
High Memory	An operator allocated significant heap memory
Heavy Shuffle	Large volumes of data were sent between workers
Upstream Wait	A stage spent significant time waiting for its upstream stage to produce data
Downstream Wait	A stage was blocked waiting for the downstream consumer to process rows
Broadcast Detected	A broadcast exchange was detected in the execution tree

Recommendation Types

Recommendations appear in both static and runtime analysis.

Query Changes

Type	What Gets Changed
Query Rewrite	The SQL itself — for example, reordering joins, replacing OFFSET with keyset pagination, or rewriting a join as a subquery
Hint Addition	A query hint added inline to control join strategy, aggregation behavior, or partition handling

Index Additions

Type	Index Added
Add Inverted Index	Speeds up equality and IN predicates on the target column
Add Range Index	Speeds up range comparisons on the target column
Add Bloom Filter	Allows fast segment skipping for equality predicates on the target column
Add Text Index	Enables full-text search and speeds up LIKE predicates
Add FST Index	Enables fast regex matching on the target column
Add JSON Index	Speeds up JSON path access on the target column
Add Star-Tree Index	Pre-computes aggregations to potentially eliminate scanning entirely
Add Timestamp Index	Speeds up time-bucketing functions on datetime columns
Add Geospatial Index	Speeds up spatial predicate evaluation
Add Sparse Index	Reduces remote reads for columns in cold/tiered storage
Add Composite JSON Index	Optimizes multiple JSON path accesses together
Add Vector Index	Enables efficient vector similarity search

Configuration Changes

Type	What Gets Changed
Encoding Change	Column encoding type — for example, switching from raw to dictionary encoding
Join Strategy	How the join is executed — for example, using a lookup join for dimension tables or a colocated join to eliminate data shuffling
Aggregation Hint	Aggregation behavior — for example, enabling group trimming for ORDER BY + LIMIT patterns, or increasing the group count limit

​Output Sections

​Risk Flags

​Recommendations

​Risk Flag Types

​Index Gaps

​Query Pattern Issues

​Filter Propagation Issues

​Operator Types

​Evidence Types

​Found in Both Modes

​Found During Static Analysis Only

​Found During Runtime Analysis Only

​Recommendation Types

​Query Changes

​Index Additions

​Configuration Changes

Output Sections

Risk Flags

Recommendations

Risk Flag Types

Index Gaps

Query Pattern Issues

Filter Propagation Issues

Operator Types

Evidence Types

Found in Both Modes

Found During Static Analysis Only

Found During Runtime Analysis Only

Recommendation Types

Query Changes

Index Additions

Configuration Changes