> ## Documentation Index > Fetch the complete documentation index at: https://docs.startree.ai/llms.txt > Use this file to discover all available pages before exploring further. # Understanding Recommendations **Beta feature** — Query Analyzer is disabled by default and available on demand. Contact your StarTree account team to have it enabled for your environment. This page is a reference for everything Query Analyzer returns. Use it to understand what each field means, what triggered a specific finding, and what the different recommendation and risk flag categories represent. ## Output Sections ### Risk Flags The main output section. Each risk flag represents one detected issue and bundles everything related to that issue in one place: what was found, why it matters, the measured evidence behind it, and — for runtime analysis — the operator metrics and any correctness concerns. | Field | Description | | ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------- | | **Type** | The category of issue detected (see [Risk Flag Types](#risk-flag-types) below) | | **Severity** | How serious the issue is — `HIGH` or `MEDIUM` | | **Description** | A plain-English explanation of what was found and why it matters | | **Evidence** | The specific data that triggered the flag — for example, a column name, cardinality value, index configuration, or measured execution metric | | **Confidence** | How confident the analyzer is in the finding — `HIGH` or `MEDIUM` | | **Operator** *(runtime)* | The execution operator this flag relates to — for example, Hash Join or Table Scan (see [Operator Types](#operator-types)) | | **Metrics** *(runtime)* | Measured execution values for the operator — time, memory, rows, bytes shuffled | Only `HIGH` and `MEDIUM` confidence findings are shown. Low-confidence results are suppressed to keep output focused. When a risk flag carries a correctness alert, address it before acting on performance recommendations. A correctness issue — such as the group limit being reached — means query results may already be incomplete, which affects how you should interpret the rest of the analysis. ### Recommendations Ranked, actionable suggestions. Each recommendation includes: | Field | Description | | --------------------- | -------------------------------------------------------------------------------------------- | | **Rank** | Priority order — rank 1 has the highest expected impact | | **Title** | Short description of the action to take | | **Type** | The category of change recommended (see [Recommendation Types](#recommendation-types) below) | | **Description** | Detailed explanation of why this change helps | | **Suggested Change** | A before/after view of the specific change — a query snippet, index config, or hint | | **Expected Impact** | How much improvement is expected — `HIGH` or `MEDIUM` | | **Confidence** | How confident the analyzer is in this recommendation — `HIGH` or `MEDIUM` | | **Evidence** | References to the supporting evidence items that justify this recommendation | | **Tradeoffs** | Side effects or caveats to consider before applying the change | | **Correctness Notes** | Semantic considerations — for example, whether a join reorder affects NULL handling | Recommendations are ranked by expected impact first, then confidence. A query rewrite that eliminates an expensive operation ranks above an index addition at the same impact level. *** ## Risk Flag Types Risk flags are detected during both **static and runtime analysis**. There are two categories. ### Index Gaps These flags indicate that a column used in a filter, join, or aggregation is missing an index that would significantly speed up the query. | Risk Flag | What It Means | | -------------------------------- | -------------------------------------------------------------------------------------------------------------- | | **Missing Inverted Index** | A column used in an equality or IN predicate has no inverted index, requiring a full scan of each segment | | **Missing Range Index** | A column used in a range comparison (`>`, `<`, `BETWEEN`) has no range index | | **Missing Sorted Index** | A high-selectivity column could reduce segment pruning with a sorted index | | **Missing Bloom Filter** | A low-cardinality column with equality predicates could benefit from a bloom filter to skip segments quickly | | **Missing Text Index** | A column used with `LIKE` or text search has no full-text index | | **Missing FST Index** | A column used with regex predicates has no FST (Finite State Transducer) index | | **Missing JSON Index** | A column with JSON path access has no JSON index | | **Missing Star-Tree Index** | An aggregation pattern could be pre-computed with a star-tree index, potentially eliminating the scan entirely | | **Missing Timestamp Index** | A time column used in time-bucketing functions has no timestamp index | | **Missing Geospatial Index** | A column used in spatial predicates has no geospatial index | | **Missing Sparse Index** | A column stored in cold/tiered storage could benefit from a sparse index to reduce remote reads | | **Missing Composite JSON Index** | Multiple JSON path accesses could be accelerated with a composite JSON index | | **Missing Vector Index** | A column used in vector similarity search has no vector index | | **Full Scan Risk** | The query structure is likely to cause a full segment scan regardless of indexes | ### Query Pattern Issues These flags indicate structural patterns in the SQL or query plan that are likely to cause poor performance. | Risk Flag | What It Means | | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | **Wrong Join Order** | The larger table appears on the right (hash build) side of a join. Pinot builds the in-memory hash table from the right input — putting the larger table there wastes memory and slows the build | | **Filter Not Pushed Down** | A filter appears above a join in the execution plan instead of below it. Pushing filters down reduces the number of rows that reach the join | | **Broadcast Join** | The right-side input of a join is broadcast to every worker, which is expensive for large tables | | **Self Join** | A table is joined with itself, which may indicate a query design issue | | **Cross Join Risk** | A join condition is non-equality, which can produce a Cartesian product | | **Offset Pagination** | `OFFSET` is used for pagination. For large offsets this requires scanning and discarding thousands of rows — use keyset (cursor) pagination instead | | **Select Star** | `SELECT *` fetches all columns including potentially large or unused ones | | **Leading Wildcard** | A `LIKE '%...'` pattern cannot use index prefix optimization and requires scanning every value | | **High Result Limit** | A very large `LIMIT` value may produce excessive result sets | | **Unnecessary Sort** | An `ORDER BY` clause has no corresponding `LIMIT`, meaning all rows are sorted | | **High Cardinality Group By** | `GROUP BY` on a high-cardinality column produces a large intermediate state that requires significant memory | | **Group Limit Risk** | The estimated number of groups may approach or exceed the default limit of 100,000 groups | | **Heavy Aggregation** | A memory-intensive aggregate function such as `DISTINCTCOUNT` is used | | **Suboptimal Encoding** | A column uses raw encoding where dictionary encoding would improve index effectiveness | ### Filter Propagation Issues These flags identify missed opportunities to reduce data early by pushing filters across the query plan. | Risk Flag | What It Means | | ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | | **Missing Cross-Table Filter** | A filter applied to one side of a join could also be applied to the other side, but isn't — the engine ends up joining more rows than necessary | | **Large Table Without Pre-filter** | A large table is scanned with no pre-filter, causing a full scan before the join or aggregation that follows | | **Missing Date Range Propagation** | A date or time range filter on one side of a join is not propagated to the other side, missing an opportunity to prune data before the join executes | ### Diagnostic Values These values may appear in the `type` field of an API response but are not user-facing risk flags. They are used internally to drive post-processing logic and short-circuit gates. | Value | What It Means | | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `QUERY_FAILED` | The query did not execute successfully. When present, the analyzer may suppress or adjust downstream recommendations that depend on execution data. | | `STAR_TREE_FAST_PATH_FIRED` | A positive signal — an existing star-tree index's metadata fast-path served the query without a full scan. Used to trigger the "already optimal" short-circuit. | *** ## Operator Types Operator types appear within risk flags during runtime analysis. The **API value** column shows the exact string returned in the `operator` field of a `Finding` object. | Operator | API Value | What It Does | Key Metrics to Watch | | ------------------- | ----------------- | ----------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- | | **Table Scan** | `LEAF` | Scans segments and applies filters | Entries scanned in-filter (high = indexes not being used); entries scanned post-filter (high = star-tree opportunity) | | **Hash Join** | `HASH_JOIN` | Builds a hash table from the right input and probes it with the left | Hash table build time; memory allocated; rows emitted (high rows = join explosion) | | **Data Send** | `MAILBOX_SEND` | Sends rows to downstream workers across stages | Bytes serialized; fan-out (number of workers data is sent to) | | **Data Receive** | `MAILBOX_RECEIVE` | Receives rows from upstream workers | Bytes deserialized; upstream wait time (high = upstream stage is slow) | | **Aggregation** | `AGGREGATE` | Groups rows and computes aggregate functions | Memory allocated; groups emitted; whether the group limit was hit | | **Window Function** | `WINDOW` | Computes window functions over a partition | Execution time; memory allocated (high = large partitions or wide frames) | | **Sort** | `SORT` | Sorts rows, typically in combination with a LIMIT | Execution time and memory for sort buffer | | **Filter** | `FILTER` | Applies a predicate to filter rows mid-plan | Execution time; rows emitted (high = filter is not selective enough) | | **Transform** | `TRANSFORM` | Applies column transformations or expression evaluation | Execution time; memory allocated (high = expensive expressions or many columns) | | **Union** | `UNION` | Combines row sets from multiple inputs (`UNION ALL`) | Execution time; total rows emitted across all inputs | | **Intersect** | `INTERSECT` | Returns rows present in all inputs (`INTERSECT`) | Execution time; memory allocated for set tracking | | **Minus** | `MINUS` | Returns rows from the first input not present in subsequent inputs (`EXCEPT`) | Execution time; memory allocated for set tracking | *** ## Evidence Types Evidence is embedded within each risk flag to show what triggered it. The type tells you where the finding came from. ### Found in Both Modes | Evidence Type | What It Means | | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Full Scan** | A full segment scan was detected in the explain plan | | **Index Used** | A specific index was used — for example, an inverted index for an equality predicate | | **Join Order** | Join order analysis based on table row counts | | **Missing Index** | A column is missing an index that would help — covers inverted, range, text, FST, JSON, bloom filter, star-tree, timestamp, geospatial, sparse, composite JSON, and vector index types | | **Filter-Only Join** | The right side of a join is used only to filter rows, making it a candidate for rewriting as `WHERE col IN (SELECT ...)` | | **Star-Tree Blocked** | A star-tree index cannot be used because the aggregation depends on a cross-table join | | **Column Cardinality** | Cardinality data from table statistics, used to assess index recommendations | | **Predicate Pattern** | Analysis of the query's filter structure | | **Suboptimal Encoding** | Column encoding is suboptimal for the query's access pattern | ### Found During Static Analysis Only | Evidence Type | What It Means | | ------------------ | ------------------------------------------------------------------------- | | **Plan Structure** | A structural observation from the explain plan | | **Exchange Type** | The data distribution pattern between stages — hash, broadcast, or random | ### Found During Runtime Analysis Only | Evidence Type | What It Means | | ------------------------- | ------------------------------------------------------------------------------------ | | **High Post-Filter Scan** | Many entries were scanned after the filter, suggesting a star-tree index opportunity | | **Join Build Time** | High hash table build time indicates the right-side table is too large | | **Join Explosion** | Large number of output rows from the join, indicating a many-to-many relationship | | **Group Limit Hit** | The 100,000 group limit was reached — results may be incomplete | | **High Memory** | An operator allocated significant heap memory | | **Heavy Shuffle** | Large volumes of data were sent between workers | | **Upstream Wait** | A stage spent significant time waiting for its upstream stage to produce data | | **Downstream Wait** | A stage was blocked waiting for the downstream consumer to process rows | | **Broadcast Detected** | A broadcast exchange was detected in the execution tree | *** ## Recommendation Types Recommendations appear in both static and runtime analysis. ### Query Changes | Type | What Gets Changed | | ----------------- | -------------------------------------------------------------------------------------------------------------------------- | | **Query Rewrite** | The SQL itself — for example, reordering joins, replacing OFFSET with keyset pagination, or rewriting a join as a subquery | | **Hint Addition** | A query hint added inline to control join strategy, aggregation behavior, or partition handling | ### Index Additions | Type | Index Added | | ---------------------------- | ------------------------------------------------------------------------- | | **Add Inverted Index** | Speeds up equality and IN predicates on the target column | | **Add Range Index** | Speeds up range comparisons on the target column | | **Add Bloom Filter** | Allows fast segment skipping for equality predicates on the target column | | **Add Text Index** | Enables full-text search and speeds up LIKE predicates | | **Add FST Index** | Enables fast regex matching on the target column | | **Add JSON Index** | Speeds up JSON path access on the target column | | **Add Star-Tree Index** | Pre-computes aggregations to potentially eliminate scanning entirely | | **Add Timestamp Index** | Speeds up time-bucketing functions on datetime columns | | **Add Geospatial Index** | Speeds up spatial predicate evaluation | | **Add Sparse Index** | Reduces remote reads for columns in cold/tiered storage | | **Add Composite JSON Index** | Optimizes multiple JSON path accesses together | | **Add Vector Index** | Enables efficient vector similarity search | ### Configuration Changes | Type | What Gets Changed | | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Encoding Change** | Column encoding type — for example, switching from raw to dictionary encoding | | **Join Strategy** | How the join is executed — for example, using a lookup join for dimension tables or a colocated join to eliminate data shuffling | | **Aggregation Hint** | Aggregation behavior — for example, enabling group trimming for ORDER BY + LIMIT patterns, or increasing the group count limit | | **Repartition Data** | Set or align `segmentPartitionConfig` and repartition existing data. The config change takes effect on new segments immediately; existing segments require a segment refresh or re-ingest to physically repartition. | | **Table Rebalance** | Trigger a controller-level table rebalance to redistribute segments across servers. No config diff is required — the rebalance operation alone addresses the imbalance. | | **Cluster Scale** | Scale the cluster up or out — add server nodes, increase memory, or add replicas. Recommended only as a last resort for sustained resource saturation (OOM, repeated timeouts) that no query rewrite or configuration change can resolve. | | **Scope Reduction** | Reduce what the query asks for — for example, narrow the time window, add a selective pre-filter, drop high-cardinality GROUP BY columns, or accept approximate aggregates. **This changes the result set by design.** Always a secondary recommendation on resource-exhaustion cases (group-limit hit, OOM, timeout) where the volume of data demanded is itself the problem. Never ranked above an equivalence-preserving fix, and always flagged as result-altering in the suggested change. | ### Sentinel | Type | What It Means | | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | **No Optimization Needed** | The query is already optimal — for example, a trivial `SELECT * FROM t LIMIT n` preview. No suggested change is produced; the recommendation carries only a title and description confirming the query needs no changes. |