> ## Documentation Index
> Fetch the complete documentation index at: https://docs.startree.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Understanding Recommendations

<Warning>
  **Beta feature** — Query Analyzer is disabled by default and available on demand. Contact your StarTree account team to have it enabled for your environment.
</Warning>

This page is a reference for everything Query Analyzer returns. Use it to understand what each field means, what triggered a specific finding, and what the different recommendation and risk flag categories represent.

## Output Sections

### Risk Flags

The main output section. Each risk flag represents one detected issue and bundles everything related to that issue in one place: what was found, why it matters, the measured evidence behind it, and — for runtime analysis — the operator metrics and any correctness concerns.

| Field                    | Description                                                                                                                                  |
| ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------- |
| **Type**                 | The category of issue detected (see [Risk Flag Types](#risk-flag-types) below)                                                               |
| **Severity**             | How serious the issue is — `HIGH` or `MEDIUM`                                                                                                |
| **Description**          | A plain-English explanation of what was found and why it matters                                                                             |
| **Evidence**             | The specific data that triggered the flag — for example, a column name, cardinality value, index configuration, or measured execution metric |
| **Confidence**           | How confident the analyzer is in the finding — `HIGH` or `MEDIUM`                                                                            |
| **Operator** *(runtime)* | The execution operator this flag relates to — for example, Hash Join or Table Scan (see [Operator Types](#operator-types))                   |
| **Metrics** *(runtime)*  | Measured execution values for the operator — time, memory, rows, bytes shuffled                                                              |

Only `HIGH` and `MEDIUM` confidence findings are shown. Low-confidence results are suppressed to keep output focused.

<Warning>
  When a risk flag carries a correctness alert, address it before acting on performance recommendations. A correctness issue — such as the group limit being reached — means query results may already be incomplete, which affects how you should interpret the rest of the analysis.
</Warning>

### Recommendations

Ranked, actionable suggestions. Each recommendation includes:

| Field                 | Description                                                                                  |
| --------------------- | -------------------------------------------------------------------------------------------- |
| **Rank**              | Priority order — rank 1 has the highest expected impact                                      |
| **Title**             | Short description of the action to take                                                      |
| **Type**              | The category of change recommended (see [Recommendation Types](#recommendation-types) below) |
| **Description**       | Detailed explanation of why this change helps                                                |
| **Suggested Change**  | A before/after view of the specific change — a query snippet, index config, or hint          |
| **Expected Impact**   | How much improvement is expected — `HIGH` or `MEDIUM`                                        |
| **Confidence**        | How confident the analyzer is in this recommendation — `HIGH` or `MEDIUM`                    |
| **Evidence**          | References to the supporting evidence items that justify this recommendation                 |
| **Tradeoffs**         | Side effects or caveats to consider before applying the change                               |
| **Correctness Notes** | Semantic considerations — for example, whether a join reorder affects NULL handling          |

Recommendations are ranked by expected impact first, then confidence. A query rewrite that eliminates an expensive operation ranks above an index addition at the same impact level.

***

## Risk Flag Types

Risk flags are detected during both **static and runtime analysis**. There are two categories.

### Index Gaps

These flags indicate that a column used in a filter, join, or aggregation is missing an index that would significantly speed up the query.

| Risk Flag                        | What It Means                                                                                                  |
| -------------------------------- | -------------------------------------------------------------------------------------------------------------- |
| **Missing Inverted Index**       | A column used in an equality or IN predicate has no inverted index, requiring a full scan of each segment      |
| **Missing Range Index**          | A column used in a range comparison (`>`, `<`, `BETWEEN`) has no range index                                   |
| **Missing Sorted Index**         | A high-selectivity column could reduce segment pruning with a sorted index                                     |
| **Missing Bloom Filter**         | A low-cardinality column with equality predicates could benefit from a bloom filter to skip segments quickly   |
| **Missing Text Index**           | A column used with `LIKE` or text search has no full-text index                                                |
| **Missing FST Index**            | A column used with regex predicates has no FST (Finite State Transducer) index                                 |
| **Missing JSON Index**           | A column with JSON path access has no JSON index                                                               |
| **Missing Star-Tree Index**      | An aggregation pattern could be pre-computed with a star-tree index, potentially eliminating the scan entirely |
| **Missing Timestamp Index**      | A time column used in time-bucketing functions has no timestamp index                                          |
| **Missing Geospatial Index**     | A column used in spatial predicates has no geospatial index                                                    |
| **Missing Sparse Index**         | A column stored in cold/tiered storage could benefit from a sparse index to reduce remote reads                |
| **Missing Composite JSON Index** | Multiple JSON path accesses could be accelerated with a composite JSON index                                   |
| **Missing Vector Index**         | A column used in vector similarity search has no vector index                                                  |
| **Full Scan Risk**               | The query structure is likely to cause a full segment scan regardless of indexes                               |

### Query Pattern Issues

These flags indicate structural patterns in the SQL or query plan that are likely to cause poor performance.

| Risk Flag                     | What It Means                                                                                                                                                                                    |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Wrong Join Order**          | The larger table appears on the right (hash build) side of a join. Pinot builds the in-memory hash table from the right input — putting the larger table there wastes memory and slows the build |
| **Filter Not Pushed Down**    | A filter appears above a join in the execution plan instead of below it. Pushing filters down reduces the number of rows that reach the join                                                     |
| **Broadcast Join**            | The right-side input of a join is broadcast to every worker, which is expensive for large tables                                                                                                 |
| **Self Join**                 | A table is joined with itself, which may indicate a query design issue                                                                                                                           |
| **Cross Join Risk**           | A join condition is non-equality, which can produce a Cartesian product                                                                                                                          |
| **Offset Pagination**         | `OFFSET` is used for pagination. For large offsets this requires scanning and discarding thousands of rows — use keyset (cursor) pagination instead                                              |
| **Select Star**               | `SELECT *` fetches all columns including potentially large or unused ones                                                                                                                        |
| **Leading Wildcard**          | A `LIKE '%...'` pattern cannot use index prefix optimization and requires scanning every value                                                                                                   |
| **High Result Limit**         | A very large `LIMIT` value may produce excessive result sets                                                                                                                                     |
| **Unnecessary Sort**          | An `ORDER BY` clause has no corresponding `LIMIT`, meaning all rows are sorted                                                                                                                   |
| **High Cardinality Group By** | `GROUP BY` on a high-cardinality column produces a large intermediate state that requires significant memory                                                                                     |
| **Group Limit Risk**          | The estimated number of groups may approach or exceed the default limit of 100,000 groups                                                                                                        |
| **Heavy Aggregation**         | A memory-intensive aggregate function such as `DISTINCTCOUNT` is used                                                                                                                            |
| **Suboptimal Encoding**       | A column uses raw encoding where dictionary encoding would improve index effectiveness                                                                                                           |

### Filter Propagation Issues

These flags identify missed opportunities to reduce data early by pushing filters across the query plan.

| Risk Flag                          | What It Means                                                                                                                                        |
| ---------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Missing Cross-Table Filter**     | A filter applied to one side of a join could also be applied to the other side, but isn't — the engine ends up joining more rows than necessary      |
| **Large Table Without Pre-filter** | A large table is scanned with no pre-filter, causing a full scan before the join or aggregation that follows                                         |
| **Missing Date Range Propagation** | A date or time range filter on one side of a join is not propagated to the other side, missing an opportunity to prune data before the join executes |

### Diagnostic Values

These values may appear in the `type` field of an API response but are not user-facing risk flags. They are used internally to drive post-processing logic and short-circuit gates.

| Value                       | What It Means                                                                                                                                                   |
| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `QUERY_FAILED`              | The query did not execute successfully. When present, the analyzer may suppress or adjust downstream recommendations that depend on execution data.             |
| `STAR_TREE_FAST_PATH_FIRED` | A positive signal — an existing star-tree index's metadata fast-path served the query without a full scan. Used to trigger the "already optimal" short-circuit. |

***

## Operator Types

Operator types appear within risk flags during runtime analysis. The **API value** column shows the exact string returned in the `operator` field of a `Finding` object.

| Operator            | API Value         | What It Does                                                                  | Key Metrics to Watch                                                                                                  |
| ------------------- | ----------------- | ----------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------- |
| **Table Scan**      | `LEAF`            | Scans segments and applies filters                                            | Entries scanned in-filter (high = indexes not being used); entries scanned post-filter (high = star-tree opportunity) |
| **Hash Join**       | `HASH_JOIN`       | Builds a hash table from the right input and probes it with the left          | Hash table build time; memory allocated; rows emitted (high rows = join explosion)                                    |
| **Data Send**       | `MAILBOX_SEND`    | Sends rows to downstream workers across stages                                | Bytes serialized; fan-out (number of workers data is sent to)                                                         |
| **Data Receive**    | `MAILBOX_RECEIVE` | Receives rows from upstream workers                                           | Bytes deserialized; upstream wait time (high = upstream stage is slow)                                                |
| **Aggregation**     | `AGGREGATE`       | Groups rows and computes aggregate functions                                  | Memory allocated; groups emitted; whether the group limit was hit                                                     |
| **Window Function** | `WINDOW`          | Computes window functions over a partition                                    | Execution time; memory allocated (high = large partitions or wide frames)                                             |
| **Sort**            | `SORT`            | Sorts rows, typically in combination with a LIMIT                             | Execution time and memory for sort buffer                                                                             |
| **Filter**          | `FILTER`          | Applies a predicate to filter rows mid-plan                                   | Execution time; rows emitted (high = filter is not selective enough)                                                  |
| **Transform**       | `TRANSFORM`       | Applies column transformations or expression evaluation                       | Execution time; memory allocated (high = expensive expressions or many columns)                                       |
| **Union**           | `UNION`           | Combines row sets from multiple inputs (`UNION ALL`)                          | Execution time; total rows emitted across all inputs                                                                  |
| **Intersect**       | `INTERSECT`       | Returns rows present in all inputs (`INTERSECT`)                              | Execution time; memory allocated for set tracking                                                                     |
| **Minus**           | `MINUS`           | Returns rows from the first input not present in subsequent inputs (`EXCEPT`) | Execution time; memory allocated for set tracking                                                                     |

***

## Evidence Types

Evidence is embedded within each risk flag to show what triggered it. The type tells you where the finding came from.

### Found in Both Modes

| Evidence Type           | What It Means                                                                                                                                                                          |
| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Full Scan**           | A full segment scan was detected in the explain plan                                                                                                                                   |
| **Index Used**          | A specific index was used — for example, an inverted index for an equality predicate                                                                                                   |
| **Join Order**          | Join order analysis based on table row counts                                                                                                                                          |
| **Missing Index**       | A column is missing an index that would help — covers inverted, range, text, FST, JSON, bloom filter, star-tree, timestamp, geospatial, sparse, composite JSON, and vector index types |
| **Filter-Only Join**    | The right side of a join is used only to filter rows, making it a candidate for rewriting as `WHERE col IN (SELECT ...)`                                                               |
| **Star-Tree Blocked**   | A star-tree index cannot be used because the aggregation depends on a cross-table join                                                                                                 |
| **Column Cardinality**  | Cardinality data from table statistics, used to assess index recommendations                                                                                                           |
| **Predicate Pattern**   | Analysis of the query's filter structure                                                                                                                                               |
| **Suboptimal Encoding** | Column encoding is suboptimal for the query's access pattern                                                                                                                           |

### Found During Static Analysis Only

| Evidence Type      | What It Means                                                             |
| ------------------ | ------------------------------------------------------------------------- |
| **Plan Structure** | A structural observation from the explain plan                            |
| **Exchange Type**  | The data distribution pattern between stages — hash, broadcast, or random |

### Found During Runtime Analysis Only

| Evidence Type             | What It Means                                                                        |
| ------------------------- | ------------------------------------------------------------------------------------ |
| **High Post-Filter Scan** | Many entries were scanned after the filter, suggesting a star-tree index opportunity |
| **Join Build Time**       | High hash table build time indicates the right-side table is too large               |
| **Join Explosion**        | Large number of output rows from the join, indicating a many-to-many relationship    |
| **Group Limit Hit**       | The 100,000 group limit was reached — results may be incomplete                      |
| **High Memory**           | An operator allocated significant heap memory                                        |
| **Heavy Shuffle**         | Large volumes of data were sent between workers                                      |
| **Upstream Wait**         | A stage spent significant time waiting for its upstream stage to produce data        |
| **Downstream Wait**       | A stage was blocked waiting for the downstream consumer to process rows              |
| **Broadcast Detected**    | A broadcast exchange was detected in the execution tree                              |

***

## Recommendation Types

Recommendations appear in both static and runtime analysis.

### Query Changes

| Type              | What Gets Changed                                                                                                          |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------- |
| **Query Rewrite** | The SQL itself — for example, reordering joins, replacing OFFSET with keyset pagination, or rewriting a join as a subquery |
| **Hint Addition** | A query hint added inline to control join strategy, aggregation behavior, or partition handling                            |

### Index Additions

| Type                         | Index Added                                                               |
| ---------------------------- | ------------------------------------------------------------------------- |
| **Add Inverted Index**       | Speeds up equality and IN predicates on the target column                 |
| **Add Range Index**          | Speeds up range comparisons on the target column                          |
| **Add Bloom Filter**         | Allows fast segment skipping for equality predicates on the target column |
| **Add Text Index**           | Enables full-text search and speeds up LIKE predicates                    |
| **Add FST Index**            | Enables fast regex matching on the target column                          |
| **Add JSON Index**           | Speeds up JSON path access on the target column                           |
| **Add Star-Tree Index**      | Pre-computes aggregations to potentially eliminate scanning entirely      |
| **Add Timestamp Index**      | Speeds up time-bucketing functions on datetime columns                    |
| **Add Geospatial Index**     | Speeds up spatial predicate evaluation                                    |
| **Add Sparse Index**         | Reduces remote reads for columns in cold/tiered storage                   |
| **Add Composite JSON Index** | Optimizes multiple JSON path accesses together                            |
| **Add Vector Index**         | Enables efficient vector similarity search                                |

### Configuration Changes

| Type                 | What Gets Changed                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Encoding Change**  | Column encoding type — for example, switching from raw to dictionary encoding                                                                                                                                                                                                                                                                                                                                                                                                                   |
| **Join Strategy**    | How the join is executed — for example, using a lookup join for dimension tables or a colocated join to eliminate data shuffling                                                                                                                                                                                                                                                                                                                                                                |
| **Aggregation Hint** | Aggregation behavior — for example, enabling group trimming for ORDER BY + LIMIT patterns, or increasing the group count limit                                                                                                                                                                                                                                                                                                                                                                  |
| **Repartition Data** | Set or align `segmentPartitionConfig` and repartition existing data. The config change takes effect on new segments immediately; existing segments require a segment refresh or re-ingest to physically repartition.                                                                                                                                                                                                                                                                            |
| **Table Rebalance**  | Trigger a controller-level table rebalance to redistribute segments across servers. No config diff is required — the rebalance operation alone addresses the imbalance.                                                                                                                                                                                                                                                                                                                         |
| **Cluster Scale**    | Scale the cluster up or out — add server nodes, increase memory, or add replicas. Recommended only as a last resort for sustained resource saturation (OOM, repeated timeouts) that no query rewrite or configuration change can resolve.                                                                                                                                                                                                                                                       |
| **Scope Reduction**  | Reduce what the query asks for — for example, narrow the time window, add a selective pre-filter, drop high-cardinality GROUP BY columns, or accept approximate aggregates. **This changes the result set by design.** Always a secondary recommendation on resource-exhaustion cases (group-limit hit, OOM, timeout) where the volume of data demanded is itself the problem. Never ranked above an equivalence-preserving fix, and always flagged as result-altering in the suggested change. |

### Sentinel

| Type                       | What It Means                                                                                                                                                                                                            |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **No Optimization Needed** | The query is already optimal — for example, a trivial `SELECT * FROM t LIMIT n` preview. No suggested change is produced; the recommendation carries only a title and description confirming the query needs no changes. |
