The Problem
Optimizing Pinot multi-stage queries is hard because the debugging surface is fragmented and expert-only:- Explain plans are detailed but difficult to interpret for most users
- Stage statistics show where time was spent, but not what to change
- Table metadata (indexes, cardinality, data types) is spread across multiple pages
- Users end up guessing, iterating blindly, or escalating to support
How to Access
Query Analyzer is available directly in the StarTree Data Portal Query Console. Once enabled for your environment, no additional configuration is required — write your SQL query and use the built-in Analyze options to get instant, AI-generated optimization guidance.Navigate to Data Portal → Query Console, write your query, and select Analyze to get started.
What You Get Back
Query Analyzer returns three top-level sections for every analysis:| Output | Description |
|---|---|
| Query Summary | Plain-English description of what the query does |
| Risk Flags | Issues found, ranked by severity. Each flag carries its own supporting evidence, bottleneck metrics (runtime), and any correctness alerts — so context and cause are always together |
| Recommendations | Ranked, actionable suggestions with expected impact, a before/after change, and tradeoffs |
Two Analysis Modes
Static Analysis
Run before executing your query. Flags likely problems using table metadata, index configuration, and the explain plan — so you can fix issues before spending compute.
Runtime Analysis
Run after executing your query. Uses real execution statistics to pinpoint actual bottlenecks and provide evidence-based fixes.
What Gets Analyzed
Index gaps Missing inverted, range, sorted, bloom filter, text, FST, JSON, star-tree, timestamp, geospatial, sparse, composite JSON, and vector indexes, as well as full scan risk. Query patterns Wrong join order, filters not pushed below joins, broadcast join risk, cross-join risk, OFFSET-based pagination,SELECT *, leading-wildcard LIKE, high result LIMIT, unnecessary sort, high-cardinality GROUP BY, group limit risk, heavy aggregation, suboptimal column encoding, missing cross-table filter, large table without pre-filter, and missing date range propagation.
Operator-level execution (runtime only)
Each risk flag produced during runtime analysis includes measured metrics from the relevant operator — execution time, memory allocation, rows produced, bytes shuffled, and correctness signals such as whether the aggregation group limit was hit.
AI Model and Data Privacy
Query Analyzer uses a large language model (LLM) to interpret query context and generate recommendations. The model used depends on your cloud deployment:| Deployment | AI Provider |
|---|---|
| AWS | Amazon Bedrock |
| Google Cloud (Coming Soon) | Vertex AI |
| Azure (Coming Soon) | Azure AI Foundry |
What Is Sent to the Model
The following information is assembled into the prompt and sent to the AI model:| Data | Source | Notes |
|---|---|---|
| SQL query text | User input | The query you submit for analysis |
| Table config and schema | Pinot metadata | Index settings, field types, encoding — no row data |
| Column cardinality estimates | Pinot metadata | Approximate distinct value counts per column |
| Explain plan | Pinot | Query execution structure — no actual data values |
| Execution statistics | Pinot | Operator-level timing and memory metrics — no actual data values |
Guardrails
Query Analyzer is designed to produce high-signal, actionable output:- Every recommendation includes specific evidence from the query plan or execution statistics
- Low-confidence and low-impact findings are suppressed — only high-signal results are shown
- Correctness alerts (such as results being potentially incomplete due to group limits) are surfaced prominently
- Query rewrite suggestions are validated against Pinot SQL syntax rules
- All suggestions include tradeoffs so you can make informed decisions
Next Steps
How to Use Query Analyzer
Step-by-step guide to running static and runtime analysis from the Query Console, and how to act on the results.
Understanding Your Results
Reference guide to every output field, risk flag type, recommendation category, and what each means for your query.

