Skip to main content
Beta feature — Query Analyzer is disabled by default and available on demand. Contact your StarTree account team to have it enabled for your environment.
Query Analyzer is an AI-powered tool built into StarTree Cloud that helps you understand and optimize Apache Pinot multi-stage engine (MSE) queries. It analyzes your SQL query alongside table metadata, explain plans, and execution statistics to produce prioritized, evidence-backed recommendations — all without requiring deep Pinot expertise.

The Problem

Optimizing Pinot multi-stage queries is hard because the debugging surface is fragmented and expert-only:
  • Explain plans are detailed but difficult to interpret for most users
  • Stage statistics show where time was spent, but not what to change
  • Table metadata (indexes, cardinality, data types) is spread across multiple pages
  • Users end up guessing, iterating blindly, or escalating to support
This leads to slow time-to-resolution, wasted compute, and repetitive support loops.

How to Access

Query Analyzer is available directly in the StarTree Data Portal Query Console. Once enabled for your environment, no additional configuration is required — write your SQL query and use the built-in Analyze options to get instant, AI-generated optimization guidance.
Navigate to Data Portal → Query Console, write your query, and select Analyze to get started.

What You Get Back

Query Analyzer returns three top-level sections for every analysis:
OutputDescription
Query SummaryPlain-English description of what the query does
Risk FlagsIssues found, ranked by severity. Each flag carries its own supporting evidence, bottleneck metrics (runtime), and any correctness alerts — so context and cause are always together
RecommendationsRanked, actionable suggestions with expected impact, a before/after change, and tradeoffs
Every recommendation is grounded in measured evidence, not generic advice.

Two Analysis Modes

Static Analysis

Run before executing your query. Flags likely problems using table metadata, index configuration, and the explain plan — so you can fix issues before spending compute.

Runtime Analysis

Run after executing your query. Uses real execution statistics to pinpoint actual bottlenecks and provide evidence-based fixes.

What Gets Analyzed

Index gaps Missing inverted, range, sorted, bloom filter, text, FST, JSON, star-tree, timestamp, geospatial, sparse, composite JSON, and vector indexes, as well as full scan risk. Query patterns Wrong join order, filters not pushed below joins, broadcast join risk, cross-join risk, OFFSET-based pagination, SELECT *, leading-wildcard LIKE, high result LIMIT, unnecessary sort, high-cardinality GROUP BY, group limit risk, heavy aggregation, suboptimal column encoding, missing cross-table filter, large table without pre-filter, and missing date range propagation. Operator-level execution (runtime only) Each risk flag produced during runtime analysis includes measured metrics from the relevant operator — execution time, memory allocation, rows produced, bytes shuffled, and correctness signals such as whether the aggregation group limit was hit.

AI Model and Data Privacy

Query Analyzer uses a large language model (LLM) to interpret query context and generate recommendations. The model used depends on your cloud deployment:
DeploymentAI Provider
AWSAmazon Bedrock
Google Cloud (Coming Soon)Vertex AI
Azure (Coming Soon)Azure AI Foundry

What Is Sent to the Model

The following information is assembled into the prompt and sent to the AI model:
DataSourceNotes
SQL query textUser inputThe query you submit for analysis
Table config and schemaPinot metadataIndex settings, field types, encoding — no row data
Column cardinality estimatesPinot metadataApproximate distinct value counts per column
Explain planPinotQuery execution structure — no actual data values
Execution statisticsPinotOperator-level timing and memory metrics — no actual data values
No credentials, row data, column values, or personally identifiable information is sent to the AI model. Only structural metadata and execution statistics are included in the prompt.

Guardrails

Query Analyzer is designed to produce high-signal, actionable output:
  • Every recommendation includes specific evidence from the query plan or execution statistics
  • Low-confidence and low-impact findings are suppressed — only high-signal results are shown
  • Correctness alerts (such as results being potentially incomplete due to group limits) are surfaced prominently
  • Query rewrite suggestions are validated against Pinot SQL syntax rules
  • All suggestions include tradeoffs so you can make informed decisions

Next Steps

How to Use Query Analyzer

Step-by-step guide to running static and runtime analysis from the Query Console, and how to act on the results.

Understanding Your Results

Reference guide to every output field, risk flag type, recommendation category, and what each means for your query.