Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.startree.ai/llms.txt

Use this file to discover all available pages before exploring further.

Overview

  By default, Pinot’s multi-stage query engine (MSE) executes all intermediate stages (joins, window functions, aggregations, sorts, unions, subqueries etc.) on server instances. This new StarTree Cloud feature allows brokers to also serve as workers for these intermediate stages, distributing compute-heavy operations across a larger pool of machines.

Why Use This

Reduce server load

  Intermediate stages like joins, sorts, and aggregations can be CPU- and memory-intensive. These operations compete for resources with leaf-stage work (segment scanning, filtering, indexing) on servers. Offloading intermediate stages to brokers frees server resources to focus on what they do best - serving data from disk.

Scale intermediate stages independently of data

  Servers are stateful - they hold segment data on disk and must be carefully scaled, rebalanced, and provisioned around storage requirements. Adding server capacity means rebalancing segments across the new nodes, which is a heavyweight operation. Brokers, on the other hand, are stateless. They hold no segment data and can be added or removed freely with no rebalancing or data movement. This makes brokers a natural fit for horizontally scaling intermediate-stage compute: if your complex queries (joins, large aggregations, multi-table unions) are bottlenecked on intermediate-stage resources, you can spin up more brokers without touching your server fleet.

Right-size instance types for the workload

  In cloud deployments, servers are typically provisioned with large disks and balanced CPU/memory for segment storage and scanning. Intermediate stages have a very different resource profile - they are often memory- and CPU-bound (hash joins, sort-based aggregations, large intermediate result sets) with no disk requirements. With this feature, you can run your intermediate-stage broker pool on memory-optimized instances (e.g., AWS r-family, GCP m-family) that are purpose-built for this workload, rather than paying for disk-heavy server instances that carry unused storage capacity. This can lead to both better query performance and lower infrastructure cost.

Workload isolation

  Using broker tags, you can dedicate specific broker pools to MSE intermediate work. This prevents heavy analytical queries from starving latency-sensitive queries of resources, and lets you independently autoscale the broker pool handling complex workloads.

Enabling the Feature

Add the following to your broker / cluster configuration:
startree.broker.mse.use.broker.for.intermediate.stage=true
  No server-side changes are required. Each broker with this config enabled will start an embedded query server that can accept dispatched intermediate stage plans from other brokers.

Configuration Reference

Broker Configs

Config KeyDefaultDescription
startree.broker.mse.use.broker.for.intermediate.stagefalseEnable using tagged brokers as workers for MSE intermediate stages.
startree.broker.mse.default.worker.broker.tagDefaultTenant_BROKERHelix tag identifying which brokers serve as intermediate-stage workers.
startree.broker.mse.num.brokers.for.intermediate.stage-1 (all)Max number of tagged brokers to use for intermediate stages; -1 means use all available

Query Options (per-query overrides)

These can be set via SET statements to override cluster defaults on a per-query basis.
Query OptionDescription
useBrokerForIntermediateStageOverride the cluster-level broker-as-worker setting for this query (true/false)
mseWorkerBrokerTagTarget a specific broker pool by tag for this query’s intermediate stages.
numBrokersForIntermediateStageOverride the max broker count for this query’s intermediate stages. Default = all brokers in the given tenant.

Example

  SET useBrokerForIntermediateStage = true;
  SET mseWorkerBrokerTag = 'highMem_BROKER';
  SET numBrokersForIntermediateStage = 4;
  SELECT a.col1, b.col2
  FROM tableA a JOIN tableB b ON a.id = b.id

Workload Isolation with Broker Tags

  You can partition brokers into pools using Helix tags and route intermediate stage work to a specific pool:   1. Tag a subset of brokers with a custom tag (e.g., heavyQuery_BROKER).   2. Set the default tag in broker config:
  startree.broker.mse.default.worker.broker.tag=heavyQuery_BROKER
  3. Or override per query:
SET mseWorkerBrokerTag = 'heavyQuery_BROKER';

  SELECT ...
Only brokers carrying the specified tag (and that are live, enabled, and have the feature turned on) will be selected as intermediate-stage workers.

Limiting Broker Parallelism

  By default, all brokers matching the tag are used. To cap the number of brokers used for a query’s intermediate stages:
  startree.broker.mse.num.brokers.for.intermediate.stage=4
  Or per query:
  SET numBrokersForIntermediateStage = 2;

  SELECT ...
This is useful when you want to limit resource consumption for individual queries or prevent a single query from monopolizing the entire broker pool.