Overview

By default, Pinot’s multi-stage query engine (MSE) executes all intermediate stages (joins, window functions, aggregations, sorts, unions, subqueries etc.) on server instances. This new StarTree Cloud feature allows brokers to also serve as workers for these intermediate stages, distributing compute-heavy operations across a larger pool of machines.

Why Use This

Reduce server load

Intermediate stages like joins, sorts, and aggregations can be CPU- and memory-intensive. These operations compete for resources with leaf-stage work (segment scanning, filtering, indexing) on servers. Offloading intermediate stages to brokers frees server resources to focus on what they do best - serving data from disk.

Scale intermediate stages independently of data

Servers are stateful - they hold segment data on disk and must be carefully scaled, rebalanced, and provisioned around storage requirements. Adding server capacity means rebalancing segments across the new nodes, which is a heavyweight operation. Brokers, on the other hand, are stateless. They hold no segment data and can be added or removed freely with no rebalancing or data movement. This makes brokers a natural fit for horizontally scaling intermediate-stage compute: if your complex queries (joins, large aggregations, multi-table unions) are bottlenecked on intermediate-stage resources, you can spin up more brokers without touching your server fleet.

Right-size instance types for the workload

In cloud deployments, servers are typically provisioned with large disks and balanced CPU/memory for segment storage and scanning. Intermediate stages have a very different resource profile - they are often memory- and CPU-bound (hash joins, sort-based aggregations, large intermediate result sets) with no disk requirements. With this feature, you can run your intermediate-stage broker pool on memory-optimized instances (e.g., AWS r-family, GCP m-family) that are purpose-built for this workload, rather than paying for disk-heavy server instances that carry unused storage capacity. This can lead to both better query performance and lower infrastructure cost.

Workload isolation

Using broker tags, you can dedicate specific broker pools to MSE intermediate work. This prevents heavy analytical queries from starving latency-sensitive queries of resources, and lets you independently autoscale the broker pool handling complex workloads.

Enabling the Feature

Add the following to your broker / cluster configuration:

startree.broker.mse.use.broker.for.intermediate.stage=true

No server-side changes are required. Each broker with this config enabled will start an embedded query server that can accept dispatched intermediate stage plans from other brokers. Note: this property is mandatory and must be set to true for this feature to work. This property cannot be dynamically configured in a running cluster and needs a restart for the feature to work.

Configuration Reference

Broker Configs

Config Key	Default	Description
startree.broker.mse.use.broker.for.intermediate.stage	false	Mandatory. Enable using tagged brokers as workers for MSE intermediate stages.
startree.broker.mse.default.worker.broker.tag	DefaultTenant_BROKER	Helix tag identifying which brokers serve as intermediate-stage workers.
startree.broker.mse.num.brokers.for.intermediate.stage	-1 (all)	Max number of tagged brokers to use for intermediate stages; -1 means use all available

Query Options (per-query overrides)

These can be set via SET statements to override cluster defaults on a per-query basis.

Query Option	Description
useBrokerForIntermediateStage	Override the cluster-level broker-as-worker setting for this query (true/false). When set to true - this will be a no-op assuming the broker/cluster level property is set to true. When set to false, it will result in intermediate stages being executed on the servers (old behavior)
mseWorkerBrokerTag	Target a specific broker pool by tag for this query’s intermediate stages.
numBrokersForIntermediateStage	Override the max broker count for this query’s intermediate stages. Default = all brokers in the given tenant.

Example

  SET useBrokerForIntermediateStage = true;
  SET mseWorkerBrokerTag = 'highMem_BROKER';
  SET numBrokersForIntermediateStage = 4;
  SELECT a.col1, b.col2
  FROM tableA a JOIN tableB b ON a.id = b.id

Workload Isolation with Broker Tags

You can partition brokers into pools using Helix tags and route intermediate stage work to a specific pool: 1. Tag a subset of brokers with a custom tag (e.g., heavyQuery_BROKER). 2. Set the default tag in broker config:

  startree.broker.mse.default.worker.broker.tag=heavyQuery_BROKER

3. Or override per query:

SET mseWorkerBrokerTag = 'heavyQuery_BROKER';

  SELECT ...

Only brokers carrying the specified tag (and that are live, enabled, and have the feature turned on) will be selected as intermediate-stage workers.

Limiting Broker Parallelism

By default, all brokers matching the tag are used. To cap the number of brokers used for a query’s intermediate stages:

  startree.broker.mse.num.brokers.for.intermediate.stage=4

Or per query:

  SET numBrokersForIntermediateStage = 2;

  SELECT ...

This is useful when you want to limit resource consumption for individual queries or prevent a single query from monopolizing the entire broker pool.

​Overview

​Why Use This

​Reduce server load

​Scale intermediate stages independently of data

​Right-size instance types for the workload

​Workload isolation

​Enabling the Feature

​Configuration Reference

​Broker Configs

​Query Options (per-query overrides)

​Example

​Workload Isolation with Broker Tags

​Limiting Broker Parallelism