Throttling limits concurrency (how many operations run at once), not throughput. A limit of
4 means at most 4 segments are processed by that operation simultaneously on a given server;
the rest queue and run as slots free up.

The throttlers
Each throttler guards one type of work. A segment operation may pass through several of them in sequence (for example, an OFFLINE→ONLINE transition can download, then preprocess indexes).

| Throttler | What it protects | When it applies |
|---|---|---|
| Download | Concurrent segment downloads from the deep store | Any operation that needs to fetch a segment locally (restart, reset, rebalance, backfill) |
| Index preprocess (all) | Concurrent index building / preprocessing | Building or rebuilding indexes during segment load and reload |
| StarTree preprocess | StarTree index rebuilds (CPU- and memory-heavy) | Acquired while holding the index-preprocess slot, for segments with a StarTree index |
| Multi-column text index preprocess | Multi-column text index builds (heavy) | Acquired while holding the index-preprocess slot, for segments with a multi-column text index |
| RocksDB | Upsert / dedup metadata operations backed by RocksDB | Loading or preloading segments of an upsert or dedup table |
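
The nesting described in the table, where a heavy throttler is acquired while the index-preprocess slot is still held, can be sketched with two semaphores. This is a minimal illustration, not Pinot's implementation: the limits, the function names, and the `time.sleep` stand-in for real work are all assumptions.

```python
import threading
import time

# Hypothetical limits for illustration; real values come from cluster configs.
INDEX_PREPROCESS_LIMIT = 4
STARTREE_LIMIT = 1

index_preprocess = threading.BoundedSemaphore(INDEX_PREPROCESS_LIMIT)
startree = threading.BoundedSemaphore(STARTREE_LIMIT)

lock = threading.Lock()
active = {"index": 0, "startree": 0}
peak = {"index": 0, "startree": 0}
processed = []


def _enter(name):
    # Record that one more operation holds a permit of this throttler.
    with lock:
        active[name] += 1
        peak[name] = max(peak[name], active[name])


def _exit(name):
    with lock:
        active[name] -= 1


def preprocess_segment(segment, has_startree):
    # Every segment takes an index-preprocess slot first.
    with index_preprocess:
        _enter("index")
        try:
            if has_startree:
                # The StarTree slot is taken while the index-preprocess
                # slot is still held, so both limits apply at once.
                with startree:
                    _enter("startree")
                    try:
                        time.sleep(0.01)  # stand-in for the StarTree rebuild
                    finally:
                        _exit("startree")
            else:
                time.sleep(0.01)  # stand-in for ordinary index preprocessing
        finally:
            _exit("index")
    with lock:
        processed.append(segment)


def run(num_segments=12):
    # Every third segment is treated as having a StarTree index.
    threads = [
        threading.Thread(target=preprocess_segment, args=(f"seg_{i}", i % 3 == 0))
        for i in range(num_segments)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

Calling `run()` processes every segment, but the recorded peaks never exceed the two limits: the semaphores bound how many operations run at once, not how many complete overall.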

StarTree and multi-column text index preprocessing are acquired on top of the index-preprocess
throttler, so they are intentionally given small limits (default 1): they are the most
expensive operations per segment.

Context-aware isolation (consuming vs. general)
By default, throttlers are split into two isolated sets so that operations on consuming (realtime) segments are never blocked behind bulk background work:
- consuming: operations that involve consuming segments, such as committing a realtime segment. These keep ingestion healthy.
- general: everything else, including server restart, table reset, rebalance, reload, refresh, and upsert/dedup preload.

Because the two sets are isolated, a large general operation (for example, a 10k-segment reset)
cannot exhaust the permits that consuming-segment commits rely on.
This isolation is controlled at the server level:
| Cluster config | Default | Description |
|---|---|---|
| pinot.server.throttler.context.aware.enabled | true | Enable the consuming / general split. When false, all operations share one set. |
| pinot.server.throttler.parallelism.ratio.consuming | 0.75 | Fraction of the configured parallelism allotted to the consuming set. |
| pinot.server.throttler.parallelism.ratio.general | 0.75 | Fraction of the configured parallelism allotted to the general set. |
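
As a rough model of the split, each context can be pictured as its own semaphore, sized from the configured parallelism and that context's ratio. The class below is a hypothetical sketch: `ContextAwareThrottler` and `permits_for` are invented names, and the rounding rule (round up, never below 1) is an assumption, since the text above does not state how fractional permits are handled.

```python
import math
import threading


def permits_for(configured_parallelism, ratio):
    # Assumption: fractional results round up and never drop below 1.
    return max(1, math.ceil(configured_parallelism * ratio))


class ContextAwareThrottler:
    """Toy model of one throttler with isolated consuming / general sets."""

    def __init__(self, configured_parallelism, consuming_ratio=0.75,
                 general_ratio=0.75, context_aware=True):
        if context_aware:
            # Each context gets its own permits, so one cannot starve the other.
            self._sets = {
                "consuming": threading.Semaphore(
                    permits_for(configured_parallelism, consuming_ratio)),
                "general": threading.Semaphore(
                    permits_for(configured_parallelism, general_ratio)),
            }
        else:
            # With the split disabled, both contexts draw from one shared set.
            shared = threading.Semaphore(configured_parallelism)
            self._sets = {"consuming": shared, "general": shared}

    def try_acquire(self, context):
        # Non-blocking: returns False instead of queueing when no permit is free.
        return self._sets[context].acquire(blocking=False)

    def release(self, context):
        self._sets[context].release()
```

With a configured parallelism of 8 and the default ratios, this model gives each set 6 permits. Exhausting the general set leaves the consuming set untouched, whereas with `context_aware=False` a bulk general operation can take all 8 shared permits.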
Server-level limits
Server-level limits are set as cluster configs and apply to every server in the cluster. Updating a cluster config takes effect immediately, without a restart.

| Throttler | Cluster config | Default* |
|---|---|---|
| Download | pinot.server.max.segment.download.parallelism | max(1, cores/4) |
| Index preprocess (all) | pinot.server.max.segment.preprocess.parallelism | max(1, cores/4) |
| StarTree preprocess | pinot.server.max.segment.startree.preprocess.parallelism | 1 |
| Multi-col text index preprocess | pinot.server.max.segment.multicol.text.index.preprocess.parallelism | 1 |
| RocksDB (upsert/dedup) | pinot.server.max.segment.rocksdb.parallelism | max(1, cores/4) |

*Each config above also has a .before.serving.queries variant (for example,
pinot.server.max.segment.download.parallelism.before.serving.queries). This variant is used
only during server startup, before the server begins serving queries, and is set higher
(up to all cores) so that a starting server loads its segments quickly. Once the server starts
serving queries, the steady-state limit above takes over to protect query latency.
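
To make the defaults concrete, the sketch below evaluates max(1, cores/4) for a given core count and picks between the startup and steady-state limits. The function names are hypothetical, and integer division for cores/4 is an assumption: the table does not say how a fractional result is rounded.

```python
def default_steady_state_parallelism(cores):
    # max(1, cores/4) from the table; integer division is an assumption.
    return max(1, cores // 4)


def effective_limit(serving_queries, steady_state_limit, before_serving_limit):
    # Before the server serves queries, the .before.serving.queries value
    # applies; afterwards, the steady-state value takes over.
    return steady_state_limit if serving_queries else before_serving_limit


if __name__ == "__main__":
    for cores in (2, 8, 16, 64):
        print(cores, "cores ->", default_steady_state_parallelism(cores))
```

Under these assumptions, a 16-core server would default to 4 concurrent downloads once it is serving queries, while a small 2-core server would still get 1.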
Per-table tuning
To stop one table from monopolizing a server-level throttler, you can cap how many permits a single table may hold within a throttler. This is set in the table config, under customConfigs, using keys that combine a context and a throttler type:
- <context>: general or consuming
- <throttlerType>: download, allIndexPreprocess, starTreePreprocess, multiColTextIndexPreprocess, or rocksDB

For example, with both the index-preprocess and download caps set to 2 for myTable, the table can use at most 2 of
the server's index-preprocess permits and 2 download permits at a time, leaving headroom for
other tables. Table config changes are picked up live; no restart is required.
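
One way to picture a per-table cap is as a second counter checked alongside the server-level one: a permit is granted only if both the server limit and the table's own cap have room. The class below is a hypothetical sketch of that idea, not the actual Pinot code; `TableCappedThrottler` and its methods are invented for illustration.

```python
import threading
from collections import defaultdict


class TableCappedThrottler:
    """Toy model: a server-level limit plus optional per-table caps."""

    def __init__(self, server_limit, table_caps=None):
        self._server_limit = server_limit
        self._table_caps = dict(table_caps or {})
        self._in_use_total = 0
        self._in_use_by_table = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, table):
        with self._lock:
            # A table without an explicit cap is bounded only by the server limit.
            cap = self._table_caps.get(table, self._server_limit)
            if self._in_use_total >= self._server_limit:
                return False
            if self._in_use_by_table[table] >= cap:
                return False
            self._in_use_total += 1
            self._in_use_by_table[table] += 1
            return True

    def release(self, table):
        with self._lock:
            if self._in_use_by_table[table] == 0:
                raise ValueError(f"{table} holds no permits")
            self._in_use_total -= 1
            self._in_use_by_table[table] -= 1
```

With a server limit of 4 and a cap of 2 for myTable, the third concurrent request from myTable is refused even though two server permits remain, and those remaining permits stay available to other tables.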
Observability
Each throttler exposes server metrics so you can see saturation and tune limits with evidence:
- Threshold: the currently configured permit count for the throttler.
- In-use count: how many permits are held right now. Sustained in-use ≈ threshold means the throttler is saturated and is the bottleneck.
- Queue length: how many pending operations from each table are waiting for this throttler.
- Wait time / hold time / acquisition: how long operations wait to acquire a permit, how long they hold the throttler, and how often they acquire, per table. High wait time confirms contention.
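
The "sustained in-use ≈ threshold" check can be expressed as a small helper over sampled metric values. This is only a sketch: the function name, the sampling approach, and the 0.9 cut-off for "sustained" are assumptions chosen for illustration, not values defined by the metrics themselves.

```python
def is_saturated(in_use_samples, threshold, min_fraction=0.9):
    """Return True when most samples show the throttler at its threshold.

    in_use_samples: in-use counts sampled over the period of interest.
    threshold: the configured permit count for the throttler.
    min_fraction: hypothetical cut-off for what counts as "sustained".
    """
    if not in_use_samples or threshold <= 0:
        return False
    at_limit = sum(1 for value in in_use_samples if value >= threshold)
    return at_limit / len(in_use_samples) >= min_fraction
```

A throttler that sits at 4 of 4 permits across the whole window is flagged, while one that touches the threshold only briefly is not. Rising wait times over the same window would confirm the contention.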
How to tune
Find the saturated throttler
On the Grafana segment-operations panels, look for a throttler whose in-use count sits at its
threshold with rising wait times during the operation you are running (restart, reset,
rebalance, reload, ingestion spike).
Decide where to apply the limit
If one table is responsible (for example, a large reset), add a per-table cap so other
tables keep making progress. If the whole server is overloaded, lower the server-level
limit for that throttler instead.
Adjust and observe
Change the cluster config or table config and watch the metrics — limits apply live. Lower a
limit to reduce impact on query latency; raise it to drain a backlog faster.

