Segment Operation Throttling

When a server loads segments — during a restart, a reset, a rebalance, a reload/refresh, or while committing realtime segments — it competes for a few shared, resource-intensive operations: downloading segments from the deep store, building indexes, and (for upsert/dedup tables) updating RocksDB metadata. Left unbounded, a single heavy operation (for example, resetting a table with tens of thousands of segments) can saturate these resources and slow down query serving or block other tables. Segment operation throttling puts a configurable concurrency limit on each of these operations. Limits are enforced per server and can be tuned live — you do not need to restart the server to change them.

Throttling limits concurrency (how many operations run at once), not throughput. A limit of 4 means at most 4 segments are processed by that operation simultaneously on a given server; the rest queue and run as slots free up.

The throttlers

Each throttler guards one type of work. A segment operation may pass through several of them in sequence (for example, an OFFLINE→ONLINE transition can download, then preprocess indexes).

Throttler	What it protects	When it applies
Download	Concurrent segment downloads from the deep store	Any operation that needs to fetch a segment locally (restart, reset, rebalance, backfill)
Index preprocess (all)	Concurrent index building / preprocessing	Building or rebuilding indexes during segment load and reload
StarTree preprocess	StarTree index rebuilds (CPU- and memory-heavy)	Acquired while holding the index-preprocess slot, for segments with a StarTree index
Multi-column text index preprocess	Multi-column text index builds (heavy)	Acquired while holding the index-preprocess slot, for segments with a multi-column text index
RocksDB	Upsert / dedup metadata operations backed by RocksDB	Loading or preloading segments of an upsert or dedup table

StarTree and multi-column text index preprocessing are acquired on top of the index-preprocess throttler, so they are intentionally given small limits (default 1) — they are the most expensive per segment.

Context-aware isolation (consuming vs. general)

By default, throttlers are split into two isolated sets so that operations on consuming (realtime) segments are never blocked behind bulk background work:

consuming — operations that involve consuming segments, such as committing a realtime segment. These keep ingestion healthy.
general — everything else: server restart, table reset, rebalance, reload, refresh, and upsert/dedup preload.

Because the two sets hold separate permits, a large general operation (a 10k-segment reset) cannot exhaust the permits that consuming-segment commits rely on. This isolation is controlled at the server level:

Cluster config	Default	Description
`pinot.server.throttler.context.aware.enabled`	`true`	Enable the `consuming` / `general` split. When `false`, all operations share one set.
`pinot.server.throttler.parallelism.ratio.consuming`	`0.75`	Fraction of the configured parallelism allotted to the `consuming` set.
`pinot.server.throttler.parallelism.ratio.general`	`0.75`	Fraction of the configured parallelism allotted to the `general` set.

Server-level limits

Server-level limits are set as cluster configs and apply to every server in the cluster. Updating a cluster config takes effect immediately, without a restart.

Throttler	Cluster config	Default*
Download	`pinot.server.max.segment.download.parallelism`	`max(1, cores/4)`
Index preprocess (all)	`pinot.server.max.segment.preprocess.parallelism`	`max(1, cores/4)`
StarTree preprocess	`pinot.server.max.segment.startree.preprocess.parallelism`	`1`
Multi-col text index preprocess	`pinot.server.max.segment.multicol.text.index.preprocess.parallelism`	`1`
RocksDB (upsert/dedup)	`pinot.server.max.segment.rocksdb.parallelism`	`max(1, cores/4)`

_{*Defaults scale with the number of CPU cores on the server.} Each limit also has a .before.serving.queries variant (for example, pinot.server.max.segment.download.parallelism.before.serving.queries). This variant is used only during server startup, before the server begins serving queries, and is set higher (up to all cores) so a starting server loads its segments quickly. Once the server starts serving queries, the steady-state limit above takes over to protect query latency.

Setting a limit to 0 acts as a kill switch: it halts that operation entirely and parks any caller until permits are restored (by raising the limit again). This is useful to immediately stop a runaway operation, but remember to restore it afterward — leaving it at 0 will stall segment loading.

Per-table tuning

To stop one table from monopolizing a server-level throttler, you can cap how many permits a single table may hold within a throttler. This is set in the table config, under customConfigs, using keys of the form:

segmentOperation.concurrencyLevel.<context>.<throttlerType>

<context> — general or consuming
<throttlerType> — download, allIndexPreprocess, starTreePreprocess, multiColTextIndexPreprocess, or rocksDB

The value is the maximum number of permits this table may hold in that throttler at once. A table with a low cap yields the remaining permits to other tables, so heavy work on it has a smaller blast radius.

{
  "tableName": "myTable_OFFLINE",
  "tableType": "OFFLINE",
  "...": "...",
  "metadata": {
      "customConfigs": {
        "segmentOperation.concurrencyLevel.general.allIndexPreprocess": "5",
        "segmentOperation.concurrencyLevel.consuming.download": "1",
      }
  }
}

In the example above, while resetting or rebalancing myTable, the table can use at most 2 of the server’s index-preprocess permits and 2 download permits at a time, leaving headroom for other tables. Table config changes are picked up live — no restart required.

Use per-table caps for the few tables that are large or operationally heavy. Leave everything else unset so those tables draw from the full server-level limit. An unset (or invalid) key means “no per-table cap” for that throttler.

Observability

Each throttler exposes server metrics so you can see saturation and tune limits with evidence:

Threshold — the currently configured permit count for the throttler.
In-use count — how many permits are held right now. Sustained in-use ≈ threshold means the throttler is saturated and is the bottleneck.
Queue length — how many pending operations are waiting for this throttler from each table.
Wait time / hold time / acquisition — how long operations wait to acquire a permit, how long they hold the throttler, and how often they acquire per table. High wait time confirms contention.

These metrics are available per throttler type (download, index preprocess, StarTree, multi-col text, RocksDB) and are surfaced on the segment-operations panels of the StarTree Grafana dashboards.

How to tune

Find the saturated throttler

On the Grafana segment-operations panels, look for a throttler whose in-use count sits at its threshold with rising wait times during the operation you are running (restart, reset, rebalance, reload, ingestion spike).

Decide where to apply the limit

If one table is responsible (for example, a large reset), add a per-table cap so other tables keep making progress. If the whole server is overloaded, lower the server-level limit for that throttler instead.

Adjust and observe

Change the cluster config or table config and watch the metrics — limits apply live. Lower a limit to reduce impact on query latency; raise it to drain a backlog faster.

For emergencies

Set the offending throttler’s server-level limit to 0 to immediately halt that operation, then restore it to a sensible value once the situation is stable.

Get Started

Ingestion

External Tables

Query Data

Manage Data

Observability

Visualize Data

Cluster Operations

Manage Security

Release Notes

Reference

Segment Operation Throttling

The throttlers

Context-aware isolation (consuming vs. general)

Server-level limits

Per-table tuning

Observability

How to tune

​The throttlers

​Context-aware isolation (consuming vs. general)

​Server-level limits

​Per-table tuning

​Observability

​How to tune

The throttlers

Context-aware isolation (consuming vs. general)

Server-level limits

Per-table tuning

Observability

How to tune