Skip to main content
When a server loads segments — during a restart, a reset, a rebalance, a reload/refresh, or while committing realtime segments — it competes for a few shared, resource-intensive operations: downloading segments from the deep store, building indexes, and (for upsert/dedup tables) updating RocksDB metadata. Left unbounded, a single heavy operation (for example, resetting a table with tens of thousands of segments) can saturate these resources and slow down query serving or block other tables. Segment operation throttling puts a configurable concurrency limit on each of these operations. Limits are enforced per server and can be tuned live — you do not need to restart the server to change them.
Throttling limits concurrency (how many operations run at once), not throughput. A limit of 4 means at most 4 segments are processed by that operation simultaneously on a given server; the rest queue and run as slots free up.

The throttlers

Each throttler guards one type of work. A segment operation may pass through several of them in sequence (for example, an OFFLINE→ONLINE transition can download, then preprocess indexes).
ThrottlerWhat it protectsWhen it applies
DownloadConcurrent segment downloads from the deep storeAny operation that needs to fetch a segment locally (restart, reset, rebalance, backfill)
Index preprocess (all)Concurrent index building / preprocessingBuilding or rebuilding indexes during segment load and reload
StarTree preprocessStarTree index rebuilds (CPU- and memory-heavy)Acquired while holding the index-preprocess slot, for segments with a StarTree index
Multi-column text index preprocessMulti-column text index builds (heavy)Acquired while holding the index-preprocess slot, for segments with a multi-column text index
RocksDBUpsert / dedup metadata operations backed by RocksDBLoading or preloading segments of an upsert or dedup table
StarTree and multi-column text index preprocessing are acquired on top of the index-preprocess throttler, so they are intentionally given small limits (default 1) — they are the most expensive per segment.

Context-aware isolation (consuming vs. general)

By default, throttlers are split into two isolated sets so that operations on consuming (realtime) segments are never blocked behind bulk background work:
  • consuming — operations that involve consuming segments, such as committing a realtime segment. These keep ingestion healthy.
  • general — everything else: server restart, table reset, rebalance, reload, refresh, and upsert/dedup preload.
Because the two sets hold separate permits, a large general operation (a 10k-segment reset) cannot exhaust the permits that consuming-segment commits rely on. This isolation is controlled at the server level:
Cluster configDefaultDescription
pinot.server.throttler.context.aware.enabledtrueEnable the consuming / general split. When false, all operations share one set.
pinot.server.throttler.parallelism.ratio.consuming0.75Fraction of the configured parallelism allotted to the consuming set.
pinot.server.throttler.parallelism.ratio.general0.75Fraction of the configured parallelism allotted to the general set.

Server-level limits

Server-level limits are set as cluster configs and apply to every server in the cluster. Updating a cluster config takes effect immediately, without a restart.
ThrottlerCluster configDefault*
Downloadpinot.server.max.segment.download.parallelismmax(1, cores/4)
Index preprocess (all)pinot.server.max.segment.preprocess.parallelismmax(1, cores/4)
StarTree preprocesspinot.server.max.segment.startree.preprocess.parallelism1
Multi-col text index preprocesspinot.server.max.segment.multicol.text.index.preprocess.parallelism1
RocksDB (upsert/dedup)pinot.server.max.segment.rocksdb.parallelismmax(1, cores/4)
*Defaults scale with the number of CPU cores on the server. Each limit also has a .before.serving.queries variant (for example, pinot.server.max.segment.download.parallelism.before.serving.queries). This variant is used only during server startup, before the server begins serving queries, and is set higher (up to all cores) so a starting server loads its segments quickly. Once the server starts serving queries, the steady-state limit above takes over to protect query latency.
Setting a limit to 0 acts as a kill switch: it halts that operation entirely and parks any caller until permits are restored (by raising the limit again). This is useful to immediately stop a runaway operation, but remember to restore it afterward — leaving it at 0 will stall segment loading.

Per-table tuning

To stop one table from monopolizing a server-level throttler, you can cap how many permits a single table may hold within a throttler. This is set in the table config, under customConfigs, using keys of the form:
segmentOperation.concurrencyLevel.<context>.<throttlerType>
  • <context>general or consuming
  • <throttlerType>download, allIndexPreprocess, starTreePreprocess, multiColTextIndexPreprocess, or rocksDB
The value is the maximum number of permits this table may hold in that throttler at once. A table with a low cap yields the remaining permits to other tables, so heavy work on it has a smaller blast radius.
{
  "tableName": "myTable_OFFLINE",
  "tableType": "OFFLINE",
  "...": "...",
  "customConfig": {
    "customConfigs": {
      "segmentOperation.concurrencyLevel.general.allIndexPreprocess": "2",
      "segmentOperation.concurrencyLevel.general.download": "2"
    }
  }
}
In the example above, while resetting or rebalancing myTable, the table can use at most 2 of the server’s index-preprocess permits and 2 download permits at a time, leaving headroom for other tables. Table config changes are picked up live — no restart required.
Use per-table caps for the few tables that are large or operationally heavy. Leave everything else unset so those tables draw from the full server-level limit. An unset (or invalid) key means “no per-table cap” for that throttler.

Observability

Each throttler exposes server metrics so you can see saturation and tune limits with evidence:
  • Threshold — the currently configured permit count for the throttler.
  • In-use count — how many permits are held right now. Sustained in-use ≈ threshold means the throttler is saturated and is the bottleneck.
  • Queue length — how many pending operations are waiting for this throttler from each table.
  • Wait time / hold time / acquisition — how long operations wait to acquire a permit, how long they hold the throttler, and how often they acquire per table. High wait time confirms contention.
These metrics are available per throttler type (download, index preprocess, StarTree, multi-col text, RocksDB) and are surfaced on the segment-operations panels of the StarTree Grafana dashboards.

How to tune

1

Find the saturated throttler

On the Grafana segment-operations panels, look for a throttler whose in-use count sits at its threshold with rising wait times during the operation you are running (restart, reset, rebalance, reload, ingestion spike).
2

Decide where to apply the limit

If one table is responsible (for example, a large reset), add a per-table cap so other tables keep making progress. If the whole server is overloaded, lower the server-level limit for that throttler instead.
3

Adjust and observe

Change the cluster config or table config and watch the metrics — limits apply live. Lower a limit to reduce impact on query latency; raise it to drain a backlog faster.
4

For emergencies

Set the offending throttler’s server-level limit to 0 to immediately halt that operation, then restore it to a sensible value once the situation is stable.