Context

Workload isolation is a new feature in StarTree Cloud (starting with the 0.14 release) that allows users to physically isolate queries from different workloads on the same Pinot table. For instance, let’s assume we have the following two workloads:
  1. Workload A: user-facing (application) queries, characterized by a strict SLA and high impact to the business
  2. Workload B: ad-hoc queries, required for triaging/debugging and with low impact to the business
With this feature, we can isolate Workload A and Workload B using replica-group-based routing isolation.

Prerequisite

For this feature to work, the table must be configured to use replica groups and the ‘StarTreeReplicaGroupInstanceSelector’ instance selector type. For instance, let’s consider this sample table config:
{
  "instanceAssignmentConfigMap": {
    "OFFLINE": {
      "tagPoolConfig": {
        "tag": "Tag1_OFFLINE"
      },
      "replicaGroupPartitionConfig": {
        "replicaGroupBased": true,
        "numReplicaGroups": 3,
        "numInstancesPerReplicaGroup": 3
      }
    }
  },
  "routing": {
    "instanceSelectorType": "ai.startree.pinot.broker.routing.instanceselector.StarTreeReplicaGroupInstanceSelector"
  }
}
In this case, we’ve configured replica-group-based instance assignment and routing with a total of 3 replica groups (RG 0, RG 1, and RG 2), each containing 3 instances.

Note: If the current setup does not include replica groups, they must be enabled before using this feature. This involves changing the table config for all relevant tables and performing a rebalance operation.

How to use this feature

Preferred Replicas

Once enabled, you can configure isolation at the query level via the preferredReplicas query option. It accepts a comma-separated list of replica group IDs (derived from ideal state, not from instance partitions) that the broker should use to route the query. Suppose we want the following configuration:
  • Workload A: allowed to use both replica groups (0 and 1)
  • Workload B: restricted to replica group 2
Queries from Workload A can be written like this
SET preferredReplicas='0,1';
SELECT ... FROM table WHERE ...
Whereas queries from Workload B can be written like this
SET preferredReplicas='2';
SELECT ... FROM table WHERE ...

Preferred Replica list semantics

When multiple replica groups are listed, the broker rotates across them by request ID for load balancing; it does not treat the first entry as primary and the rest as failover. So preferredReplicas='0,1' routes roughly half of the requests entirely through RG 0 and the other half entirely through RG 1. To express “prefer RG 0, and fall back to RG 1 only when RG 0 cannot serve a segment”, use preferredReplicas='0' together with fallbackReplicas='1' (see the next section).

In the single-replica case, preferredReplicas='2' means the query is routed only to instances in replica group 2. If any segment is unavailable on RG 2, it is reported as unavailable (partial result) unless fallbackReplicas is set.

Therefore, in the happy path, Workload A will be served from RG 0 and RG 1 (load balanced), and Workload B will be served exclusively from RG 2.
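As a concrete sketch of the difference (the table name and replica-group IDs are illustrative, taken from the sample setup above):
-- Load balanced: roughly half of the requests are served entirely by RG 0,
-- the other half entirely by RG 1.
SET preferredReplicas='0,1';
SELECT ... FROM table WHERE ...

-- Strict preference: every segment is served from RG 0 when possible, and
-- only segments that RG 0 cannot serve are routed to RG 1.
SET preferredReplicas='0';
SET fallbackReplicas='1';
SELECT ... FROM table WHERE ...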

Fallback replicas

The fallbackReplicas query option is an optional companion to preferredReplicas. It accepts a comma-separated list of replica group IDs that the broker should consider only when none of the preferredReplicas can serve a given segment. Example:
SET preferredReplicas = '0';
SET fallbackReplicas  = '1,2';
SELECT ... FROM table WHERE ...
In the example above, the broker tries to serve every segment from RG 0; for any segment that RG 0 cannot serve, it tries RG 1, then RG 2, before declaring the segment unavailable.

Selection order

For each segment, the broker walks the preferred replicas first (rotated by request ID for load balancing within the preferred set), and only falls through to the fallbackReplicas (in the listed order, no rotation) when no preferred replica has the segment online. Cross-preferred-replica failover takes priority over fallback: with preferredReplicas='0,1' and fallbackReplicas='2', a segment that is offline on RG 0 but online on RG 1 is served from RG 1, and the fallback metric is not incremented.
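For instance, the priority described above applies to an illustrative query like this (same sample table as before):
SET preferredReplicas='0,1';
SET fallbackReplicas='2';
SELECT ... FROM table WHERE ...
Here, a segment that is offline on RG 0 but online on RG 1 is still served from the preferred set (RG 1); only a segment missing from both RG 0 and RG 1 is routed to RG 2 and counted against the fallback metric.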

Upsert / dedup tables

Falling back to a different replica group is safe even for upsert / dedup tables: if any segment in a partition is unavailable on an instance, the underlying selector removes that instance as a candidate for all segments in that partition, so the fallback replica is chosen consistently across the whole partition.

Observability

Whenever any segment in a query is served by a fallback replica, the broker increments the per-table meter pinot.broker.<table>.queriesUsingFallbackReplica (exposed to Prometheus as pinot_broker_queriesUsingFallbackReplica_* with a “table” label). Use this metric to alert on workload isolation being silently violated.

Recommended pattern: for strict isolation, set preferredReplicas only. For best-effort isolation with availability under partial outages, set preferredReplicas to your isolation target and fallbackReplicas to the rest, and alert on the queriesUsingFallbackReplica meter so you know when isolation is being broken. A concrete example of both patterns follows the table below.

Here are the key failover scenarios:
| Failure scenario | Implication (no fallbackReplicas) | Implication (with fallbackReplicas set) |
| --- | --- | --- |
| Segment replica failure within a preferred RG, but another preferred RG has the segment | Served from the other preferred RG (no metric increment) | Same as the previous column (fallback is not used) |
| Segment replica failure across all preferred RGs | Segment reported unavailable → partial result | Served from the first fallback RG that has the segment; the queriesUsingFallbackReplica meter is incremented |
| Segment replica failure on a preferred RG with upserts enabled | Entire affected partition is failed over to the next preferred RG | Entire partition fails over to the next available RG (preferred or fallback) that has it (consistent across the partition) |
| Segment unavailable on every preferred and fallback RG | Partial result | Partial result |
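As a concrete illustration of the recommended pattern above, here is what the two approaches might look like for Workload B, using the replica-group layout from the sample config (the queries themselves are placeholders):
-- Strict isolation: never leave RG 2; segments unavailable on RG 2 produce
-- a partial result instead of spilling onto RG 0 or RG 1.
SET preferredReplicas='2';
SELECT ... FROM table WHERE ...

-- Best-effort isolation: prefer RG 2, but spill onto RG 0 or RG 1 during
-- partial outages; the queriesUsingFallbackReplica meter is incremented
-- whenever that happens, so alert on it.
SET preferredReplicas='2';
SET fallbackReplicas='0,1';
SELECT ... FROM table WHERE ...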

Limitations

  • During any kind of segment movement (e.g. a rebalance, or segment relocation such as realtime-to-offline movement or movement across tiers), workload isolation guarantees are lost. This is a consequence of how segment movement works internally in Apache Pinot today.
  • Only homogeneous replica groups are supported today. In other words, each replica group must contain the same number of instances.
  • When preferredReplicas lists more than one replica group, request-ID-based rotation is the only load-balancing strategy — adaptive server selection is not consulted within the preferred set.