This feature requires StarTree release 0.15.0 or later.
Scheduled Server Scaling lets you automatically scale Apache Pinot server replica groups up or down on a fixed schedule. This is useful when your query load is predictable — for example, running fewer server replica groups overnight or on weekends to save cost, and restoring full capacity before peak hours.
Scaling is driven by cron schedules defined per Pinot tenant. At each scheduled time, the operator works out which servers to remove (or restore) to reach your target replica-group count. For a scale-down, it asks the Pinot controller which servers to remove, drains queries off them, and scales their StatefulSets to zero; for a scale-up, it restores the StatefulSets of servers that an earlier scale-down removed.
Scaling operates at the granularity of replica groups, not individual servers. You specify a target number of replica groups, and the exact set of servers to remove or restore is computed for you (by the Pinot controller, in the case of a scale-down). See Replica Group based Workload Isolation for background on replica groups and pools.
Scaling down reduces redundancy for the tenant until the matching scale-up restores it. A schedule that targets targetReplicaGroups: 1 leaves the tenant with no replica group failover for the duration of the scale-down window — if that single replica group has an issue, queries to the tenant’s tables will fail. Choose a target that keeps the minimum redundancy your workload needs.
Prerequisites
- Your cluster is on StarTree release 0.15.0 or later.
- Your cluster is managed by the StarTree Kubernetes operator via the PinotCluster custom resource. This feature is configured at the operator level, not from the Data Portal UI.
- The tenant’s tables already use pool-based, replica-group-aware instance assignment — see Controller requirements for scale-down for the exact conditions the controller checks before it will scale down a tenant.
How it works
- You enable scheduled scaling in the PinotCluster spec under the server component and define one or more schedules per tenant.
- The operator creates and maintains a ScheduledServerScaling custom resource for each tenant you configure.
- At each schedule’s cron time:
  - Scale down: the operator calls the Pinot controller to determine which servers can be removed to reach the target replica-group count, drains in-flight queries off them (bounded by queryDrainTimeout), then scales those server StatefulSets to zero.
  - Scale up: the operator determines which previously removed servers must come back to reach the target, and restores their StatefulSets.
- If the operator was down when a schedule fired, it still executes the missed run as long as it restarts within the missedExecutionWindow. After that window passes, the missed run is skipped and the schedule waits for its next occurrence.
While a scale operation is in progress, the operator suppresses normal replica reconciliation for the affected servers. As a result, a cluster reconcile that runs during the operation, including one you trigger manually, does not undo the scheduled change.
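The timing rules in the list above can be sketched in Python. This is an illustration under stated assumptions, not the operator's code:

- `parse_duration`, `should_run_missed`, and `drain` are hypothetical helpers.
- The `inflight_queries` callable stands in for however the operator observes in-flight queries, which this page does not describe.
- The boundary and timeout behavior noted in the comments are assumptions.

The 15m default for missedExecutionWindow comes from the configuration reference.

```python
import re
import time
from datetime import datetime, timedelta, timezone

# Hypothetical helper: durations such as "30s", "10m", "1h".
# Only the s/m/h units used on this page are handled (assumption).
_DURATION = re.compile(r"(\d+)([smh])")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> timedelta:
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(seconds=int(value) * _UNIT_SECONDS[unit])


def should_run_missed(scheduled_at: datetime, now: datetime,
                      window: str = "15m") -> bool:
    """True if a run that fired at `scheduled_at` should still execute at `now`.

    Treating the end of missedExecutionWindow as inclusive is an assumption.
    """
    return scheduled_at <= now <= scheduled_at + parse_duration(window)


def drain(inflight_queries, timeout: str, poll_seconds: float = 1.0) -> bool:
    """Poll `inflight_queries()` until it reports 0 or queryDrainTimeout elapses.

    Returns True on a clean drain and False on timeout; in this sketch the
    caller scales the StatefulSet to zero in both cases (assumption).
    """
    deadline = time.monotonic() + parse_duration(timeout).total_seconds()
    while inflight_queries() > 0:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


fired = datetime(2024, 1, 15, 22, 0, tzinfo=timezone.utc)
print(should_run_missed(fired, fired + timedelta(minutes=20), "30m"))  # True
print(should_run_missed(fired, fired + timedelta(minutes=45), "30m"))  # False
```

Bounding the drain with a deadline means a slow query can delay a scale-down by at most queryDrainTimeout rather than blocking it indefinitely.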
Enabling it
Add a scheduledScaling block to the server component in your PinotCluster resource:
apiVersion: startreedata.io/v2alpha1
kind: PinotCluster
metadata:
  name: my-pinot
  namespace: pinot
spec:
  components:
    server:
      scheduledScaling:
        enabled: true
        tenants:
          - tenant: DefaultTenant
            schedules:
              - name: nightly-scale-down
                action: SCHEDULED_SCALE_DOWN
                cron: "0 22 * * *"   # Every day at 22:00 UTC
                targetReplicaGroups: 1
                queryDrainTimeout: "10m"
                missedExecutionWindow: "30m"
              - name: morning-scale-up
                action: SCHEDULED_SCALE_UP
                cron: "0 6 * * *"    # Every day at 06:00 UTC
                targetReplicaGroups: 3
                queryDrainTimeout: "10m"
                missedExecutionWindow: "30m"
To disable scheduled scaling, set enabled: false (or remove the scheduledScaling block). When a tenant is removed from the spec, the operator deletes its ScheduledServerScaling resource and restores any servers that were left scaled down.
Configuration reference
scheduledScaling (under spec.components.server):
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Master switch for scheduled scaling on this cluster. |
| tenants | array | One entry per Pinot tenant you want to schedule. Each has tenant + schedules. |
Each entry in tenants:
| Field | Type | Description |
|---|---|---|
| tenant | string | Pinot server tenant name (e.g. DefaultTenant). |
| schedules | array | One or more scheduled actions for this tenant. |
Each entry in schedules:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | A label for the schedule, used in logs and status. |
| action | enum | yes | SCHEDULED_SCALE_DOWN or SCHEDULED_SCALE_UP. |
| cron | string | yes | 5-field Unix cron expression (minute hour day-of-month month day-of-week), evaluated in UTC. See examples below. |
| targetReplicaGroups | integer | yes | Desired number of server replica groups for the tenant after this action completes. Must be greater than 0. |
| queryDrainTimeout | string | no | Max time to wait for in-flight queries to drain off a server before scaling it down (e.g. 10m, 30s). |
| missedExecutionWindow | string | no | Grace period after the scheduled time during which a missed run still executes if the operator was down (e.g. 30m, 1h). Defaults to 15m. |
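A pre-flight check for a schedules entry can encode the rules in the table above. This is a hypothetical linter written for illustration, not the operator's admission logic. The cron check only counts fields, and limiting durations to s/m/h units is an assumption.

```python
import re

VALID_ACTIONS = {"SCHEDULED_SCALE_DOWN", "SCHEDULED_SCALE_UP"}
REQUIRED_FIELDS = ("name", "action", "cron", "targetReplicaGroups")
DEFAULT_MISSED_WINDOW = "15m"  # documented default
DURATION = re.compile(r"\d+[smh]")  # assumption: s/m/h units only


def validate_schedule(entry: dict) -> list[str]:
    """Return the problems found in one `schedules` entry (empty list if none)."""
    problems = [f"missing required field: {field}"
                for field in REQUIRED_FIELDS if field not in entry]
    action = entry.get("action")
    if action is not None and action not in VALID_ACTIONS:
        problems.append(f"unknown action: {action}")
    cron = entry.get("cron")
    if cron is not None and len(cron.split()) != 5:
        problems.append("cron must have exactly 5 fields")
    target = entry.get("targetReplicaGroups")
    if target is not None and (isinstance(target, bool)
                               or not isinstance(target, int) or target <= 0):
        problems.append("targetReplicaGroups must be an integer greater than 0")
    for field in ("queryDrainTimeout", "missedExecutionWindow"):
        value = entry.get(field)
        if value is not None and not DURATION.fullmatch(str(value)):
            problems.append(f"{field} is not a duration such as 10m or 30s")
    return problems


def with_defaults(entry: dict) -> dict:
    """Copy of `entry` with missedExecutionWindow defaulted to 15m."""
    return {"missedExecutionWindow": DEFAULT_MISSED_WINDOW, **entry}


nightly = {"name": "nightly-scale-down", "action": "SCHEDULED_SCALE_DOWN",
           "cron": "0 22 * * *", "targetReplicaGroups": 1,
           "queryDrainTimeout": "10m"}
print(validate_schedule(nightly))  # []
print(with_defaults(nightly)["missedExecutionWindow"])  # 15m
```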
Schedules use standard 5-field Unix cron (minute hour day-of-month month day-of-week). All times are evaluated in UTC — convert your local schedule to UTC before setting cron.
| Expression | Meaning (UTC) |
|---|---|
| 0 22 * * * | Every day at 22:00 UTC |
| 0 6 * * 1-5 | Weekdays at 06:00 UTC |
| 0 0 * * 0 | Every Sunday at midnight UTC |
| 30 18 * * 5 | Every Friday at 18:30 UTC |
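To sanity-check an expression before committing it, the sketch below evaluates a 5-field expression against a given instant in UTC. It is a simplified matcher written for illustration, not the operator's parser. It supports the following:

- `*`, single values, ranges, comma lists, and `*/n` steps.
- Day-of-week 0 to 6 with Sunday as 0. The alias 7 is not accepted.
- The classic Unix rule that when both day-of-month and day-of-week are restricted, either one matching is enough. Whether the operator handles that edge case the same way is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Allowed (low, high) values for minute, hour, day-of-month, month, day-of-week.
BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field: str, low: int, high: int) -> set[int]:
    """Expand one field such as "*", "5", "1-5", "0,30" or "*/15" into values."""
    values = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        if start < low or end > high:
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `expr` fires at the (timezone-aware) instant `when`, read in UTC."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day-of-month month day-of-week")
    minute, hour, dom, month, dow = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, BOUNDS))
    t = when.astimezone(timezone.utc)
    cron_dow = (t.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    if fields[2] != "*" and fields[4] != "*":
        day_ok = t.day in dom or cron_dow in dow  # classic cron: either day field
    else:
        day_ok = t.day in dom and cron_dow in dow
    return t.minute in minute and t.hour in hour and t.month in month and day_ok


print(cron_matches("0 6 * * 1-5", datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc)))  # True (Monday)
print(cron_matches("0 6 * * 1-5", datetime(2024, 1, 20, 6, 0, tzinfo=timezone.utc)))  # False (Saturday)

# 22:00 at UTC-8 is 06:00 UTC the next day, so a 22:00 local
# scale-down at that offset is written as "0 6 * * *".
local = datetime(2024, 1, 15, 22, 0, tzinfo=timezone(timedelta(hours=-8)))
print(local.astimezone(timezone.utc))  # 2024-01-16 06:00:00+00:00
```

The conversion uses a fixed offset. Because the schedule itself stays in UTC, a daylight-saving change shifts the local time at which it fires, and a conversion that crosses midnight also shifts any day-of-week restriction by one day.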
Typical pattern: nightly scale-down, morning scale-up
Pair a SCHEDULED_SCALE_DOWN with a SCHEDULED_SCALE_UP to shrink capacity during off-hours and restore it before peak load:
- Scale down at night to a low targetReplicaGroups (e.g. 1).
- Scale up in the morning back to your full targetReplicaGroups (e.g. 3).
Use the same tenant for both schedules. The scale-up restores exactly the servers that the matching scale-down removed.
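The scale-up side of the pairing can be sketched as follows. It assumes you know which servers each removed pool contained, for example from status.lastScalingOperation.serversAffected after the scale-down (see Checking status).

- `servers_to_restore` is a hypothetical helper.
- The StatefulSet names are made up.
- Restoring the lowest-numbered removed pools first, when the scale-up target is below the original count, is an assumption.

```python
def servers_to_restore(active_pools: list[int],
                       removed_by_pool: dict[int, list[str]],
                       target: int) -> list[str]:
    """Pick previously removed server StatefulSets to reach `target` groups.

    active_pools: pool numbers that are currently serving.
    removed_by_pool: {pool: [StatefulSet names]} recorded at scale-down time.
    """
    missing = target - len(active_pools)
    if missing <= 0:
        return []  # already at or above the target: nothing to restore
    if missing > len(removed_by_pool):
        raise ValueError("target needs more replica groups than were removed")
    # Assumption: bring back the lowest-numbered removed pools first.
    pools = sorted(removed_by_pool)[:missing]
    return [name for pool in pools for name in removed_by_pool[pool]]


removed = {1: ["server-pool1-a", "server-pool1-b"],
           2: ["server-pool2-a", "server-pool2-b"]}
print(servers_to_restore([0], removed, target=3))  # all four StatefulSets
print(servers_to_restore([0], removed, target=2))  # pool 1 only
```

Only servers recorded as removed are candidates, which is what lets the scale-up return the tenant to exactly the topology it had before the matching scale-down.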
Checking status
The operator tracks each tenant’s scaling state in the ScheduledServerScaling resource:
kubectl get scheduledserverscaling -n <namespace>
kubectl get scheduledserverscaling <name> -n <namespace> -o yaml
Key status fields:
| Field | Description |
|---|---|
| status.state | AVAILABLE, DELETING (scale-down in progress), DELETED (scaled down), RESTORING (scale-up in progress), RESTORED. |
| status.lastScalingOperation.action | The most recent action: SCHEDULED_SCALE_DOWN or SCHEDULED_SCALE_UP. |
| status.lastScalingOperation.lastOccurrence | Timestamp of the cron fire time that triggered the operation. |
| status.lastScalingOperation.serversAffected | List of server StatefulSets affected by the operation. |
| status.lastScalingOperation.targetReplicaGroups | The target replica-group count of the last completed operation. |
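To script a status check, you can fetch the resource as JSON with `kubectl get scheduledserverscaling <name> -n <namespace> -o json` and summarize the fields above. The helper names are hypothetical and the sample values are made up. The sketch also assumes that:

- serversAffected is a list of StatefulSet names.
- lastOccurrence is a string timestamp.

```python
import json

IN_PROGRESS_STATES = {"DELETING", "RESTORING"}


def summarize_status(resource: dict) -> str:
    """One-line summary of a ScheduledServerScaling resource parsed from JSON."""
    status = resource.get("status", {})
    state = status.get("state", "UNKNOWN")
    last = status.get("lastScalingOperation") or {}
    if not last:
        return f"{state}: no scaling operation recorded yet"
    servers = ", ".join(last.get("serversAffected", [])) or "none"
    return (f"{state}: last {last.get('action')} at {last.get('lastOccurrence')} "
            f"(targetReplicaGroups={last.get('targetReplicaGroups')}; "
            f"serversAffected: {servers})")


def is_in_progress(resource: dict) -> bool:
    """True while a scale-down (DELETING) or scale-up (RESTORING) is running."""
    return resource.get("status", {}).get("state") in IN_PROGRESS_STATES


# Made-up example of `kubectl ... -o json` output, trimmed to the status block.
sample = json.loads("""
{"status": {"state": "DELETED",
            "lastScalingOperation": {
              "action": "SCHEDULED_SCALE_DOWN",
              "lastOccurrence": "2024-01-15T22:00:00Z",
              "serversAffected": ["server-pool1", "server-pool2"],
              "targetReplicaGroups": 1}}}
""")
print(summarize_status(sample))
```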
Notes and limitations
- targetReplicaGroups must be greater than 0; a schedule cannot remove every replica group.
- If targetReplicaGroups equals the tenant’s current total replica groups, a scale-down is a no-op (nothing to remove).
- Cron times are evaluated in UTC; convert your local time before setting cron.
- Scheduled scaling only affects server StatefulSets for the named tenant. It does not change ZooKeeper, Controller, Broker, or Minion components.
Controller requirements for scale-down
At each scale-down, the operator asks the Pinot controller (GET /serverReplicaGroupScaleDown) which servers can be removed for the tenant. The controller only returns a server list when all of the following hold. If any fails, the scale-down is rejected and no servers are removed — fix the underlying condition and the next scheduled run (or a manual retry) will proceed.
Tenant and topology
- The tenant exists and has servers tagged for it (<tenant>_OFFLINE / <tenant>_REALTIME).
- Servers are exclusive to the tenant — no server tagged for this tenant may also carry tags for another tenant.
- Every tenant server has a pool assignment, and each pool matches at least one of the tenant’s tags.
- If a server has both OFFLINE and REALTIME tags, both must point to the same pool number.
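The tenant and topology conditions above can be checked offline with a sketch like this. It rests on two assumptions:

- It works on a simplified, hypothetical view of Pinot instance configs (each server's tag list and its tag-to-pool map) rather than calling the controller.
- It reads "tags for another tenant" as any other tag ending in _OFFLINE or _REALTIME.

```python
def topology_problems(tenant: str, servers: dict) -> list[str]:
    """Return the 'Tenant and topology' violations for `tenant` (empty if none).

    servers: {instance_name: {"tags": [...], "pools": {tag: pool_number}}}
    """
    tenant_tags = {f"{tenant}_OFFLINE", f"{tenant}_REALTIME"}
    members = {name: cfg for name, cfg in servers.items()
               if tenant_tags & set(cfg.get("tags", []))}
    if not members:
        return [f"no servers are tagged for tenant {tenant}"]
    problems = []
    for name, cfg in sorted(members.items()):
        foreign = sorted(tag for tag in cfg.get("tags", [])
                         if tag not in tenant_tags
                         and tag.endswith(("_OFFLINE", "_REALTIME")))
        if foreign:
            problems.append(f"{name}: also tagged for another tenant: {foreign}")
        pools = cfg.get("pools", {})
        if not pools:
            problems.append(f"{name}: no pool assignment")
        elif not set(pools) <= tenant_tags:
            problems.append(f"{name}: pool keyed by a tag outside the tenant")
        if len({pools[tag] for tag in tenant_tags if tag in pools}) > 1:
            problems.append(f"{name}: OFFLINE and REALTIME pools differ")
    return problems


ok = {
    "Server_a": {"tags": ["DefaultTenant_OFFLINE", "DefaultTenant_REALTIME"],
                 "pools": {"DefaultTenant_OFFLINE": 0, "DefaultTenant_REALTIME": 0}},
    "Server_b": {"tags": ["DefaultTenant_OFFLINE"],
                 "pools": {"DefaultTenant_OFFLINE": 1}},
}
print(topology_problems("DefaultTenant", ok))  # []
```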
Target value
- targetReplicaGroups must be ≥ 1 and ≤ the current number of replica groups (pools).
- If targetReplicaGroups equals the current replica-group count, the call succeeds but returns no servers (nothing to remove).
Table configuration (when the tenant has tables)
- No table may use instancePartitionsMap (which bypasses pool-based assignment).
- All non-dimension tables must use pool-based, replica-group-aware instance assignment.
- Each table’s configured numReplicaGroups (when non-zero) must equal the current pool count.
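A rough offline version of the table checks is sketched below. It reads standard Pinot table config fields:

- instancePartitionsMap
- isDimTable
- instanceAssignmentConfigMap, with tagPoolConfig.poolBased, replicaGroupPartitionConfig.replicaGroupBased, and numReplicaGroups

How the controller actually interprets these fields is an assumption. So is the choice to apply the numReplicaGroups check to every assignment entry.

```python
def table_problems(table: dict, pool_count: int) -> list[str]:
    """Return the 'Table configuration' violations for one table config."""
    name = table.get("tableName", "<unnamed>")
    is_dim = bool(table.get("isDimTable", False))
    problems = []
    if table.get("instancePartitionsMap"):
        problems.append(f"{name}: uses instancePartitionsMap")
    assignment = table.get("instanceAssignmentConfigMap") or {}
    if not assignment and not is_dim:
        problems.append(f"{name}: no instance assignment config")
    for kind, cfg in assignment.items():
        pool_based = cfg.get("tagPoolConfig", {}).get("poolBased", False)
        rg_config = cfg.get("replicaGroupPartitionConfig", {})
        rg_based = rg_config.get("replicaGroupBased", False)
        # Dimension tables are exempt from the pool-based requirement.
        if not is_dim and not (pool_based and rg_based):
            problems.append(f"{name}/{kind}: not pool-based replica-group assignment")
        num = rg_config.get("numReplicaGroups", 0)
        if num and num != pool_count:
            problems.append(f"{name}/{kind}: numReplicaGroups={num}, pools={pool_count}")
    return problems


events = {
    "tableName": "events_OFFLINE",
    "instanceAssignmentConfigMap": {
        "OFFLINE": {
            "tagPoolConfig": {"tag": "DefaultTenant_OFFLINE", "poolBased": True},
            "replicaGroupPartitionConfig": {"replicaGroupBased": True,
                                            "numReplicaGroups": 3},
        }
    },
}
print(table_problems(events, pool_count=3))  # []
print(table_problems(events, pool_count=2))
# ['events_OFFLINE/OFFLINE: numReplicaGroups=3, pools=2']
```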
No rebalance in progress
- No table in the tenant may have an active or failed table rebalance job.
- The tenant may not have an active, aborted, cancelled, or unscheduled tenant rebalance job.
The controller evaluates these conditions against a point-in-time snapshot of cluster state. It always selects the highest-numbered pools for removal, so for a given cluster state, a (tenant, targetReplicaGroups) request always returns the same servers.
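The selection rule and the all-or-nothing rejection described in this section can be sketched as below. The function name and inputs are hypothetical, and this is not the controller's actual code. The sketch treats the replica-group count as the number of distinct pools, and sorts names within a pool only to keep the output stable.

```python
def select_servers_to_remove(servers_by_pool: dict[int, list[str]],
                             target: int,
                             problems: list[str]) -> list[str]:
    """Sketch of a deterministic scale-down answer for one tenant.

    servers_by_pool: {pool number: [server instance names]} for the tenant.
    problems: violations found by the precondition checks listed above.
    """
    if problems:
        # Any failed requirement rejects the whole request: remove nothing.
        raise RuntimeError("scale-down rejected: " + "; ".join(problems))
    pools = sorted(servers_by_pool)
    if not 1 <= target <= len(pools):
        raise ValueError(f"targetReplicaGroups must be between 1 and {len(pools)}")
    # Keep the `target` lowest-numbered pools and remove the highest-numbered rest.
    removed_pools = pools[target:]
    return [name for pool in removed_pools
            for name in sorted(servers_by_pool[pool])]


servers = {0: ["Server_a0", "Server_b0"],
           1: ["Server_a1", "Server_b1"],
           2: ["Server_a2", "Server_b2"]}
print(select_servers_to_remove(servers, 1, []))
# ['Server_a1', 'Server_b1', 'Server_a2', 'Server_b2']
print(select_servers_to_remove(servers, 3, []))  # [] (no-op)
```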
Troubleshooting
A scale-down is rejected, with no servers removed, whenever one of the controller requirements above isn’t met. Once you fix the underlying condition, the next scheduled run (or a manual retry) proceeds. The most common causes are:
| Symptom | Likely cause | What to do |
|---|---|---|
| Scale-down never removes any servers | A server tagged for the tenant is also tagged for another tenant | Make servers exclusive to one tenant, or scope the schedule to a tenant that already has dedicated servers. |
| Scale-down never removes any servers | A server is missing a pool assignment, or OFFLINE/REALTIME tags point to different pools | Fix the server’s pool assignment so it matches the tenant’s tags, and align OFFLINE/REALTIME pool numbers. |
| Scale-down never removes any servers | targetReplicaGroups is greater than the tenant’s current replica-group (pool) count | Lower targetReplicaGroups to at most the current pool count. |
| Scale-down never removes any servers | A table uses instancePartitionsMap | Migrate the table to pool-based instance assignment before scheduling scale-down for its tenant. |
| Scale-down never removes any servers | A table’s numReplicaGroups doesn’t match the current pool count | Update the table’s replica-group config to match the tenant’s current pool count. |
| Scale-down never removes any servers | A table or tenant rebalance is active, aborted, cancelled, or unscheduled | Let the rebalance finish (or clear the stuck job), then wait for the next scheduled run or retry manually. |
| Scale-up doesn’t fully restore capacity | The matching scale-down removed a different set of servers than expected | status.lastScalingOperation only holds the most recent operation, so check serversAffected on the ScheduledServerScaling resource after the scale-down and again after the scale-up to compare which servers were removed vs. restored. |
| A schedule didn’t run at all | The operator was down past missedExecutionWindow at the cron time | Increase missedExecutionWindow if operator restarts routinely take longer than the current setting, or trigger the action manually. |