# Scheduled Server Scaling

<Warning>
  This feature requires StarTree release 0.15.0 or later.
</Warning>

Scheduled Server Scaling lets you automatically scale Apache Pinot server replica groups up or down on a fixed schedule. This is useful when your query load is predictable — for example, running fewer server replica groups overnight or on weekends to save cost, and restoring full capacity before peak hours.

Scaling is driven by **cron schedules** defined per Pinot **tenant**. At each scheduled time, the operator asks the Pinot controller which servers to remove (or restore) to reach your target replica-group count, drains queries off the affected servers, and scales their StatefulSets accordingly.

<Note>
  Scaling operates at the granularity of **replica groups**, not individual servers. You specify a target number of replica groups; the Pinot controller computes the exact set of servers to remove or restore to achieve that target. See [Replica Group based Workload Isolation](/corecapabilities/query_data/advanced_operations/replica-group-based-workload-isolation) for background on replica groups and pools.
</Note>

<Warning>
  Scaling down reduces redundancy for the tenant until the matching scale-up restores it. A schedule that targets `targetReplicaGroups: 1` leaves the tenant with **no replica group failover** for the duration of the scale-down window — if that single replica group has an issue, queries to the tenant's tables will fail. Choose a target that keeps the minimum redundancy your workload needs.
</Warning>

## Prerequisites

* Your cluster is on **StarTree release 0.15.0 or later**.
* Your cluster is managed by the StarTree Kubernetes operator via the `PinotCluster` custom resource (this feature is configured at the operator level, not from the Data Portal UI).
* The tenant's tables already use **pool-based, replica-group-aware** instance assignment — see [Controller requirements for scale-down](#controller-requirements-for-scale-down) for the exact conditions the controller checks before it will scale down a tenant.

## How it works

```mermaid
flowchart TD
    Cron["Cron time fires\nfor a schedule"] --> Action{"action"}
    Action -->|SCHEDULED_SCALE_DOWN| AskDown["Ask Pinot controller which\nservers to remove\n(GET /serverReplicaGroupScaleDown)"]
    AskDown --> Drain["Drain in-flight queries\n(bounded by queryDrainTimeout)"]
    Drain --> ScaleDownSS["Scale affected server\nStatefulSets to zero"]
    Action -->|SCHEDULED_SCALE_UP| AskUp["Determine which previously-removed\nservers must come back"]
    AskUp --> ScaleUpSS["Restore those server\nStatefulSets"]
    Missed["Operator was down\nat cron time"] -.->|restarts within\nmissedExecutionWindow| Action
    Missed -.->|restarts after\nmissedExecutionWindow| Skip["Run is skipped"]
```

1. You enable scheduled scaling in the `PinotCluster` spec under the `server` component and define one or more schedules per tenant.
2. The operator creates and maintains a `ScheduledServerScaling` custom resource for each tenant you configure.
3. At each schedule's cron time:
   * **Scale down** — the operator calls the Pinot controller to determine which servers can be removed to reach the target replica-group count, drains in-flight queries off them (bounded by `queryDrainTimeout`), then scales those server StatefulSets to zero.
   * **Scale up** — the operator determines which previously-removed servers must come back to reach the target, and restores their StatefulSets.
4. If the operator was down when a schedule fired, it still executes the missed run as long as it restarts within the `missedExecutionWindow`. Once that window has passed, the missed run is skipped and the schedule runs again at its next cron occurrence.

While a scale operation is in progress, the operator suppresses normal replica reconciliation for the affected servers, so a manually triggered cluster reconcile does not conflict with the schedule.

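The missed-run rule in step 4 can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the operator's code: `parse_duration` and `missed_run_decision` are hypothetical helpers, the parser handles only the simple `30s` / `10m` / `1h` forms shown on this page, and treating a restart exactly at the window boundary as "within" is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper: parses the simple "<number><unit>" durations used in this
# page's examples (30s, 10m, 1h). The operator's real parser may accept more forms.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{_UNITS[unit]: value})


def missed_run_decision(scheduled_at: datetime, operator_back_at: datetime,
                        missed_execution_window: str = "15m") -> str:
    """Return "run" or "skip" for a schedule that fired while the operator was down."""
    delay = operator_back_at - scheduled_at
    # Assumption: a restart exactly at the window boundary still counts as "within".
    return "run" if delay <= parse_duration(missed_execution_window) else "skip"


fired = datetime(2024, 1, 15, 22, 0, tzinfo=timezone.utc)
print(missed_run_decision(fired, fired + timedelta(minutes=20), "30m"))  # run
print(missed_run_decision(fired, fired + timedelta(minutes=45), "30m"))  # skip
```

With the default window of `15m`, the first case above would be skipped as well, which is why clusters with slow operator restarts may want a longer window.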
## Enabling it

Add a `scheduledScaling` block to the **server** component in your `PinotCluster` resource:

```yaml
apiVersion: startreedata.io/v2alpha1
kind: PinotCluster
metadata:
  name: my-pinot
  namespace: pinot
spec:
  components:
    server:
      scheduledScaling:
        enabled: true
        tenants:
          - tenant: DefaultTenant
            schedules:
              - name: nightly-scale-down
                action: SCHEDULED_SCALE_DOWN
                cron: "0 22 * * *"          # Every day at 22:00 UTC
                targetReplicaGroups: 1
                queryDrainTimeout: "10m"
                missedExecutionWindow: "30m"
              - name: morning-scale-up
                action: SCHEDULED_SCALE_UP
                cron: "0 6 * * *"           # Every day at 06:00 UTC
                targetReplicaGroups: 3
                queryDrainTimeout: "10m"
                missedExecutionWindow: "30m"
```

To disable scheduled scaling, set `enabled: false` (or remove the `scheduledScaling` block). When a tenant is removed from the spec, the operator deletes its `ScheduledServerScaling` resource and restores any servers that were left scaled down.

## Configuration reference

`scheduledScaling` (under `spec.components.server`):

| Field     | Type    | Description                                                                       |
| :-------- | :------ | :-------------------------------------------------------------------------------- |
| `enabled` | boolean | Master switch for scheduled scaling on this cluster.                              |
| `tenants` | array   | One entry per Pinot tenant you want to schedule. Each has `tenant` + `schedules`. |

Each entry in `tenants`:

| Field       | Type   | Description                                      |
| :---------- | :----- | :----------------------------------------------- |
| `tenant`    | string | Pinot server tenant name (e.g. `DefaultTenant`). |
| `schedules` | array  | One or more scheduled actions for this tenant.   |

Each entry in `schedules`:

| Field                   | Type    | Required | Description                                                                                                                                    |
| :---------------------- | :------ | :------- | :--------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`                  | string  | yes      | A label for the schedule, used in logs and status.                                                                                             |
| `action`                | enum    | yes      | `SCHEDULED_SCALE_DOWN` or `SCHEDULED_SCALE_UP`.                                                                                                |
| `cron`                  | string  | yes      | 5-field Unix cron expression (`minute hour day-of-month month day-of-week`), evaluated in **UTC**. See examples below.                         |
| `targetReplicaGroups`   | integer | yes      | Desired number of server replica groups for the tenant **after** this action completes. Must be greater than 0.                                |
| `queryDrainTimeout`     | string  | no       | Max time to wait for in-flight queries to drain off a server before scaling it down (e.g. `10m`, `30s`).                                       |
| `missedExecutionWindow` | string  | no       | Grace period after the scheduled time during which a missed run still executes if the operator was down (e.g. `30m`, `1h`). Defaults to `15m`. |

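Before applying a spec, it can help to check each `schedules` entry against the rules in the table above. The following is a minimal sketch, assuming the entry has already been loaded into a Python dict (for example from your YAML). `validate_schedule` is a hypothetical helper, it checks only what this page documents, and it does not validate the cron fields beyond counting them.

```python
VALID_ACTIONS = {"SCHEDULED_SCALE_DOWN", "SCHEDULED_SCALE_UP"}
REQUIRED_FIELDS = ("name", "action", "cron", "targetReplicaGroups")


def validate_schedule(schedule: dict) -> list[str]:
    """Return the problems found in one `schedules` entry (empty list if none)."""
    problems = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in schedule]
    if problems:
        return problems
    if schedule["action"] not in VALID_ACTIONS:
        problems.append(f"unknown action: {schedule['action']}")
    # Only the field count is checked here, not the validity of each cron field.
    if len(str(schedule["cron"]).split()) != 5:
        problems.append("cron must have exactly 5 fields")
    target = schedule["targetReplicaGroups"]
    if not isinstance(target, int) or isinstance(target, bool) or target <= 0:
        problems.append("targetReplicaGroups must be an integer greater than 0")
    return problems


ok = {"name": "nightly-scale-down", "action": "SCHEDULED_SCALE_DOWN",
      "cron": "0 22 * * *", "targetReplicaGroups": 1}
bad = dict(ok, cron="0 22 * *", targetReplicaGroups=0)
print(validate_schedule(ok))   # []
print(validate_schedule(bad))  # two problems: cron field count and target value
```

The optional `queryDrainTimeout` and `missedExecutionWindow` fields are deliberately left unchecked, since this page does not define their full accepted syntax.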
### Cron format

Schedules use standard **5-field Unix cron** (`minute hour day-of-month month day-of-week`). All times are evaluated in **UTC** — convert your local schedule to UTC before setting `cron`.

| Expression    | Meaning (UTC)                |
| :------------ | :--------------------------- |
| `0 22 * * *`  | Every day at 22:00 UTC       |
| `0 6 * * 1-5` | Weekdays at 06:00 UTC        |
| `0 0 * * 0`   | Every Sunday at midnight UTC |
| `30 18 * * 5` | Every Friday at 18:30 UTC    |

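As a rough aid for the UTC conversion, the sketch below turns a daily local time into a 5-field UTC cron string. `daily_local_to_utc_cron` is a hypothetical helper, and it assumes a fixed UTC offset that you supply yourself. It covers daily schedules only: when the conversion crosses midnight, any day-of-week or day-of-month field would also need shifting by hand.

```python
from datetime import datetime, timedelta, timezone


def daily_local_to_utc_cron(hour: int, minute: int, utc_offset_hours: float) -> str:
    """Convert a daily local time at a fixed UTC offset into a UTC cron expression."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # Any date works for a fixed offset; only the time of day matters here.
    local = datetime(2024, 1, 15, hour, minute, tzinfo=local_tz)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(daily_local_to_utc_cron(22, 0, -8))    # 0 6 * * *
print(daily_local_to_utc_cron(6, 30, 5.5))   # 0 1 * * *
```

Regions that observe daylight saving time change their UTC offset twice a year, so a schedule meant to track local wall-clock time needs its `cron` updated when the offset changes.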
## Typical pattern: nightly scale-down, morning scale-up

Pair a `SCHEDULED_SCALE_DOWN` with a `SCHEDULED_SCALE_UP` to shrink capacity during off-hours and restore it before peak load:

* **Scale down** at night to a low `targetReplicaGroups` (e.g. `1`).
* **Scale up** in the morning back to your full `targetReplicaGroups` (e.g. `3`).

Use the same tenant for both schedules. The scale-up restores exactly the servers that the matching scale-down removed.

## Checking status

The operator tracks each tenant's scaling state in the `ScheduledServerScaling` resource:

```bash
kubectl get scheduledserverscaling -n <namespace>
kubectl get scheduledserverscaling <name> -n <namespace> -o yaml
```

Key status fields:

| Field                                             | Description                                                                                                                |
| :------------------------------------------------ | :------------------------------------------------------------------------------------------------------------------------- |
| `status.state`                                    | `AVAILABLE`, `DELETING` (scale-down in progress), `DELETED` (scaled down), `RESTORING` (scale-up in progress), `RESTORED`. |
| `status.lastScalingOperation.action`              | The most recent action: `SCHEDULED_SCALE_DOWN` or `SCHEDULED_SCALE_UP`.                                                    |
| `status.lastScalingOperation.lastOccurrence`      | Timestamp of the cron fire time that triggered the operation.                                                              |
| `status.lastScalingOperation.serversAffected`     | List of server StatefulSets affected by the operation.                                                                     |
| `status.lastScalingOperation.targetReplicaGroups` | The target replica-group count of the last completed operation.                                                            |

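If you script around these fields, a small summarizer can make the state easier to read. The sketch below assumes you have loaded the resource as a dict (for example from `kubectl get ... -o json`) and that `serversAffected` is a list of names. `describe_status`, the `UNKNOWN` fallback, and the sample values are all illustrative, not output from a real cluster.

```python
# States that this page describes as "in progress".
IN_PROGRESS_STATES = {"DELETING", "RESTORING"}


def describe_status(resource: dict) -> str:
    """Summarize the status fields of a ScheduledServerScaling resource."""
    status = resource.get("status", {})
    state = status.get("state", "UNKNOWN")  # "UNKNOWN" is a placeholder, not a real state
    last = status.get("lastScalingOperation") or {}
    phase = "in progress" if state in IN_PROGRESS_STATES else "settled"
    servers = last.get("serversAffected") or []
    return (f"state={state} ({phase}), last action={last.get('action')}, "
            f"target={last.get('targetReplicaGroups')}, servers affected={len(servers)}")


# Illustrative sample only; server names are made up.
sample = {"status": {"state": "DELETED", "lastScalingOperation": {
    "action": "SCHEDULED_SCALE_DOWN", "targetReplicaGroups": 1,
    "serversAffected": ["server-a", "server-b"]}}}
print(describe_status(sample))
# state=DELETED (settled), last action=SCHEDULED_SCALE_DOWN, target=1, servers affected=2
```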
## Notes and limitations

* `targetReplicaGroups` must be **greater than 0** — a schedule cannot remove every replica group.
* If `targetReplicaGroups` equals the tenant's current total replica groups, a scale-down is a no-op (nothing to remove).
* Cron times are evaluated in **UTC** — convert your local time before setting `cron`.
* Scheduled scaling only affects **server** StatefulSets for the named tenant. It does not change ZooKeeper, Controller, Broker, or Minion components.

## Controller requirements for scale-down

At each scale-down, the operator asks the Pinot controller (`GET /serverReplicaGroupScaleDown`) which servers can be removed for the tenant. The controller only returns a server list when **all** of the following conditions hold. If any condition fails, the scale-down is rejected and no servers are removed. Once you fix the underlying condition, the next scheduled run (or a manual retry) will proceed.

**Tenant and topology**

* The tenant exists and has servers tagged for it (`<tenant>_OFFLINE` / `<tenant>_REALTIME`).
* **Servers are exclusive to the tenant** — no server tagged for this tenant may also carry tags for another tenant.
* Every tenant server has a **pool assignment**, and each pool matches at least one of the tenant's tags.
* If a server has both OFFLINE and REALTIME tags, both must point to the **same pool number**.

**Target value**

* `targetReplicaGroups` must be **≥ 1** and **≤ the current number of replica groups (pools)**.
* If `targetReplicaGroups` equals the current replica-group count, the call succeeds but returns **no servers** (nothing to remove).

**Table configuration (when the tenant has tables)**

* No table may use `instancePartitionsMap` (which bypasses pool-based assignment).
* All non-dimension tables must use **pool-based, replica-group-aware** instance assignment.
* Each table's configured `numReplicaGroups` (when non-zero) must **equal the current pool count**.

**No rebalance in progress**

* No table in the tenant may have an active or failed **table rebalance** job.
* The tenant may not have an active, aborted, cancelled, or unscheduled **tenant rebalance** job.

<Note>
  The controller evaluates these conditions against a point-in-time snapshot of cluster state. The highest-numbered pools are always selected for removal, so a given `(tenant, targetReplicaGroups)` request is deterministic.
</Note>

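The target-value rules and the highest-numbered-pool selection can be modeled as follows. This is a simplified sketch, not the controller's implementation: `pools_to_remove` is a hypothetical function, it works on pool numbers rather than servers, and it skips the tenant, table, and rebalance checks listed above.

```python
def pools_to_remove(current_pools: list[int], target_replica_groups: int) -> list[int]:
    """Return the pool numbers a scale-down to the target would remove."""
    if target_replica_groups < 1:
        raise ValueError("targetReplicaGroups must be >= 1")
    if target_replica_groups > len(current_pools):
        raise ValueError("targetReplicaGroups exceeds the current replica-group count")
    # Keep the lowest-numbered pools; the highest-numbered ones are removed.
    return sorted(current_pools)[target_replica_groups:]


print(pools_to_remove([0, 1, 2], 1))  # [1, 2]
print(pools_to_remove([0, 1, 2], 3))  # []
```

Because the selection depends only on the current pools and the target, repeating the same request against the same cluster state yields the same answer.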
## Troubleshooting

A scale-down is rejected, with no servers removed, whenever one of the [controller requirements](#controller-requirements-for-scale-down) above isn't met. The table below lists common symptoms, their likely causes, and what to do about each.

| Symptom                                 | Likely cause                                                                             | What to do                                                                                                                                                           |
| :-------------------------------------- | :--------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Scale-down never removes any servers    | A server tagged for the tenant is also tagged for another tenant                         | Make servers exclusive to one tenant, or scope the schedule to a tenant that already has dedicated servers.                                                          |
| Scale-down never removes any servers    | A server is missing a pool assignment, or OFFLINE/REALTIME tags point to different pools | Fix the server's pool assignment so it matches the tenant's tags, and align OFFLINE/REALTIME pool numbers.                                                           |
| Scale-down never removes any servers    | `targetReplicaGroups` is greater than the tenant's current replica-group (pool) count    | Lower `targetReplicaGroups` to at most the current pool count.                                                                                                       |
| Scale-down never removes any servers    | A table uses `instancePartitionsMap`                                                     | Migrate the table to pool-based instance assignment before scheduling scale-down for its tenant.                                                                     |
| Scale-down never removes any servers    | A table's `numReplicaGroups` doesn't match the current pool count                        | Update the table's replica-group config to match the tenant's current pool count.                                                                                    |
| Scale-down never removes any servers    | A table rebalance is active or failed, or a tenant rebalance is active, aborted, cancelled, or unscheduled | Let the rebalance finish (or clear the stuck job), then wait for the next scheduled run or retry manually.                                                           |
| Scale-up doesn't fully restore capacity | The matching scale-down removed a different set of servers than expected                 | Check `status.lastScalingOperation.serversAffected` on the `ScheduledServerScaling` resource for both operations to confirm which servers were removed vs. restored. |
| A schedule didn't run at all            | The operator was down at the cron time and did not restart within `missedExecutionWindow` | Increase `missedExecutionWindow` if operator restarts routinely take longer than the current setting, or trigger the action manually.                                |

## Related

* [Replica Group based Workload Isolation](/corecapabilities/query_data/advanced_operations/replica-group-based-workload-isolation) — background on replica groups and pool-based instance assignment.
* [Cluster Health Dashboard](/corecapabilities/cluster-operations/use-cluster-health-dashboard) — check overall cluster health, including replication and instance-pool checks relevant to scale-down eligibility.
