Overview
Some Minion tasks — such as large ingestion jobs, purges, or exports — process too much data to finish in a single run. They’re broken up into multiple batches, and today each batch normally requires a separate trigger: either you call the task API again, or you wait for the next cron run.
Minion Task Orchestration removes that manual step. When enabled for a task, StarTree Cloud tracks the multi-batch job as a task plan: a single trigger creates the plan, and the controller automatically generates and submits each subsequent batch as soon as the previous one finishes, until there’s no more work left to do.
A task plan is a long-running orchestration for one combination of table and task type. Only one active plan is allowed per table and task type at a time. The plan tracks its overall status, its batches, and its progress until it reaches a terminal state (COMPLETED, CANCELLED, or FAILED).
Task orchestration is supported for both the ad hoc Execute API and scheduled (cron) task triggers.
How is this different from a normal task run?
| Aspect | Normal task run | Orchestrated task run |
|---|---|---|
| Trigger | Each run is independent (ad hoc API call or cron). | One trigger creates a plan; later batches are driven automatically by batch completion. |
| Scope | One invocation generates one batch of subtasks. | One plan spans multiple batches over time. |
| Next batch | You must trigger again, or wait for the next cron run. | The controller automatically submits the next batch when the current one finishes. |
| Progress tracking | Only the Helix task state is available. | A task plan tracks status, batch history, and progress, queryable via API. |
| Concurrency | Multiple runs for the same table/task can overlap. | At most one active plan per table and task type; new triggers are rejected or skipped while a plan is active. |
In short: a normal run produces one batch per trigger, while an orchestrated run produces a chain of batches from a single trigger, running until the job is done.
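The chaining behavior can be pictured with a small toy model. This is only an illustrative sketch, not the controller's implementation: `ToyPlan`, `run_plan`, and the fixed per-batch limit are hypothetical, and every batch is assumed to complete successfully.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPlan:
    """Hypothetical in-memory stand-in for a task plan (not the real data model)."""
    pending_inputs: int                      # input units not yet assigned to a batch
    status: str = "ACTIVE"
    batch_sizes: List[int] = field(default_factory=list)


def run_plan(total_inputs: int, batch_limit: int) -> ToyPlan:
    """Simulate one trigger that keeps submitting batches until no work is left."""
    if batch_limit <= 0:
        raise ValueError("batch_limit must be positive")
    plan = ToyPlan(pending_inputs=total_inputs)
    while plan.pending_inputs > 0:
        size = min(batch_limit, plan.pending_inputs)
        plan.batch_sizes.append(size)        # "submit" a batch; assume it finishes cleanly
        plan.pending_inputs -= size          # the next batch is generated automatically
    plan.status = "COMPLETED"
    return plan


plan = run_plan(total_inputs=25, batch_limit=10)
print(plan.status, plan.batch_sizes)         # COMPLETED [10, 10, 5]
```

In a normal run, each entry in `batch_sizes` would need its own trigger; in the orchestrated model, a single call produces the whole chain.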
Enabling task orchestration
Task orchestration requires the feature to be enabled at the cluster level, and then opted into per task.
Cluster-level control
Task orchestration is enabled by default at the cluster level. It can be turned off entirely with the controller configuration property below, which acts as a cluster-wide kill switch — when disabled, no task plans are created or progressed, regardless of any per-task setting.
| Property | Default | Description |
|---|---|---|
| controller.startree.task.manager.enableTaskOrchestration | true | Enables the orchestration infrastructure cluster-wide. |
Enabling this property does not by itself orchestrate any task — each task must still opt in individually, as described below.
Enabling for an ad hoc trigger
Add enableTaskOrchestration to the task configuration when calling the Execute API:
{
  "taskType": "SegmentPurgeTask",
  "tableName": "myTable_OFFLINE",
  "taskConfigs": {
    "enableTaskOrchestration": "true"
  }
}
- Only task types that support orchestration will use this path; other task types ignore the flag and run as before.
- If a plan is already active for that table and task type, the ad hoc request is rejected until the existing plan completes or is aborted.
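If you build Execute API requests programmatically, the request body can be assembled as in the sketch below. The helper name `build_adhoc_request` is hypothetical; only the field names are taken from the example above, and sending the request (endpoint path, authentication) is deliberately left out.

```python
import json
from typing import Dict, Optional


def build_adhoc_request(task_type: str, table_name: str,
                        extra_configs: Optional[Dict[str, str]] = None) -> str:
    """Return a JSON request body that opts the ad hoc run into orchestration."""
    task_configs = dict(extra_configs or {})          # copy so the caller's dict is untouched
    task_configs["enableTaskOrchestration"] = "true"  # config values are strings, as in the sample
    body = {
        "taskType": task_type,
        "tableName": table_name,
        "taskConfigs": task_configs,
    }
    return json.dumps(body, indent=2)


print(build_adhoc_request("SegmentPurgeTask", "myTable_OFFLINE"))
```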
Enabling for a scheduled (cron) trigger
Add enableTaskOrchestration alongside the schedule key in the table configuration:
{
  "task": {
    "taskTypeConfigsMap": {
      "SegmentPurgeTask": {
        "schedule": "0 0 2 * * ?",
        "enableTaskOrchestration": "true"
      }
    }
  }
}
When a cron trigger fires and orchestration is enabled, the controller creates a task plan from the table configuration and begins multi-batch orchestration automatically.
If a scheduled trigger fires while a plan is already active for that table and task type, the trigger is skipped (not rejected) and the existing plan continues unaffected. This is different from the ad hoc path, which rejects the request outright.
If the task’s generator doesn’t support orchestration, the trigger automatically falls back to the normal, one-shot batch generation.
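The difference between the ad hoc and scheduled paths can be summarized as a small decision function. This is a sketch of the rules described above, assuming `enableTaskOrchestration` is already set for the task; the function name and the outcome labels are hypothetical, while the two source values mirror the `source` field of a task plan.

```python
def handle_trigger(source: str, plan_active: bool, generator_supports: bool) -> str:
    """Return a (hypothetical) label for what happens when a trigger fires."""
    if source not in ("ADHOC_CONFIG", "TABLE_CONFIG"):
        raise ValueError(f"unknown trigger source: {source}")
    if not generator_supports:
        return "ONE_SHOT_BATCH"      # falls back to normal, one-shot generation
    if plan_active:
        # Ad hoc requests are rejected; scheduled triggers are skipped.
        return "REJECTED" if source == "ADHOC_CONFIG" else "SKIPPED"
    return "PLAN_CREATED"            # a new task plan starts multi-batch orchestration


print(handle_trigger("TABLE_CONFIG", plan_active=True, generator_supports=True))   # SKIPPED
print(handle_trigger("ADHOC_CONFIG", plan_active=True, generator_supports=True))   # REJECTED
```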
Disabling orchestration for a single task type
To disable orchestration for one problematic task type without touching the cluster-wide switch, set forceLegacyTaskFlow in that task type’s configuration:
{
  "task": {
    "taskTypeConfigsMap": {
      "SegmentPurgeTask": {
        "schedule": "0 0 2 * * ?",
        "forceLegacyTaskFlow": "true"
      }
    }
  }
}
- forceLegacyTaskFlow takes precedence over enableTaskOrchestration: if both are true, the legacy flow is used.
- It only stops new plans from being created. A plan that’s already active for that table and task type is left to finish on its own; it isn’t aborted.
- If a single scheduled trigger covers multiple tables for the same task type, setting forceLegacyTaskFlow on any of them routes the whole trigger cycle to the legacy flow.
Some task types (for example StarTreeAlterTableTask) only support the orchestrated flow and reject ad hoc generation without it. Forcing the legacy flow for such a task type is an explicit choice that may cause task generation to fail — only use it if you understand the tradeoff.
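The precedence between the cluster-wide switch and the two per-task flags can be sketched as follows. The function names are hypothetical, and treating flag values as case-insensitive "true"/"false" strings is an assumption made for this example, not documented behavior.

```python
from typing import Dict, Iterable


def _flag(task_config: Dict[str, str], name: str) -> bool:
    # Assumption: flags are strings such as "true"; a missing key means false.
    return str(task_config.get(name, "false")).strip().lower() == "true"


def uses_orchestration(cluster_enabled: bool, task_config: Dict[str, str],
                       generator_supports: bool) -> bool:
    """Decide whether a new trigger would create a task plan."""
    if not cluster_enabled:                        # cluster-wide kill switch wins
        return False
    if _flag(task_config, "forceLegacyTaskFlow"):  # overrides enableTaskOrchestration
        return False
    return _flag(task_config, "enableTaskOrchestration") and generator_supports


def cycle_forced_to_legacy(per_table_configs: Iterable[Dict[str, str]]) -> bool:
    """One table forcing the legacy flow routes the whole trigger cycle to it."""
    return any(_flag(cfg, "forceLegacyTaskFlow") for cfg in per_table_configs)


both = {"enableTaskOrchestration": "true", "forceLegacyTaskFlow": "true"}
print(uses_orchestration(True, both, generator_supports=True))  # False
```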
Supported task types
Orchestration is available only for task types whose generator implements batch-by-batch generation. Currently supported task types:
| Task type | Description |
|---|---|
| File Ingestion Task | Supports all ingestion modes (sync/append, with or without consistent push) with automatic batch chaining and retries. Also supports consistent push with full-swap mode, completing a swap reliably across retries while avoiding accidental duplicate swaps. |
| Segment Purge Task | Chains batches automatically for large purge jobs that span more segments than a single batch can process. |
| Data Export Task | Exports completed segments from a source real-time table to an external target. Segments beyond the per-batch limit, and any pending commits, automatically flow into subsequent batches until the export queue is fully drained. |
For any other task type, enabling enableTaskOrchestration has no effect — the task always uses the standard, one-shot generation flow.
Monitoring task plans via API
When available, the controller exposes REST endpoints for inspecting and managing task plans. All endpoints require the same authentication and table-level authorization as other task APIs.
| Method | Path | Description |
|---|---|---|
| GET | /tasks/taskPlans/isActive/{taskPlanId} | Check whether a task plan is currently active. |
| GET | /tasks/taskPlans/{planId} | Get the full details of a task plan by ID. |
| GET | /tasks/taskPlans | List all task plan IDs for a table and task type. Requires tableNameWithType and taskType query parameters. |
| GET | /tasks/taskPlans/active | List all active task plans, optionally filtered by tableNameWithType and/or taskType. |
| GET | /tasks/taskPlans/active/ids | List active task plan IDs only, with the same optional filters. |
| DELETE | /tasks/taskPlans/{planId} | Abort a task plan. Sets its status to ABORTING; no new batches are generated, though any already-running batch may still complete. Poll the isActive endpoint to confirm the plan has fully stopped. |
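A client for these endpoints mostly needs two pieces: building the filtered URL and polling after an abort. The sketch below shows both under stated assumptions: the helper names, the base URL, and the timeout defaults are hypothetical, and the actual HTTP call is injected as `is_active` because the response format of the isActive endpoint is not described here.

```python
import time
from typing import Callable, Optional
from urllib.parse import urlencode


def active_plans_url(base_url: str, table_name_with_type: Optional[str] = None,
                     task_type: Optional[str] = None) -> str:
    """Build the URL for listing active plans, adding only the filters provided."""
    params = {}
    if table_name_with_type:
        params["tableNameWithType"] = table_name_with_type
    if task_type:
        params["taskType"] = task_type
    url = base_url.rstrip("/") + "/tasks/taskPlans/active"
    return url + ("?" + urlencode(params) if params else "")


def wait_until_inactive(is_active: Callable[[], bool], timeout_s: float = 300.0,
                        interval_s: float = 5.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll after an abort; return True once the plan is inactive, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not is_active():          # caller wraps the isActive endpoint here
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


print(active_plans_url("http://controller:9000", "myTable_OFFLINE", "SegmentPurgeTask"))
```

Injecting `is_active` and `sleep` keeps the polling loop independent of any particular HTTP library and makes it easy to exercise without a running controller.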
Task plan data model
Each task plan returned by the API contains the following fields:
| Field | Type | Description |
|---|---|---|
| planId | String | Unique ID, formatted as <tableNameWithType>__<taskType>__<uuid>. |
| tableNameWithType | String | The table this plan belongs to (e.g. myTable_OFFLINE). |
| taskType | String | The task type this plan orchestrates (e.g. SegmentPurgeTask). |
| status | Enum | One of ACTIVE, COMPLETED, CANCELLED, ABORTING, FAILED. |
| source | String | How the plan was created: ADHOC_CONFIG or TABLE_CONFIG. |
| properties | Map<String, String> | Plan-level configuration used to generate each batch. |
| batches | List | Ordered list of batch records submitted so far (see below). |
| inputsToProcess | Long | Total number of input units (e.g. segments) to process. |
| inputsBeingProcessed | Long | Number of input units currently in flight. |
| inputUnit | String | The unit the counts above are measured in (e.g. "segments"). |
| statusMessage | String | Human-readable status, such as the reason for an abort or completion. |
| customStats | Map<String, String> | Optional, task-type-specific statistics. |
Each entry in batches includes:
| Field | Type | Description |
|---|---|---|
| batchSequence | Integer | 0-based sequence number of the batch. |
| submittedAtMs | Long | Time (epoch ms) the batch was submitted. |
| submittedTaskName | String | Name of the parent task submitted for this batch. |
| taskState | Enum | Current state of that batch’s parent task (e.g. NOT_STARTED, IN_PROGRESS, COMPLETED). |
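As an illustration of working with these fields, the sketch below splits a planId into its parts and prints a one-line summary of a plan. The helper names and all sample values (including the UUID and the counts) are made up for the example; splitting from the right assumes that the task type and UUID never contain a double underscore.

```python
from typing import NamedTuple


class PlanIdParts(NamedTuple):
    table_name_with_type: str
    task_type: str
    uuid: str


def parse_plan_id(plan_id: str) -> PlanIdParts:
    """Split <tableNameWithType>__<taskType>__<uuid> from the right-hand side."""
    parts = plan_id.rsplit("__", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"unexpected planId format: {plan_id!r}")
    return PlanIdParts(*parts)


def summarize(plan: dict) -> str:
    """Produce a short progress line from a task plan payload."""
    unit = plan.get("inputUnit", "inputs")
    batches = plan.get("batches", [])
    return (f"{plan['status']}: {plan['inputsBeingProcessed']} {unit} in flight "
            f"out of {plan['inputsToProcess']}, {len(batches)} batch(es) submitted")


# Illustrative payload only; real values come from GET /tasks/taskPlans/{planId}.
sample = {
    "planId": "myTable_OFFLINE__SegmentPurgeTask__123e4567-e89b-12d3-a456-426614174000",
    "status": "ACTIVE",
    "inputsToProcess": 120,
    "inputsBeingProcessed": 40,
    "inputUnit": "segments",
    "batches": [{"batchSequence": 0}, {"batchSequence": 1}],
}
print(parse_plan_id(sample["planId"]).task_type)
print(summarize(sample))
```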
Task plan cleanup
Task plans are cleaned up automatically when their table is dropped, so deleted tables don’t leave orphaned plans behind:
- Dropping a table synchronously removes all of its task plans (across every task type) before the table’s metadata is torn down.
- A periodic background sweep also reaps any plans whose table no longer exists, catching cases the synchronous cleanup may have missed. This sweep runs on an interval controlled by:
| Property | Default | Description |
|---|---|---|
| controller.startree.task.manager.orphanedTaskPlanCleanupIntervalInSeconds | 28800 (8 hours) | How often the controller sweeps for and removes plans belonging to deleted tables. |
Observability and metrics
Task orchestration emits controller-side metrics scoped to tableNameWithType and taskType, so you can monitor and alert on orchestration health. Global gauges are emitted per controller. Because only the controller that leads a table progresses that table’s plans, aggregate the global gauges across all controllers when building dashboards.
Meters
| Metric | Meaning |
|---|---|
| taskPlanAborted | A plan was aborted, either due to a failure-threshold breach or a generator-driven abort. |
| orchestrationCycleFailure | An exception occurred while progressing a plan. |
| taskPlanProgressionBlocked | Plan progression was skipped because the task queue is paused, or resource-utilization limits were hit. Stays set while blocked and clears once progression can resume. |
| scheduledTriggerActivePlanConflict | A scheduled trigger was skipped because a plan was already active for that table and task type. |
| taskGenerationFailureCount | Task generation failed for the table/task type. |
Gauges
| Metric | Meaning |
|---|---|
| taskPlanInputsToProcess, taskPlanInputsBeingProcessed, taskPlanInputsProcessed | Plan progress: total, in-flight, and cumulative-completed input units. |
| activeTaskPlansCount | Number of ACTIVE plans this controller is currently progressing. |
| taskPlansAbortingCount | Number of plans stuck in ABORTING, waiting for in-flight subtasks to terminate. |
| orchestrationTimeSinceLastPollMs | Time since the plan-completion polling job last ran. Rising continuously would indicate the polling job has stalled. |
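Two of the points above lend themselves to a small dashboard-side sketch: summing a per-controller gauge, and flagging a poll gauge that only ever rises. The function names, controller names, sample values, and the three-sample minimum are all hypothetical; how samples are collected depends on your metrics system.

```python
from typing import Dict, Sequence


def total_active_plans(per_controller: Dict[str, int]) -> int:
    """Sum activeTaskPlansCount across controllers to get the cluster-wide value."""
    return sum(per_controller.values())


def poll_looks_stalled(samples_ms: Sequence[int], min_samples: int = 3) -> bool:
    """Return True if orchestrationTimeSinceLastPollMs rose at every sample.

    A healthy gauge drops back each time the polling job runs, so a strictly
    increasing series over the whole window suggests the job has stalled.
    """
    if len(samples_ms) < min_samples:
        return False                 # not enough data to judge
    return all(later > earlier for earlier, later in zip(samples_ms, samples_ms[1:]))


print(total_active_plans({"controller-0": 2, "controller-1": 0, "controller-2": 1}))  # 3
print(poll_looks_stalled([1_000, 61_000, 121_000]))   # True
print(poll_looks_stalled([1_000, 61_000, 2_000]))     # False
```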
FAQs
Do I need to change anything for task types that don’t support orchestration?
No. Setting enableTaskOrchestration on an unsupported task type has no effect — it continues to use the standard one-shot generation flow.
What happens if I trigger an ad hoc run while a plan is already active?
The request is rejected with an error. Wait for the active plan to complete, or abort it using the DELETE /tasks/taskPlans/{planId} endpoint, before triggering again.
What happens if a scheduled (cron) trigger fires while a plan is active?
Unlike the ad hoc path, the scheduled trigger is silently skipped rather than rejected, and the scheduledTriggerActivePlanConflict metric is incremented. The existing plan is unaffected and continues to progress.
How do I stop an in-progress plan?
Call DELETE /tasks/taskPlans/{planId}. This moves the plan to ABORTING; any batch already running is allowed to finish, but no new batch is generated. Poll GET /tasks/taskPlans/isActive/{planId} until it reports the plan is no longer active.
How can I tell whether a task type supports orchestration?
Check the supported task types table. If a task type isn’t listed there, enabling enableTaskOrchestration for it has no effect.