Managing Adhoc Minion Tasks Programmatically

Pinot provides two primary APIs to programmatically trigger ad hoc Minion tasks. Choosing the right API and strategy depends on your task type, data source characteristics, and runtime requirements.

Note: Schedule API uses the task config defined in the table config, whereas Execute API relies on the API payload. Other than this, there is no difference between these two APIs.

Schedule API

API Reference: /tasks/schedule Behavior: This API schedules tasks using parameters defined under the table’s task configs. Pinot internally determines eligible segments, generates subtasks accordingly, and executes them immediately. When to Use:

When a scheduled task needs to be run immediately instead of waiting for its next schedule.
If periodic execution is not required or the task should only be externally triggered (e.g., by Airflow), remove the schedule parameter in the task config.
Recommended for tasks predefined in table config like:
- Delta Ingestion Task
- File Ingestion Task
- SQL Ingestion Task
- Segment Backfill Task

Caveats:

Ensure tableMaxNumTasks = -1 so all sub-tasks are created within a single task.
No built-in retry/recovery mechanism. Tasks must be re-triggered manually if failures occur.

Execute API

API Reference: /tasks/execute Behavior: Executes a task using runtime-supplied configuration (API payload), giving full control over parameters like input paths, filters, and processing windows. When to Use:

Tasks that require dynamic parameters (input sources, delta ranges, etc.).
Ideal for external orchestration (e.g., Airflow) of the following:
- Delta Ingestion Task
- File Ingestion Task
- SQL Ingestion Task
- Segment Backfill Task

Caveats:

No built-in retry/recovery mechanism. Tasks must be re-triggered manually if failures occur.
Avoid using this API for scheduled maintenance tasks like Segment Import, Segment Refresh, or Alter Table Tasks (use cron-based schedules instead).

File Ingestion Task

The File Ingestion Task is used to ingest data from external sources (e.g., S3, GCS, ADLS). It operates in two main modes - Append Mode and Sync Mode. Each mode also supports Consistent Push Mode and Consistent Push Full Swap Mode for atomic ingestion.

Mode Overview

Mode	Consistent Push	Consistent Push Swap	Behavior	Segment to File Mapping
Sync Mode	false	false	No atomicity is guaranteed within the batch	1:1
Sync Mode	true	false	Atomic batch ingestion, it is used for incremental ingestion.	1:1
Sync Mode	false	true	Full table refresh, atomic per run	1:1
Append Mode	true	false	Atomic batch ingestion, it is used for incremental ingestion.	m:n
Append Mode	false	true	Full table refresh, atomic per run	m:n

Scaling and Stability Recommendations

Parameter	Description	Recommended Setting
`tableMaxNumTasks`	Max concurrent subtasks per table	1000 (adjust per controller capacity)
`taskMaxDataSize`	Max data processed per subtask	1–2 GB (default: 1 GB)
`taskMaxNumFiles`	Max files per subtask	Increase for small-file datasets
`desiredSegmentSize`	Target Pinot segment size	500 MB – 1 GB

Recommendations

1. Large Ad-hoc Ingestion (~2TB, `consistentPushSwapEnabled=false`)

Append Mode: If taskMaxDataSize = 1GB and tableMaxNumTasks = 1000, a single batch can ingest 1TB. Run twice to complete 2TB ingestion.
Sync Mode: If each file = 400 MB, with tableMaxNumTasks = 1000, total = 400 GB per batch → 5 batches required.
Tip: Re-trigger /tasks/schedule or /tasks/execute in a loop until no subtasks are created. (No ingestion will take place unless there are new files).

2. Full Table Refresh (`consistentPushSwapEnabled=true`)

Refresh the whole table with new data.
Ensures atomic ingestion for all or none.
Atomicity applies only within a single task execution, not across batches.
Set tableMaxNumTasks = -1 to generate all subtasks in one batch.

3. Atomic Table Ingestion (`consistentPushEnabled=true`)

Similar atomic behavior as above; atomic only within one task execution.
Set tableMaxNumTasks = -1 to generate all subtasks in one batch.

4. Rerun on Failure

Use the Debug API to verify task status.
If subtasks show FAILED or TIMED_OUT, re-trigger with the same config.
Once all subtasks are COMPLETED, reruns have no effect (idempotent).

Delta Ingestion Task

Generates 1:1 mapping of Delta files to Pinot segments.
Safe to rerun; each execution is independent.
Supports both Schedule and Execute ad hoc APIs.

Rerun Recommendation:

Check via Debug API.
If subtasks show FAILED or TIMED_OUT, re-trigger.
Once all are COMPLETED, rerun has no effect.

Segment Backfill Task

Executable only on-demand (cannot be scheduled).
Does not support the schedule parameter in task config.
Avoid re-triggering successful runs — repeated runs will re-ingest the same data.

Rerun Recommendation:

Check via Debug API.
If subtasks show FAILED or TIMED_OUT, re-trigger.
Once all are COMPLETED, rerun has no effect.

Segment Import Task

Recommended frequency: every 15 minutes.
Should be scheduled via the schedule parameter in task config.
It is not recommended to trigger using ad hoc schedule or execute APIs.

Segment Refresh Task / Alter Table Task

Recommended frequency: every 15 minutes.
Should be scheduled via the schedule parameter in task config.
It is not recommended to trigger using ad hoc schedule or execute APIs.
Use the Progress Tracker API to monitor execution status for Alter Table Task.

Get Started

Ingestion

Query Data

Manage Data

Visualize Data

Manage Security

Release Notes

Reference

Managing Adhoc Minion Tasks Programmatically

Schedule API

Execute API

File Ingestion Task

Mode Overview

Scaling and Stability Recommendations

Recommendations

1. Large Ad-hoc Ingestion (~2TB, `consistentPushSwapEnabled=false`)

2. Full Table Refresh (`consistentPushSwapEnabled=true`)

3. Atomic Table Ingestion (`consistentPushEnabled=true`)

4. Rerun on Failure

Delta Ingestion Task

Segment Backfill Task

Segment Import Task

Segment Refresh Task / Alter Table Task

Get Started

Ingestion

Query Data

Manage Data

Visualize Data

Manage Security

Release Notes

Reference

​Schedule API

​Execute API

​File Ingestion Task

​Mode Overview

​Scaling and Stability Recommendations

​Recommendations

​1. Large Ad-hoc Ingestion (~2TB, consistentPushSwapEnabled=false)

​2. Full Table Refresh (consistentPushSwapEnabled=true)

​3. Atomic Table Ingestion (consistentPushEnabled=true)

​4. Rerun on Failure

​Delta Ingestion Task

​Segment Backfill Task

​Segment Import Task

​Segment Refresh Task / Alter Table Task

Schedule API

Execute API

File Ingestion Task

Mode Overview

Scaling and Stability Recommendations

Recommendations

1. Large Ad-hoc Ingestion (~2TB, `consistentPushSwapEnabled=false`)

2. Full Table Refresh (`consistentPushSwapEnabled=true`)

3. Atomic Table Ingestion (`consistentPushEnabled=true`)

4. Rerun on Failure

Delta Ingestion Task

Segment Backfill Task

Segment Import Task

Segment Refresh Task / Alter Table Task