This feature is available starting in StarTree release 0.14.0. It must be enabled on demand — contact your StarTree representative to have it activated for your environment.
After creating an External table and triggering onboarding (see Onboarding Guide), use these APIs to monitor the onboarding pipeline — trigger runs on demand, check watcher state, and verify checkpoints.
Table of Contents
- Trigger External Table Sync Task
- Get Watcher Status
- Get Checkpoint
API Endpoints Quick Reference
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /tasks/schedule | Manually trigger an onboarding run |
| GET | /tables/{tableNameWithType}/externalTable/status | Get watcher run status |
| GET | /tables/{tableNameWithType}/externalTable/checkpoint | Get last ingested checkpoint |
1. Trigger External Table Sync Task
POST /tasks/schedule
After creating a table via the API, trigger the first onboarding run manually: the ExternalTableSyncTask is registered on a cron schedule (e.g. every 30 minutes) and does not run until the next scheduled window. The Schedule API starts data loading immediately, using the task config defined in the table configuration.
Request body:
| Field | Required | Description |
|---|---|---|
| taskType | Yes | Always ExternalTableSyncTask |
| tableName | Yes | The Pinot table name, e.g. my_table |
curl -X POST \
"http://localhost:9000/tasks/schedule" \
-H "Content-Type: application/json" \
-d '{"taskType": "ExternalTableSyncTask", "tableName": "<TABLE_NAME>"}'
Success response (200):
{
"ExternalTableSyncTask": "Task_ExternalTableSyncTask_<TIMESTAMP>"
}
Note: After triggering, poll Get Watcher Status to confirm the task moves from RUNNING to COMPLETED. For StarTree Cloud deployments the base URL includes the /api/pinot proxy prefix, e.g. https://<data-plane-host>/api/pinot/tasks/schedule.
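The same request can be issued from a script. The following is a minimal Python sketch using only the standard library; the function names `build_schedule_request` and `trigger_sync` are illustrative and not part of any StarTree client, and authentication headers are omitted because this page does not describe them. Pass the `/api/pinot`-prefixed base URL for StarTree Cloud deployments.

```python
import json
import urllib.request


def build_schedule_request(base_url: str, table_name: str) -> urllib.request.Request:
    """Build the POST /tasks/schedule request without sending it."""
    body = json.dumps({"taskType": "ExternalTableSyncTask", "tableName": table_name})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tasks/schedule",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_sync(base_url: str, table_name: str, timeout_s: float = 30.0) -> dict:
    """Send the request and return the parsed JSON response body."""
    request = build_schedule_request(base_url, table_name)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and payload before anything reaches the controller.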
2. Get Watcher Status
GET /tables/{tableNameWithType}/externalTable/status
Returns the last run status of the external table sync watcher (ExternalTableSyncWatcher) for the table.
Path parameter: tableNameWithType — table name with type suffix, e.g. <TABLE_NAME>_OFFLINE.
Response fields:
| Field | Type | Description |
|---|---|---|
| status | string | One of IDLE, RUNNING, COMPLETED, or FAILED. |
| startTimeMs | long | Run start time in ms. 0 if IDLE. |
| endTimeMs | long | Run end time in ms. 0 if RUNNING or IDLE. |
| filesDiscovered | int | Files found from the catalog in this run. |
| segmentsGenerated | int | Segments successfully created and uploaded. |
| errorMessage | string | Populated only on FAILED. |
| checkpointValue | string | Snapshot ID or watermark after a successful run. |
Status meanings:
| Status | Meaning |
|---|---|
| IDLE | No run has occurred yet, or the watcher is between scheduled runs. |
| RUNNING | A run is currently in progress. |
| COMPLETED | Last run succeeded. Check checkpointValue for the ingested snapshot. |
| FAILED | Last run failed. Check errorMessage and compare filesDiscovered vs segmentsGenerated to see how far it got. |
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/externalTable/status"
Sample responses:
// COMPLETED
{
"tableNameWithType": "<TABLE_NAME>_OFFLINE",
"status": "COMPLETED",
"startTimeMs": 1707500000000,
"endTimeMs": 1707500060000,
"filesDiscovered": 15,
"segmentsGenerated": 15,
"errorMessage": null,
"checkpointValue": "1234567890123456789"
}
// FAILED
{
"tableNameWithType": "<TABLE_NAME>_OFFLINE",
"status": "FAILED",
"startTimeMs": 1707500000000,
"endTimeMs": 1707500030000,
"filesDiscovered": 10,
"segmentsGenerated": 5,
"errorMessage": "Failed to upload segment: Connection timeout",
"checkpointValue": null
}
// IDLE (no run yet — table was just created)
{
"tableNameWithType": "<TABLE_NAME>_OFFLINE",
"status": "IDLE",
"startTimeMs": 0,
"endTimeMs": 0,
"filesDiscovered": 0,
"segmentsGenerated": 0,
"errorMessage": null,
"checkpointValue": null
}
Error codes: 404 table not found | 500 internal error
The watcher status is stored in ZooKeeper at /EXTERNAL_TABLE_WATCHER_STATUS/{tableNameWithType}. Only the last run is retained — each new run overwrites the previous status.
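Because only the last run is retained, a poller that starts right after a manual trigger may first see the terminal status of an earlier run. The Python sketch below shows one way to poll until a run reaches COMPLETED or FAILED while skipping stale results. The names `fetch_status_http` and `wait_for_run`, the default intervals, and the `not_before_ms` guard are all assumptions made for illustration. The guard compares a client-side timestamp with the server's `startTimeMs`, so it is only as reliable as the clock agreement between the two.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status_http(base_url: str, table_name_with_type: str, timeout_s: float = 30.0) -> dict:
    """GET the watcher status for a table and return the parsed JSON."""
    url = f"{base_url.rstrip('/')}/tables/{table_name_with_type}/externalTable/status"
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_run(
    fetch_status: Callable[[], dict],
    not_before_ms: int = 0,
    poll_interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until a run that started at or after not_before_ms finishes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        finished = status["status"] in ("COMPLETED", "FAILED")
        # Ignore terminal states left over from a run that began before the trigger.
        if finished and status["startTimeMs"] >= not_before_ms:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("watcher did not reach COMPLETED or FAILED in time")
        sleep(poll_interval_s)
```

A typical call would be `wait_for_run(lambda: fetch_status_http("http://localhost:9000", "<TABLE_NAME>_OFFLINE"), not_before_ms=trigger_time_ms)`, where `trigger_time_ms` is a timestamp the caller records just before triggering the task.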
3. Get Checkpoint
GET /tables/{tableNameWithType}/externalTable/checkpoint
Returns the Iceberg snapshot ID (or timestamp) up to which data has been successfully ingested into Pinot. This is the watermark stored in ZooKeeper after each completed onboarding run.
This endpoint requires a top-level catalogType entry in the table’s ExternalTableSyncTask config (catalog-backed onboarding). Raw S3 (s3-catalog) tables return 400.
Response fields:
| Field | Type | Description |
|---|---|---|
| checkpointType | string | SNAPSHOT_ID or TIMESTAMP. null if no onboarding has completed. |
| checkpointValue | string | The snapshot ID or timestamp value. null if no onboarding has completed. |
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/externalTable/checkpoint"
Sample responses:
// After successful onboarding
{
"tableNameWithType": "<TABLE_NAME>_OFFLINE",
"checkpointType": "SNAPSHOT_ID",
"checkpointValue": "1234567890123456789"
}
// No onboarding completed yet
{
"tableNameWithType": "<TABLE_NAME>_OFFLINE",
"checkpointType": null,
"checkpointValue": null
}
Error codes: 400 not a catalog table | 404 table not found | 500 catalog error
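A caller usually needs to tell three outcomes apart: a checkpoint exists, no onboarding has completed yet, or the request failed. The Python sketch below maps an HTTP status and response body onto those cases. The result labels `OK`, `NO_CHECKPOINT`, and `ERROR` and both function names are illustrative, not part of the API. The error body is not parsed, since its format is not described here.

```python
import json
import urllib.error
import urllib.request

# Meanings taken from the error-code list above.
ERROR_MEANINGS = {400: "not a catalog table", 404: "table not found", 500: "catalog error"}


def interpret_checkpoint(http_status: int, body: str) -> dict:
    """Classify a checkpoint response as OK, NO_CHECKPOINT, or ERROR."""
    if http_status == 200:
        payload = json.loads(body)
        if payload.get("checkpointType") is None:
            return {"state": "NO_CHECKPOINT"}
        return {
            "state": "OK",
            "checkpointType": payload["checkpointType"],
            # Kept as a string, as returned, so large snapshot IDs are not altered.
            "checkpointValue": payload["checkpointValue"],
        }
    reason = ERROR_MEANINGS.get(http_status, "unexpected response")
    return {"state": "ERROR", "httpStatus": http_status, "reason": reason}


def fetch_checkpoint(base_url: str, table_name_with_type: str, timeout_s: float = 30.0) -> dict:
    """GET the checkpoint for a table and classify the outcome."""
    url = f"{base_url.rstrip('/')}/tables/{table_name_with_type}/externalTable/checkpoint"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return interpret_checkpoint(response.status, response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; classify by status code only.
        return interpret_checkpoint(err.code, "")
```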