This feature is available starting in StarTree release 0.14.0. It is not enabled by default: contact your StarTree representative to have it activated for your environment.
After creating an External table (see the Onboarding Guide), use these APIs to run and monitor the ingestion pipeline: trigger runs on demand, check watcher status, verify checkpoints, and count the data files in a snapshot.
Table of Contents
- Trigger Ingestion Task
- Get Watcher Status
- Get Checkpoint
- Get File Count
API Endpoints Quick Reference
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /periodictask/run | Manually trigger an ingestion run |
| GET | /tables/{tableNameWithType}/iceberg/status | Get watcher run status |
| GET | /tables/{tableNameWithType}/iceberg/checkpoint | Get last ingested checkpoint |
| GET | /tables/{tableNameWithType}/iceberg/files | Get file count for a snapshot |
1. Trigger Ingestion Task
POST /periodictask/run
After creating a table via the API, trigger the first ingestion run manually. The IcebergIngestionTask is registered on a cron schedule (for example, every 30 minutes) and does not fire until the next scheduled window, so a manual trigger starts data loading immediately instead of waiting.
Query parameters:
| Parameter | Required | Description |
|---|---|---|
| taskname | Yes | Always IcebergIngestionTask |
| tableName | Yes | The Pinot table name without the type suffix, e.g. my_table |
| type | Yes | Always OFFLINE for External tables |
```bash
curl -X POST \
  "http://localhost:9000/periodictask/run?taskname=IcebergIngestionTask&tableName=<TABLE_NAME>&type=OFFLINE" \
  -H "Content-Type: application/json" \
  -d ''
```
Success response (200):
```json
{
  "status": "IcebergIngestionTask triggered for table: <TABLE_NAME>_OFFLINE"
}
```
Note: After triggering, poll the Get Watcher Status endpoint to confirm that the task moves from RUNNING to COMPLETED (or FAILED). For StarTree Cloud deployments, the base URL includes the /api/pinot proxy prefix, e.g. https://<data-plane-host>/api/pinot/periodictask/run.
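The trigger call can also be scripted. The following is a minimal Python sketch that uses only the standard library. The helper names trigger_url and trigger_ingestion are hypothetical and are not part of a StarTree client API. The sketch also assumes that no authentication headers are needed, which may not hold for your deployment. It builds the same query string as the curl example and rejects table names that already carry a type suffix, because tableName must be the bare name.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical helpers for illustration; not part of any StarTree SDK.
# Assumes no authentication is required by the controller.


def trigger_url(base_url: str, table_name: str) -> str:
    """Build the /periodictask/run URL for an External (OFFLINE) table.

    base_url is e.g. "http://localhost:9000", or on StarTree Cloud
    "https://<data-plane-host>/api/pinot" (with the proxy prefix).
    """
    # tableName must be the bare name, without a type suffix.
    if table_name.endswith(("_OFFLINE", "_REALTIME")):
        raise ValueError("tableName must not include the type suffix")
    query = urllib.parse.urlencode(
        {
            "taskname": "IcebergIngestionTask",
            "tableName": table_name,
            "type": "OFFLINE",  # always OFFLINE for External tables
        }
    )
    return f"{base_url.rstrip('/')}/periodictask/run?{query}"


def trigger_ingestion(base_url: str, table_name: str, timeout_s: float = 30.0) -> dict:
    """POST with an empty body and return the parsed JSON response.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    request = urllib.request.Request(
        trigger_url(base_url, table_name),
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, trigger_ingestion("http://localhost:9000", "my_table") sends the same request as the curl command above. On success it returns the parsed success body.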
2. Get Watcher Status
GET /tables/{tableNameWithType}/iceberg/status
Returns the last run status of the IcebergWatcher for the table.
Path parameter: tableNameWithType — table name with type suffix, e.g. <TABLE_NAME>_OFFLINE.
Response fields:
| Field | Type | Description |
|---|---|---|
| status | string | IDLE, RUNNING, COMPLETED, or FAILED |
| startTimeMs | long | Run start time in ms. 0 if IDLE. |
| endTimeMs | long | Run end time in ms. 0 if RUNNING or IDLE. |
| filesDiscovered | int | Files found from the catalog in this run. |
| segmentsGenerated | int | Segments successfully created and uploaded. |
| errorMessage | string | Populated only on FAILED. |
| checkpointValue | string | Snapshot ID or watermark after a successful run. |
Status meanings:
| Status | Meaning |
|---|---|
| IDLE | No run has occurred yet, or the watcher is between scheduled runs. |
| RUNNING | A run is currently in progress. |
| COMPLETED | Last run succeeded. Check checkpointValue for the ingested snapshot. |
| FAILED | Last run failed. Check errorMessage and compare filesDiscovered vs segmentsGenerated to see how far it got. |
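The status table above maps onto a small interpretation step. The following Python sketch turns one status response into a one-line summary. The function name summarize_status is hypothetical. The sketch computes the run duration from startTimeMs and endTimeMs. For FAILED runs, it reports filesDiscovered alongside segmentsGenerated, as the table suggests.

```python
from typing import Any, Mapping

# Hypothetical helper for illustration; field names follow the response table.


def summarize_status(status: Mapping[str, Any]) -> str:
    """Return a one-line, human-readable summary of a watcher status response."""
    state = status["status"]
    if state == "IDLE":
        return "IDLE: no run recorded yet, or between scheduled runs"
    if state == "RUNNING":
        return f"RUNNING: started at {status['startTimeMs']} ms"
    discovered = status["filesDiscovered"]
    generated = status["segmentsGenerated"]
    # endTimeMs is only non-zero once a run has finished.
    elapsed_s = (status["endTimeMs"] - status["startTimeMs"]) / 1000
    if state == "COMPLETED":
        return (
            f"COMPLETED in {elapsed_s:.1f}s: {generated} segments from "
            f"{discovered} files, checkpoint {status['checkpointValue']}"
        )
    if state == "FAILED":
        return (
            f"FAILED after {elapsed_s:.1f}s: {generated} segments from "
            f"{discovered} files; error: {status['errorMessage']}"
        )
    raise ValueError(f"unknown watcher status: {state!r}")
```

Unknown status strings raise an error rather than being silently treated as one of the four documented states.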
```bash
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/iceberg/status"
```
Sample responses:
COMPLETED:
```json
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "status": "COMPLETED",
  "startTimeMs": 1707500000000,
  "endTimeMs": 1707500060000,
  "filesDiscovered": 15,
  "segmentsGenerated": 15,
  "errorMessage": null,
  "checkpointValue": "1234567890123456789"
}
```
FAILED:
```json
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "status": "FAILED",
  "startTimeMs": 1707500000000,
  "endTimeMs": 1707500030000,
  "filesDiscovered": 10,
  "segmentsGenerated": 5,
  "errorMessage": "Failed to upload segment: Connection timeout",
  "checkpointValue": null
}
```
IDLE (no run yet; the table was just created):
```json
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "status": "IDLE",
  "startTimeMs": 0,
  "endTimeMs": 0,
  "filesDiscovered": 0,
  "segmentsGenerated": 0,
  "errorMessage": null,
  "checkpointValue": null
}
```
Error codes: 404 table not found | 500 internal error
The watcher status is stored in ZooKeeper at /ICEBERG_WATCHER_STATUS/{tableNameWithType}. Only the last run is retained — each new run overwrites the previous status.
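Only the last run is retained, so a poller that simply waits for COMPLETED can be fooled by a status left over from an earlier run. The following Python sketch avoids this by recording startTimeMs before triggering and then waiting for a terminal status whose startTimeMs is newer. The names fetch_status and wait_for_new_run are hypothetical. The sketch assumes that each new run reports a larger startTimeMs than the previous one. The 10-second interval and 30-minute timeout are arbitrary defaults.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Hypothetical helpers for illustration; not part of any StarTree SDK.
Status = Dict[str, Any]
TERMINAL = ("COMPLETED", "FAILED")


def fetch_status(base_url: str, table_name_with_type: str, timeout_s: float = 10.0) -> Status:
    """GET /tables/{tableNameWithType}/iceberg/status and parse the JSON body."""
    url = f"{base_url.rstrip('/')}/tables/{table_name_with_type}/iceberg/status"
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_new_run(
    get_status: Callable[[], Status],
    previous_start_ms: int,
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Status:
    """Poll until a run that started after previous_start_ms is COMPLETED or FAILED.

    Assumes a new run always has a larger startTimeMs than the previous one.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        # Ignore a terminal status that belongs to an earlier run.
        is_new_run = status["startTimeMs"] > previous_start_ms
        if is_new_run and status["status"] in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"no finished run within {timeout_s}s; last status: {status['status']}"
            )
        sleep(interval_s)
```

A typical sequence is as follows:

1. Read the baseline with fetch_status(base, "my_table_OFFLINE")["startTimeMs"].
2. Trigger the run as described in Trigger Ingestion Task.
3. Call wait_for_new_run(lambda: fetch_status(base, "my_table_OFFLINE"), baseline).

The sleep and clock parameters are injectable, so you can exercise the loop without a live controller.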
3. Get Checkpoint
GET /tables/{tableNameWithType}/iceberg/checkpoint
Returns the Iceberg snapshot ID (or timestamp) up to which data has been successfully ingested into Pinot. This is the watermark stored in ZooKeeper after each completed ingestion run.
Requires iceberg.source.type=catalog in the table’s task config. Raw S3 (s3-catalog) tables return 400.
Response fields:
| Field | Type | Description |
|---|---|---|
| checkpointType | string | SNAPSHOT_ID or TIMESTAMP. null if no ingestion has completed. |
| checkpointValue | string | The snapshot ID or timestamp value. null if no ingestion has completed. |
```bash
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/iceberg/checkpoint"
```
Sample responses:
After successful ingestion:
```json
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "checkpointType": "SNAPSHOT_ID",
  "checkpointValue": "1234567890123456789"
}
```
No ingestion completed yet:
```json
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "checkpointType": null,
  "checkpointValue": null
}
```
Error codes: 400 not a catalog table | 404 table not found | 500 catalog error
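The following Python sketch reads the checkpoint. It treats the "no ingestion completed yet" response (both fields null) as None and turns the documented error codes into readable messages. The names parse_checkpoint, get_checkpoint, and ERROR_HINTS are hypothetical. The value is kept as a string to match the documented field type. Convert it with int() only when checkpointType is SNAPSHOT_ID.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Hypothetical helpers for illustration; not part of any StarTree SDK.
ERROR_HINTS = {
    400: "not a catalog table (requires iceberg.source.type=catalog)",
    404: "table not found",
    500: "catalog error",
}


def parse_checkpoint(body: dict) -> Optional[Tuple[str, str]]:
    """Return (checkpointType, checkpointValue), or None if no run has completed."""
    checkpoint_type = body.get("checkpointType")
    checkpoint_value = body.get("checkpointValue")
    if checkpoint_type is None or checkpoint_value is None:
        return None
    return checkpoint_type, checkpoint_value


def get_checkpoint(
    base_url: str, table_name_with_type: str, timeout_s: float = 10.0
) -> Optional[Tuple[str, str]]:
    """GET /tables/{tableNameWithType}/iceberg/checkpoint, mapping documented error codes."""
    url = f"{base_url.rstrip('/')}/tables/{table_name_with_type}/iceberg/checkpoint"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return parse_checkpoint(json.loads(response.read().decode("utf-8")))
    except urllib.error.HTTPError as err:
        hint = ERROR_HINTS.get(err.code, "unexpected error")
        raise RuntimeError(f"checkpoint request failed ({err.code}): {hint}") from err
```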
4. Get File Count
GET /tables/{tableNameWithType}/iceberg/files
Queries the Iceberg catalog directly and returns the total number of Parquet data files for a given snapshot. Use this to verify that the catalog has the expected number of files and to compare against what was ingested.
Requires iceberg.source.type=catalog in the table’s task config. Raw S3 (s3-catalog) tables return 400.
Query parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| snapshotId | long | No | Iceberg snapshot ID. If omitted, the current (latest) snapshot is used. |
```bash
# Current snapshot
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/iceberg/files"
# Specific snapshot
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/iceberg/files?snapshotId=1234567890123456789"
```
Sample responses:
Specific snapshot:
```json
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "snapshotId": 1234567890123456789,
  "fileCount": 42
}
```
Current snapshot (snapshotId omitted):
```json
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "snapshotId": null,
  "fileCount": 108
}
```
Error codes: 400 not a catalog table | 404 table not found | 500 catalog error
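The following Python sketch builds the /files URL with or without snapshotId and reads fileCount. The names files_url and get_file_count are hypothetical. The snapshot ID can be given as an int or as the string checkpointValue. It is normalized with int(), so non-numeric input fails early.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Union

# Hypothetical helpers for illustration; not part of any StarTree SDK.
SnapshotId = Optional[Union[int, str]]


def files_url(base_url: str, table_name_with_type: str, snapshot_id: SnapshotId = None) -> str:
    """Build the /iceberg/files URL; omit snapshot_id to use the current snapshot."""
    url = f"{base_url.rstrip('/')}/tables/{table_name_with_type}/iceberg/files"
    if snapshot_id is not None:
        # int() accepts a checkpointValue string and rejects non-numeric input.
        url += "?" + urllib.parse.urlencode({"snapshotId": int(snapshot_id)})
    return url


def get_file_count(
    base_url: str,
    table_name_with_type: str,
    snapshot_id: SnapshotId = None,
    timeout_s: float = 30.0,
) -> int:
    """Return fileCount from the /iceberg/files response."""
    url = files_url(base_url, table_name_with_type, snapshot_id)
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["fileCount"]
```

In the /files response, snapshotId is a JSON number rather than a string. Python's json module parses 1234567890123456789 as an exact int. JavaScript's JSON.parse, however, rounds it, because the value exceeds Number.MAX_SAFE_INTEGER (2^53 - 1). In JavaScript clients, keep snapshot IDs as strings, such as the string checkpointValue.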
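File Count vs Checkpoint: /files queries the catalog live and returns the number of data files in a snapshot, while /checkpoint returns the last ingested snapshot ID stored in ZooKeeper. When checkpointType is SNAPSHOT_ID, you can pass checkpointValue as snapshotId to /files and compare that count with the current snapshot's count. If the current fileCount is higher, new snapshots may have been added to the catalog since the last ingestion run.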
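The comparison described above can be expressed as a small function. The following Python sketch uses the hypothetical names has_more_files_than_checkpoint and file_count. Here, file_count is any callable that returns fileCount from /files for a snapshot ID, with None meaning the current snapshot. Treat the result as a hint only. A higher count suggests new data. An equal or lower count does not prove that nothing changed, because Iceberg snapshots can also replace or remove files (for example, through compaction or overwrites).

```python
from typing import Callable, Optional

# Hypothetical helper for illustration; not part of any StarTree SDK.


def has_more_files_than_checkpoint(
    checkpoint_type: Optional[str],
    checkpoint_value: Optional[str],
    file_count: Callable[[Optional[int]], int],
) -> Optional[bool]:
    """Compare file counts for the checkpointed snapshot and the current snapshot.

    Returns None when there is no SNAPSHOT_ID checkpoint to compare against
    (nothing ingested yet, or a TIMESTAMP checkpoint).
    """
    if checkpoint_type != "SNAPSHOT_ID" or checkpoint_value is None:
        return None
    ingested_files = file_count(int(checkpoint_value))  # snapshot last ingested
    current_files = file_count(None)  # None -> current (latest) snapshot
    return current_files > ingested_files
```

Passing file_count as a callable keeps the comparison independent of how the HTTP call is made, so you can test it with canned counts.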