This feature is available starting in StarTree release 0.14.0. It must be enabled on demand — contact your StarTree representative to have it activated for your environment.
After creating an External table and triggering ingestion (see Onboarding Guide), use these APIs to monitor the ingestion pipeline — trigger runs on demand, check watcher state, verify checkpoints, and count files per snapshot.

Table of Contents

  1. Trigger Ingestion Task
  2. Get Watcher Status
  3. Get Checkpoint
  4. Get File Count

API Endpoints Quick Reference

| Method | Endpoint | Purpose |
| --- | --- | --- |
| POST | /periodictask/run | Manually trigger an ingestion run |
| GET | /tables/{tableNameWithType}/iceberg/status | Get watcher run status |
| GET | /tables/{tableNameWithType}/iceberg/checkpoint | Get last ingested checkpoint |
| GET | /tables/{tableNameWithType}/iceberg/files | Get file count for a snapshot |

1. Trigger Ingestion Task

POST /periodictask/run
After creating a table via API you must manually trigger the first ingestion run. The IcebergIngestionTask is registered on a cron schedule (e.g. every 30 minutes) but will not fire automatically until the next scheduled window. Triggering it manually starts data loading immediately.
Query parameters:
| Parameter | Required | Description |
| --- | --- | --- |
| taskname | Yes | Always IcebergIngestionTask |
| tableName | Yes | The Pinot table name without the type suffix, e.g. my_table |
| type | Yes | Always OFFLINE for External tables |
curl -X POST \
  "http://localhost:9000/periodictask/run?taskname=IcebergIngestionTask&tableName=<TABLE_NAME>&type=OFFLINE" \
  -H "Content-Type: application/json" \
  -d ''
Success response (200):
{
  "status": "IcebergIngestionTask triggered for table: <TABLE_NAME>_OFFLINE"
}
Note: After triggering, poll Get Watcher Status to confirm the task moves from RUNNING to COMPLETED.
Note: For StarTree Cloud deployments the base URL includes the /api/pinot proxy prefix, e.g. https://<data-plane-host>/api/pinot/periodictask/run.
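The curl call above can also be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names `trigger_url` and `trigger_ingestion` are hypothetical, and it assumes the controller accepts the request without authentication headers, which may not hold for your deployment.

```python
import json
import urllib.parse
import urllib.request


def trigger_url(base_url: str, table_name: str) -> str:
    """Build the /periodictask/run URL for an External table.

    table_name is the Pinot table name WITHOUT the _OFFLINE suffix.
    """
    query = urllib.parse.urlencode({
        "taskname": "IcebergIngestionTask",  # fixed value per the parameter table
        "tableName": table_name,
        "type": "OFFLINE",                   # External tables are always OFFLINE
    })
    return f"{base_url.rstrip('/')}/periodictask/run?{query}"


def trigger_ingestion(base_url: str, table_name: str, timeout_s: float = 30.0) -> dict:
    """POST an empty body to the trigger endpoint and return the parsed JSON reply.

    Assumption: no auth header is needed; add one to `headers` if your
    environment requires it.
    """
    request = urllib.request.Request(
        trigger_url(base_url, table_name),
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a StarTree Cloud deployment, pass the base URL with the proxy prefix, for example `trigger_ingestion("https://<data-plane-host>/api/pinot", "my_table")`.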

2. Get Watcher Status

GET /tables/{tableNameWithType}/iceberg/status
Returns the last run status of the IcebergWatcher for the table.
Path parameter: tableNameWithType, the table name with type suffix, e.g. <TABLE_NAME>_OFFLINE.
Response fields:
| Field | Type | Description |
| --- | --- | --- |
| status | string | One of IDLE, RUNNING, COMPLETED, FAILED |
| startTimeMs | long | Run start time in ms. 0 if IDLE. |
| endTimeMs | long | Run end time in ms. 0 if RUNNING or IDLE. |
| filesDiscovered | int | Files found from the catalog in this run. |
| segmentsGenerated | int | Segments successfully created and uploaded. |
| errorMessage | string | Populated only on FAILED. |
| checkpointValue | string | Snapshot ID or watermark after a successful run. |
Status meanings:
| Status | Meaning |
| --- | --- |
| IDLE | No run has occurred yet, or the watcher is between scheduled runs. |
| RUNNING | A run is currently in progress. |
| COMPLETED | Last run succeeded. Check checkpointValue for the ingested snapshot. |
| FAILED | Last run failed. Check errorMessage and compare filesDiscovered vs segmentsGenerated to see how far it got. |
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/iceberg/status"
Sample responses:
// COMPLETED
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "status": "COMPLETED",
  "startTimeMs": 1707500000000,
  "endTimeMs": 1707500060000,
  "filesDiscovered": 15,
  "segmentsGenerated": 15,
  "errorMessage": null,
  "checkpointValue": "1234567890123456789"
}

// FAILED
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "status": "FAILED",
  "startTimeMs": 1707500000000,
  "endTimeMs": 1707500030000,
  "filesDiscovered": 10,
  "segmentsGenerated": 5,
  "errorMessage": "Failed to upload segment: Connection timeout",
  "checkpointValue": null
}

// IDLE (no run yet — table was just created)
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "status": "IDLE",
  "startTimeMs": 0,
  "endTimeMs": 0,
  "filesDiscovered": 0,
  "segmentsGenerated": 0,
  "errorMessage": null,
  "checkpointValue": null
}
Error codes: 404 table not found | 500 internal error
The watcher status is stored in ZooKeeper at /ICEBERG_WATCHER_STATUS/{tableNameWithType}. Only the last run is retained — each new run overwrites the previous status.
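Because only the last run is retained, a polling client has to decide when a status document describes the run it is waiting for. The sketch below is one way to do that; `fetch_status`, `wait_for_run`, `summarize`, and the `min_start_ms` parameter are hypothetical names, and the default intervals are arbitrary choices rather than documented values.

```python
import json
import time
import urllib.request
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED"}


def fetch_status(base_url: str, table_name_with_type: str, timeout_s: float = 30.0) -> dict:
    """GET the watcher status document for one table (no auth assumed)."""
    url = f"{base_url.rstrip('/')}/tables/{table_name_with_type}/iceberg/status"
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_run(fetch: Callable[[], dict], min_start_ms: int = 0,
                 poll_interval_s: float = 10.0, max_wait_s: float = 1800.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the watcher reports COMPLETED or FAILED.

    A terminal status whose startTimeMs is older than min_start_ms is
    treated as left over from an earlier run and ignored.
    """
    deadline = time.monotonic() + max_wait_s
    while True:
        status = fetch()
        is_terminal = status.get("status") in TERMINAL_STATES
        is_fresh = (status.get("startTimeMs") or 0) >= min_start_ms
        if is_terminal and is_fresh:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"watcher still {status.get('status')} after {max_wait_s}s")
        sleep(poll_interval_s)


def summarize(status: dict) -> str:
    """One-line summary; for FAILED, show how far the run got."""
    state = status.get("status")
    if state == "FAILED":
        discovered = status.get("filesDiscovered") or 0
        generated = status.get("segmentsGenerated") or 0
        return (f"FAILED: {generated} segments generated for {discovered} "
                f"files discovered ({status.get('errorMessage')})")
    if state == "COMPLETED":
        return f"COMPLETED at checkpoint {status.get('checkpointValue')}"
    return str(state)
```

A typical call would be `wait_for_run(lambda: fetch_status("http://localhost:9000", "<TABLE_NAME>_OFFLINE"))`. Passing `min_start_ms` (for example, the client time in milliseconds just before the trigger call) is an attempt to skip a stale COMPLETED from a previous run; it assumes the client and controller clocks roughly agree.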

3. Get Checkpoint

GET /tables/{tableNameWithType}/iceberg/checkpoint
Returns the Iceberg snapshot ID (or timestamp) up to which data has been successfully ingested into Pinot. This is the watermark stored in ZooKeeper after each completed ingestion run.
Requires iceberg.source.type=catalog in the table’s task config. Raw S3 (s3-catalog) tables return 400.
Response fields:
| Field | Type | Description |
| --- | --- | --- |
| checkpointType | string | SNAPSHOT_ID or TIMESTAMP. null if no ingestion has completed. |
| checkpointValue | string | The snapshot ID or timestamp value. null if no ingestion has completed. |
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/iceberg/checkpoint"
Sample responses:
// After successful ingestion
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "checkpointType": "SNAPSHOT_ID",
  "checkpointValue": "1234567890123456789"
}

// No ingestion completed yet
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "checkpointType": null,
  "checkpointValue": null
}
Error codes: 400 not a catalog table | 404 table not found | 500 catalog error
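A client needs to handle both the "no ingestion yet" response (both fields null) and the documented error codes. The following is a rough sketch of that handling; `parse_checkpoint` and `get_checkpoint` are hypothetical helper names, and the hint strings simply restate the error-code list above.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Hints restating the documented error codes for this endpoint.
ERROR_HINTS = {
    400: "not a catalog table",
    404: "table not found",
    500: "catalog error",
}


def parse_checkpoint(payload: dict) -> Optional[Tuple[str, str]]:
    """Return (checkpointType, checkpointValue), or None if nothing was ingested yet."""
    checkpoint_type = payload.get("checkpointType")
    checkpoint_value = payload.get("checkpointValue")
    if checkpoint_type is None or checkpoint_value is None:
        return None
    return checkpoint_type, str(checkpoint_value)


def get_checkpoint(base_url: str, table_name_with_type: str,
                   timeout_s: float = 30.0) -> Optional[Tuple[str, str]]:
    """GET the checkpoint and translate HTTP errors into readable messages."""
    url = f"{base_url.rstrip('/')}/tables/{table_name_with_type}/iceberg/checkpoint"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return parse_checkpoint(json.loads(response.read().decode("utf-8")))
    except urllib.error.HTTPError as err:
        hint = ERROR_HINTS.get(err.code, err.reason)
        raise RuntimeError(f"checkpoint request failed ({err.code}): {hint}") from err
```

Returning `None` for the null case keeps "nothing ingested yet" distinct from a request failure, which surfaces as an exception.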

4. Get File Count

GET /tables/{tableNameWithType}/iceberg/files
Queries the Iceberg catalog directly and returns the total number of Parquet data files for a given snapshot. Use this to verify that the catalog has the expected number of files and to compare against what was ingested.
Requires iceberg.source.type=catalog in the table’s task config. Raw S3 (s3-catalog) tables return 400.
Query parameters:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| snapshotId | long | No | Iceberg snapshot ID. If omitted, the current (latest) snapshot is used. |
# Current snapshot
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/iceberg/files"

# Specific snapshot
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/iceberg/files?snapshotId=1234567890123456789"
Sample responses:
// Specific snapshot
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "snapshotId": 1234567890123456789,
  "fileCount": 42
}

// Current snapshot (snapshotId omitted)
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "snapshotId": null,
  "fileCount": 108
}
Error codes: 400 not a catalog table | 404 table not found | 500 catalog error
File Count vs Checkpoint: /files queries the catalog live and returns all files for a snapshot. /checkpoint returns the last ingested snapshot ID stored in ZooKeeper. If fileCount is greater than expected, new snapshots may have been added to the catalog since the last ingestion run.
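One way to act on this comparison is to query /files twice, once for the snapshot named by the checkpoint and once for the current snapshot, and compare the counts. The helpers below are a sketch with hypothetical names; they assume that a SNAPSHOT_ID checkpoint value can be passed unchanged as the snapshotId query parameter, which the source does not state explicitly.

```python
import urllib.parse
from typing import Optional


def files_url(base_url: str, table_name_with_type: str,
              snapshot_id: Optional[int] = None) -> str:
    """Build the /iceberg/files URL; omit snapshotId to target the current snapshot."""
    url = f"{base_url.rstrip('/')}/tables/{table_name_with_type}/iceberg/files"
    if snapshot_id is not None:
        url += "?" + urllib.parse.urlencode({"snapshotId": snapshot_id})
    return url


def snapshot_id_from_checkpoint(checkpoint: dict) -> Optional[int]:
    """Extract the ingested snapshot ID from a /checkpoint response.

    Returns None when nothing has been ingested or the checkpoint is a
    TIMESTAMP, since only a SNAPSHOT_ID can be used as snapshotId.
    """
    if checkpoint.get("checkpointType") != "SNAPSHOT_ID":
        return None
    value = checkpoint.get("checkpointValue")
    return int(value) if value is not None else None


def file_count_gap(files_at_checkpoint: dict, files_current: dict) -> int:
    """Difference between the current snapshot's file count and the ingested one."""
    return files_current["fileCount"] - files_at_checkpoint["fileCount"]
```

A positive gap suggests the catalog has moved past the last ingested snapshot. A gap of zero is weaker evidence, because equal counts do not by themselves show that the two snapshots contain the same files.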