Observability

This feature is available starting in StarTree release 0.14.0. It must be enabled on demand — contact your StarTree representative to have it activated for your environment.

After creating an External table and triggering ingestion (see Onboarding Guide), use these APIs to monitor the ingestion pipeline — trigger runs on demand, check watcher state, verify checkpoints, and count files per snapshot.

Trigger Ingestion Task
Get Watcher Status
Get Checkpoint
Get File Count

API Endpoints Quick Reference

Method	Endpoint	Purpose
`POST`	`/periodictask/run`	Manually trigger an ingestion run
`GET`	`/tables/{tableNameWithType}/externalTable/status`	Get watcher run status
`GET`	`/tables/{tableNameWithType}/externalTable/checkpoint`	Get last ingested checkpoint
`GET`	`/tables/{tableNameWithType}/externalTable/files`	Get file count for a snapshot

1. Trigger Ingestion Task

POST /periodictask/run

After creating a table via the API, trigger the first ingestion run manually. The ExternalTableSyncTask is registered on a cron schedule (e.g. every 30 minutes) and will not fire until the next scheduled window; triggering it manually starts data loading immediately.

Query parameters:

Parameter	Required	Description
`taskname`	Yes	Always `ExternalTableSyncTask`
`tableName`	Yes	The Pinot table name without the type suffix, e.g. `my_table`
`type`	Yes	Always `OFFLINE` for External tables

curl -X POST \
  "http://localhost:9000/periodictask/run?taskname=ExternalTableSyncTask&tableName=<TABLE_NAME>&type=OFFLINE" \
  -H "Content-Type: application/json" \
  -d ''

Success response (200):

{
  "status": "ExternalTableSyncTask triggered for table: <TABLE_NAME>_OFFLINE"
}

Note: After triggering, poll Get Watcher Status to confirm the task moves from RUNNING to COMPLETED. For StarTree Cloud deployments the base URL includes the /api/pinot proxy prefix, e.g. https://<data-plane-host>/api/pinot/periodictask/run.
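
The same request can be issued from a script. The following is a minimal Python sketch, not an official client: the function names `trigger_url` and `trigger_sync` are hypothetical, the host `data-plane.example.com` is a placeholder, and the code assumes an unauthenticated controller (add whatever auth headers your deployment requires). Passing a base URL that already ends in /api/pinot covers the StarTree Cloud case.

```python
import json
import urllib.request
from urllib.parse import urlencode


def trigger_url(base_url: str, table_name: str) -> str:
    """Build the URL for a manual ExternalTableSyncTask run."""
    query = urlencode({
        "taskname": "ExternalTableSyncTask",  # fixed value per the parameter table
        "tableName": table_name,              # table name WITHOUT the type suffix
        "type": "OFFLINE",                    # External tables are always OFFLINE
    })
    return f"{base_url.rstrip('/')}/periodictask/run?{query}"


def trigger_sync(base_url: str, table_name: str) -> dict:
    """POST the trigger request and return the parsed JSON body.

    Assumption: no authentication is needed; extend `headers` otherwise.
    """
    request = urllib.request.Request(
        trigger_url(base_url, table_name),
        data=b"",  # empty body, mirroring `-d ''` in the curl example
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Self-hosted controller vs. a Cloud-style base URL with the proxy prefix.
print(trigger_url("http://localhost:9000", "my_table"))
print(trigger_url("https://data-plane.example.com/api/pinot", "my_table"))
```

Keeping URL construction separate from the network call makes the parameter handling easy to check without a running controller.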

2. Get Watcher Status

GET /tables/{tableNameWithType}/externalTable/status

Returns the last run status of the external table sync watcher (ExternalTableSyncWatcher) for the table.

Path parameter: tableNameWithType, the table name with type suffix, e.g. <TABLE_NAME>_OFFLINE.

Response fields:

Field	Type	Description
`status`	string	`IDLE` \| `RUNNING` \| `COMPLETED` \| `FAILED`
`startTimeMs`	long	Run start time in ms. `0` if IDLE.
`endTimeMs`	long	Run end time in ms. `0` if RUNNING or IDLE.
`filesDiscovered`	int	Files found from the catalog in this run.
`segmentsGenerated`	int	Segments successfully created and uploaded.
`errorMessage`	string	Populated only on `FAILED`.
`checkpointValue`	string	Snapshot ID or watermark after a successful run.

Status meanings:

Status	Meaning
`IDLE`	No run has occurred yet, or the watcher is between scheduled runs.
`RUNNING`	A run is currently in progress.
`COMPLETED`	Last run succeeded. Check `checkpointValue` for the ingested snapshot.
`FAILED`	Last run failed. Check `errorMessage` and compare `filesDiscovered` vs `segmentsGenerated` to see how far it got.

curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/externalTable/status"

Sample responses:

// COMPLETED
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "status": "COMPLETED",
  "startTimeMs": 1707500000000,
  "endTimeMs": 1707500060000,
  "filesDiscovered": 15,
  "segmentsGenerated": 15,
  "errorMessage": null,
  "checkpointValue": "1234567890123456789"
}

// FAILED
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "status": "FAILED",
  "startTimeMs": 1707500000000,
  "endTimeMs": 1707500030000,
  "filesDiscovered": 10,
  "segmentsGenerated": 5,
  "errorMessage": "Failed to upload segment: Connection timeout",
  "checkpointValue": null
}

// IDLE (no run yet — table was just created)
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "status": "IDLE",
  "startTimeMs": 0,
  "endTimeMs": 0,
  "filesDiscovered": 0,
  "segmentsGenerated": 0,
  "errorMessage": null,
  "checkpointValue": null
}
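
For dashboards or alerts, a status response can be condensed into a single line. The sketch below is illustrative only: `summarize_status` is a hypothetical helper, and it relies solely on the response fields documented above. It reports filesDiscovered and segmentsGenerated side by side rather than subtracting them, since a one-to-one mapping between files and segments is not something this page guarantees.

```python
def summarize_status(resp: dict) -> str:
    """Turn a watcher status response into a one-line summary."""
    status = resp["status"]
    if status == "IDLE":
        return "IDLE: no run recorded yet"
    if status == "RUNNING":
        # endTimeMs is 0 while a run is in progress, so no duration is shown.
        return f"RUNNING since {resp['startTimeMs']} ms"
    duration_s = (resp["endTimeMs"] - resp["startTimeMs"]) / 1000
    progress = (
        f"{resp['segmentsGenerated']} segments generated, "
        f"{resp['filesDiscovered']} files discovered"
    )
    if status == "COMPLETED":
        return (
            f"COMPLETED in {duration_s:.0f}s: {progress}, "
            f"checkpoint {resp['checkpointValue']}"
        )
    # FAILED: surface the error message alongside how far the run got.
    return f"FAILED after {duration_s:.0f}s: {progress}, error: {resp['errorMessage']}"


failed = {
    "tableNameWithType": "my_table_OFFLINE",
    "status": "FAILED",
    "startTimeMs": 1707500000000,
    "endTimeMs": 1707500030000,
    "filesDiscovered": 10,
    "segmentsGenerated": 5,
    "errorMessage": "Failed to upload segment: Connection timeout",
    "checkpointValue": None,
}
print(summarize_status(failed))
```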

Error codes: 404 table not found | 500 internal error

The watcher status is stored in ZooKeeper at /EXTERNAL_TABLE_WATCHER_STATUS/{tableNameWithType}. Only the last run is retained — each new run overwrites the previous status.
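
Because only the last run is retained, a script that triggers a run and then polls may briefly read the previous run's COMPLETED status. One way to guard against that is sketched below. It assumes that a new run reports a larger startTimeMs than the run before it, which seems reasonable but is not stated on this page. `wait_for_new_run` and its parameters are hypothetical, and `fetch_status` stands in for any function that returns the parsed status response.

```python
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_new_run(
    fetch_status: Callable[[], dict],
    previous_start_ms: int,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until a run newer than `previous_start_ms` reaches a terminal state.

    Assumption: a newer run always has a strictly larger startTimeMs.
    """
    for _ in range(max_polls):
        status = fetch_status()
        is_new_run = status["startTimeMs"] > previous_start_ms
        if is_new_run and status["status"] in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"no new terminal status after {max_polls} polls")


# Demo with canned responses instead of HTTP calls: the first response is the
# stale result of the previous run, then the new run starts and completes.
canned = iter([
    {"status": "COMPLETED", "startTimeMs": 100},
    {"status": "RUNNING", "startTimeMs": 200},
    {"status": "COMPLETED", "startTimeMs": 200},
])
final = wait_for_new_run(lambda: next(canned), previous_start_ms=100,
                         sleep=lambda seconds: None)
print(final)
```

Read startTimeMs from the status endpoint before triggering, then pass it as previous_start_ms. For a table that has never run, the documented IDLE value of 0 works as the baseline.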

3. Get Checkpoint

GET /tables/{tableNameWithType}/externalTable/checkpoint

Returns the Iceberg snapshot ID (or timestamp) up to which data has been successfully ingested into Pinot. This is the watermark stored in ZooKeeper after each completed ingestion run.

Requires a top-level catalogType entry in the table’s ExternalTableSyncTask config (catalog-backed ingestion). Raw S3 (s3-catalog) tables return 400.

Response fields:

Field	Type	Description
`checkpointType`	string	`SNAPSHOT_ID` or `TIMESTAMP`. `null` if no ingestion has completed.
`checkpointValue`	string	The snapshot ID or timestamp value. `null` if no ingestion has completed.

curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/externalTable/checkpoint"

Sample responses:

// After successful ingestion
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "checkpointType": "SNAPSHOT_ID",
  "checkpointValue": "1234567890123456789"
}

// No ingestion completed yet
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "checkpointType": null,
  "checkpointValue": null
}

Error codes: 400 not a catalog table | 404 table not found | 500 catalog error
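
Since checkpointValue arrives as a string and may be null, client code usually needs a small amount of interpretation before reusing it. The following sketch shows one way to do that. `ingested_snapshot_id` and `ERROR_MEANINGS` are hypothetical names, and the helper deliberately leaves TIMESTAMP checkpoints alone because this page does not specify their format.

```python
from typing import Optional

# Error codes as listed for this endpoint.
ERROR_MEANINGS = {
    400: "not a catalog table",
    404: "table not found",
    500: "catalog error",
}


def ingested_snapshot_id(checkpoint: dict) -> Optional[int]:
    """Return the ingested snapshot ID as an int, or None if unavailable.

    None covers both "no ingestion completed yet" (null fields) and
    TIMESTAMP checkpoints, whose value format is not assumed here.
    """
    if checkpoint.get("checkpointType") != "SNAPSHOT_ID":
        return None
    return int(checkpoint["checkpointValue"])


after_ingestion = {
    "tableNameWithType": "my_table_OFFLINE",
    "checkpointType": "SNAPSHOT_ID",
    "checkpointValue": "1234567890123456789",
}
not_yet = {
    "tableNameWithType": "my_table_OFFLINE",
    "checkpointType": None,
    "checkpointValue": None,
}
print(ingested_snapshot_id(after_ingestion))
print(ingested_snapshot_id(not_yet))
```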

4. Get File Count

GET /tables/{tableNameWithType}/externalTable/files

Queries the Iceberg catalog directly and returns the total number of Parquet data files for a given snapshot. Use this to verify that the catalog has the expected number of files and to compare against what was ingested.

Requires a top-level catalogType entry in the table’s ExternalTableSyncTask config (catalog-backed ingestion). Raw S3 (s3-catalog) tables return 400.

Query parameters:

Parameter	Type	Required	Description
`snapshotId`	long	No	Iceberg snapshot ID. If omitted, the current (latest) snapshot is used.

# Current snapshot
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/externalTable/files"

# Specific snapshot
curl -X GET "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE/externalTable/files?snapshotId=1234567890123456789"

Sample responses:

// Specific snapshot
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "snapshotId": 1234567890123456789,
  "fileCount": 42
}

// Current snapshot (snapshotId omitted)
{
  "tableNameWithType": "<TABLE_NAME>_OFFLINE",
  "snapshotId": null,
  "fileCount": 108
}

Error codes: 400 not a catalog table | 404 table not found | 500 catalog error

File Count vs Checkpoint: .../externalTable/files queries the catalog live and returns all files for a snapshot. .../externalTable/checkpoint returns the last ingested snapshot ID stored in ZooKeeper. If fileCount is greater than expected, new snapshots may have been added to the catalog since the last ingestion run.
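
The comparison described above can be scripted by requesting the file count twice: once for the checkpointed snapshot and once for the current snapshot. The sketch below is a rough heuristic rather than a definitive check, because two different snapshots can contain the same number of files. `files_url` and `file_count_delta` are hypothetical helper names.

```python
from typing import Optional
from urllib.parse import urlencode


def files_url(base_url: str, table_with_type: str,
              snapshot_id: Optional[int] = None) -> str:
    """Build the file-count URL; omit snapshot_id to target the current snapshot."""
    url = f"{base_url.rstrip('/')}/tables/{table_with_type}/externalTable/files"
    if snapshot_id is not None:
        url += "?" + urlencode({"snapshotId": snapshot_id})
    return url


def file_count_delta(ingested_files: dict, current_files: dict) -> int:
    """Files in the current snapshot minus files in the ingested snapshot.

    A non-zero result suggests the catalog changed since the last ingestion
    run; zero does not prove that it is unchanged.
    """
    return current_files["fileCount"] - ingested_files["fileCount"]


# The checkpointed snapshot ID would come from .../externalTable/checkpoint.
print(files_url("http://localhost:9000", "my_table_OFFLINE", 1234567890123456789))
print(files_url("http://localhost:9000", "my_table_OFFLINE"))

# Responses shaped like the samples above.
ingested = {"snapshotId": 1234567890123456789, "fileCount": 42}
current = {"snapshotId": None, "fileCount": 108}
print(file_count_delta(ingested, current))
```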
