Onboarding via API - StarTree Docs

This feature is available starting in StarTree release 0.14.0. Ensure your environment is running version 0.14.0 or later before following this guide.

This guide explains how to use the Pinot Controller’s external table catalog APIs to connect to a catalog, explore its tables, and register a Pinot schema and OFFLINE table with automated ingestion. Catalog discovery uses /externalTable/catalog/*; table creation uses the standard Controller POST /schemas and POST /tables APIs. Each section covers one catalog provider. Within each section, discovery calls use exact JSON bodies; the create sections document a reference payload for schema and table authoring.

Prefer a point-and-click setup? See the Data Portal Onboarding Guide.

How It Works

Onboarding an external table follows a linear discovery-to-ingestion workflow:

Validate your catalog credentials and connectivity.
Discover available namespaces and tables in the catalog.
Create the Pinot schema and table — this also registers the ExternalTableSyncTask on a cron schedule.
Trigger the first ingestion run manually, since the scheduled task does not fire immediately after table creation.

Once ingestion is running, use the Observability APIs to monitor progress, verify checkpoints, and check file counts.

API Endpoints Quick Reference

Step	Method	Endpoint	Purpose
1	`POST`	`/externalTable/catalog/validate`	Validate catalog connection
2	`POST`	`/externalTable/catalog/namespaces`	List namespaces
3	`POST`	`/externalTable/catalog/tables/list`	List tables in a namespace
4a	`POST`	`/schemas`	Register Pinot schema (standard Controller API)
4b	`POST`	`/tables`	Create OFFLINE table with `ExternalTableSyncTask` in `task.taskTypeConfigsMap` (standard Controller API)
5	`POST`	`/periodictask/run`	Manually trigger first ingestion run

For monitoring and observability APIs, see the Observability page.

Recommended Workflow

Run these steps in order when onboarding a new External table:

Step 1 ──► POST /externalTable/catalog/validate
           Confirm credentials and catalog connectivity.

Step 2 ──► POST /externalTable/catalog/namespaces
           Discover available namespaces (databases).

Step 3 ──► POST /externalTable/catalog/tables/list
           List tables in the target namespace.

Step 4 ──► POST /schemas  then  POST /tables
           Register the Pinot schema, then create the OFFLINE table (includes
           ExternalTableSyncTask schedule). You can use POST /tables/preview
           to refine config before applying.

Step 5 ──► POST /tasks/schedule 
            {"taskType" : "ExternalTableSyncTask", "tableName":"<TABLE_NAME>"}
           ⚠️  Manually trigger the first ingestion run if the table is created via API.
           The scheduled task will not fire immediately after table creation —
           you must trigger it to start loading data without waiting for
           the next cron window (up to 30 minutes).

Once the table is created and ingestion is running, use the Observability APIs to monitor progress, verify checkpoints, and check file counts.

Pausing Ingestion

To stop the ExternalTableSyncTask from running on its cron schedule, set enabled to false in the task config and update the table via the Pinot Controller API.

curl -X PUT \
  "http://localhost:9000/tables/<TABLE_NAME>_OFFLINE" \
  -H "Content-Type: application/json" \
  -d '{
    ...tableConfig...,
    "task": {
      "taskTypeConfigsMap": {
        "ExternalTableSyncTask": {
          "enabled": "false",
          "schedule": "0 */5 * * * ?",
          ...
        }
      }
    }
  }'

Setting "enabled": "false" prevents the scheduler from creating new ingestion task instances. Any run currently in progress will complete normally. To resume ingestion, set "enabled": "true" and update the table again.

Note: Pausing does not delete existing segments or checkpoints. When you re-enable the task, ingestion resumes from the last recorded checkpoint — no data is re-ingested.

Catalog Providers

1. S3 Data Lake

What it is: For raw Parquet files on S3 that are not managed by an Iceberg catalog service. There is no catalog REST endpoint — Pinot reads files directly from the specified S3 bucket and prefix. The namespace and table discovery APIs still work but return values derived from the S3 path. catalogType: "s3-catalog"

Note: Unlike the other catalog types, this provider uses accessKey / secretKey (not restAccessKeyId / restSecretAccessKey) for credentials.

1.1 Validate

POST /externalTable/catalog/validate

{
  "catalogType": "s3-catalog",
  "catalogConfig": {
    "bucketName": "<YOUR_S3_BUCKET>",
    "prefix": "path/to/parquet/data/",
    "region": "<REGION>",
    "accessKey": "<YOUR_ACCESS_KEY_ID>",
  "secretKey": "<YOUR_SECRET_ACCESS_KEY>"
  }
}

1.2 Create Pinot Table

Register the logical table with POST /schemas (Pinot schema JSON) followed by POST /tables (body is the tableConfig object in the JSON below). Use POST /tables/preview for an enriched draft when needed.

{
  "catalogType": "s3-catalog",
  "catalogConfig": {
    "namespace": "default",
    "tableName": "raws3_table",
    "bucketName": "<YOUR_S3_BUCKET>",
    "prefix": "path/to/parquet/data/",
    "region": "<REGION>",
    "accessKey": "<YOUR_ACCESS_KEY_ID>",
    "secretKey": "<YOUR_SECRET_ACCESS_KEY>"
  },
  "schemaOptions": {
    "includePartitionColumns": true,
    "timestampColumn": null,
    "timeUnit": "MILLISECONDS",
    "schemaName": "raws3_table"
  },
  "tableConfig": {
    "tableName": "raws3_table_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "timeColumnName": "logical_timestamp",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "3600",
      "replication": "1",
      "segmentPushType": "APPEND"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": true
    },
    "task": {
      "taskTypeConfigsMap": {
        "ExternalTableSyncTask": {
          "schedule": "*/30 * * * * ?",
          "executor": "controller",
          "inputFormat": "parquet",
          "catalogType": "s3",
          "catalog.s3.table.tableName": "raws3_table",
          "catalog.s3.bucketName": "<YOUR_S3_BUCKET>",
          "catalog.s3.prefix": "path/to/parquet/data/",
          "catalog.s3.region": "<REGION>",
          "catalog.s3.accessKey": "<YOUR_ACCESS_KEY_ID>",
          "catalog.s3.secretKey": "<YOUR_SECRET_ACCESS_KEY>"
        }
      }
    },
    "tierConfigs": [
      {
        "name": "myS3Tier",
        "segmentSelectorType": "time",
        "segmentAge": "0s",
        "storageType": "pinot_server",
        "serverTag": "DefaultTenant_OFFLINE",
        "tierBackend": "s3",
        "tierBackendProperties": {
          "enable.delegate.v2": "true",
          "region": "<REGION>",
          "bucket": "<YOUR_S3_BUCKET>",
          "accessKey": "<YOUR_ACCESS_KEY_ID>",
          "secretKey": "<YOUR_SECRET_ACCESS_KEY>",
          "sessionToken": "",
          "preload.index.keys.override": "*.inverted_index,*.composite_json_index",
          "s3client.crtasync.targetThroughputInGbps": "1000.0",
          "s3client.crtasync.maxConcurrency": "1000"
        }
      }
    ],
    "isDimTable": false
  }
}

2. Glue REST

What it is: Connects to AWS Glue as an Iceberg catalog using the Iceberg REST protocol. Uses AWS SigV4 for both the Glue catalog API (rest* fields) and S3 data access (storage* fields). catalogType: "iceberg-rest" with "serviceType": "glue"

2.1 Validate

POST /externalTable/catalog/validate

{
  "catalogType": "iceberg-rest",
  "catalogConfig": {
    "restUri": "https://glue.<REGION>.amazonaws.com",
    "serviceType": "glue",
    "warehouse": "<YOUR_AWS_ACCOUNT_ID>",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "glue",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  }
}

2.2 List Namespaces

POST /externalTable/catalog/namespaces

{
  "catalogType": "iceberg-rest",
  "catalogConfig": {
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "glue",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>",
    "restUri": "https://glue.<REGION>.amazonaws.com",
    "serviceType": "glue",
    "warehouse": "<YOUR_AWS_ACCOUNT_ID>"
  }
}

2.3 List Tables

POST /externalTable/catalog/tables/list

{
  "catalogType": "iceberg-rest",
  "catalogConfig": {
    "restUri": "https://glue.<REGION>.amazonaws.com",
    "serviceType": "glue",
    "warehouse": "<YOUR_AWS_ACCOUNT_ID>",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "glue",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  },
  "namespace": "st-database"
}

namespace — the Glue database name to list tables from.

2.4 Create Pinot Table

{
  "catalogType": "iceberg-rest",
  "catalogConfig": {
    "namespace": "st-database",
    "tableName": "glue_iceberg_table_wiki",
    "restUri": "https://glue.<REGION>.amazonaws.com",
    "serviceType": "glue",
    "warehouse": "<YOUR_AWS_ACCOUNT_ID>",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "glue",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  },
  "schemaOptions": {
    "includePartitionColumns": true,
    "timestampColumn": null,
    "timeUnit": "MILLISECONDS",
    "schemaName": "glue_rest_table"
  },
  "tableConfig": {
    "tableName": "glue_rest_table_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "timeColumnName": null,
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "365",
      "replication": "1",
      "segmentPushType": "APPEND"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": true
    },
    "task": {
      "taskTypeConfigsMap": {
        "ExternalTableSyncTask": {
          "schedule": "0 */30 * * * ?",
          "executor": "controller",
          "inputFormat": "parquet",
          "catalogType": "iceberg-rest",
          "catalog.iceberg-rest.serviceType": "glue",
          "catalog.iceberg-rest.table.namespace": "st-database",
          "catalog.iceberg-rest.table.tableName": "glue_iceberg_table_wiki",
          "catalog.iceberg-rest.restUri": "https://glue.<REGION>.amazonaws.com",
          "catalog.iceberg-rest.warehouse": "<YOUR_AWS_ACCOUNT_ID>",
          "catalog.iceberg-rest.auth.rest.authType": "aws-sigv4",
          "catalog.iceberg-rest.auth.rest.accessKeyId": "<YOUR_ACCESS_KEY_ID>",
          "catalog.iceberg-rest.auth.rest.secretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
          "catalog.iceberg-rest.auth.rest.region": "<REGION>",
          "catalog.iceberg-rest.auth.rest.service": "glue",
          "catalog.iceberg-rest.auth.storage.authType": "aws-sigv4",
          "catalog.iceberg-rest.auth.storage.accessKeyId": "<YOUR_ACCESS_KEY_ID>",
          "catalog.iceberg-rest.auth.storage.secretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
          "catalog.iceberg-rest.auth.storage.region": "<REGION>"
        }
      }
    },
    "tierConfigs": [],
    "isDimTable": false
  }
}

Note: The ExternalTableSyncTask config in create uses the hierarchical catalog.iceberg-rest.auth.* key format. Catalog-backed ingestion is selected when the task config includes a top-level catalogType.

3. Glue Native

What it is: Connects to AWS Glue using the native Glue SDK (as opposed to the Iceberg REST protocol). Simpler to configure — no restUri or serviceType needed. The Glue database to use is specified by the database field. catalogType: "glue-catalog"

3.1 Validate

POST /externalTable/catalog/validate

{
  "catalogType": "glue-catalog",
  "catalogConfig": {
    "region": "<REGION>",
    "database": "st-database",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "glue",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  }
}

3.2 List Namespaces

POST /externalTable/catalog/namespaces

{
  "catalogType": "glue-catalog",
  "catalogConfig": {
    "region": "<REGION>",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "glue",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  }
}

3.3 List Tables

POST /externalTable/catalog/tables/list

{
  "catalogType": "glue-catalog",
  "catalogConfig": {
    "region": "<REGION>",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "glue",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  },
  "namespace": "st-database"
}

3.4 Create Pinot Table

{
  "catalogType": "glue-catalog",
  "catalogConfig": {
    "namespace": "st-database",
    "tableName": "glue_iceberg_table_wiki",
    "region": "<REGION>",
    "database": "st-database",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "glue",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  },
  "schemaOptions": {
    "includePartitionColumns": true,
    "timestampColumn": null,
    "timeUnit": "MILLISECONDS",
    "schemaName": "glue_native_table"
  },
  "tableConfig": {
    "tableName": "glue_native_table_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "timeColumnName": null,
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "365",
      "replication": "1",
      "segmentPushType": "APPEND"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": true
    },
    "task": {
      "taskTypeConfigsMap": {
        "ExternalTableSyncTask": {
          "schedule": "0 */30 * * * ?",
          "executor": "controller",
          "inputFormat": "parquet",
          "catalogType": "glue",
          "catalog.glue.table.namespace": "st-database",
          "catalog.glue.table.tableName": "glue_iceberg_table_wiki",
          "catalog.glue.region": "<REGION>",
          "catalog.glue.database": "st-database",
          "catalog.glue.awsAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
          "catalog.glue.awsSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>"
        }
      }
    },
    "tierConfigs": [],
    "isDimTable": false
  }
}

4. Nessie REST

What it is: Connects to a Project Nessie server using the Iceberg REST protocol. Nessie itself requires no authentication in this configuration (restAuthType: "none"); only S3 credentials are needed to read the underlying data files. catalogType: "iceberg-rest-s3" with "serviceType": "nessie"

4.1 Validate

POST /externalTable/catalog/validate

{
  "catalogType": "iceberg-rest-s3",
  "catalogConfig": {
    "restUri": "http://localhost:19120/iceberg",
    "serviceType": "nessie",
    "restAuthType": "none",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  }
}

4.2 List Namespaces

POST /externalTable/catalog/namespaces

{
  "catalogType": "iceberg-rest-s3",
  "catalogConfig": {
    "restUri": "http://localhost:19120/iceberg",
    "serviceType": "nessie",
    "restAuthType": "none",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  }
}

4.3 List Tables

POST /externalTable/catalog/tables/list

{
  "catalogType": "iceberg-rest-s3",
  "catalogConfig": {
    "restUri": "http://localhost:19120/iceberg",
    "serviceType": "nessie",
    "restAuthType": "none",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  },
  "namespace": "default"
}

4.4 Create Pinot Table

{
  "catalogType": "iceberg-rest-s3",
  "catalogConfig": {
    "namespace": "default",
    "tableName": "test_company_data",
    "restUri": "http://localhost:19120/iceberg",
    "serviceType": "nessie",
    "restAuthType": "none",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  },
  "schemaOptions": {
    "includePartitionColumns": true,
    "timestampColumn": null,
    "timeUnit": "MILLISECONDS",
    "schemaName": "nessie_rest_table"
  },
  "tableConfig": {
    "tableName": "nessie_rest_table_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "timeColumnName": null,
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "365",
      "replication": "1",
      "segmentPushType": "APPEND"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": true
    },
    "task": {
      "taskTypeConfigsMap": {
        "ExternalTableSyncTask": {
          "schedule": "0 */30 * * * ?",
          "executor": "controller",
          "inputFormat": "parquet",
          "catalogType": "iceberg-rest",
          "catalog.iceberg-rest.serviceType": "nessie",
          "catalog.iceberg-rest.table.namespace": "default",
          "catalog.iceberg-rest.table.tableName": "test_company_data",
          "catalog.iceberg-rest.restUri": "http://localhost:19120/iceberg",
          "catalog.iceberg-rest.auth.rest.authType": "none",
          "catalog.iceberg-rest.auth.storage.authType": "aws-sigv4",
          "catalog.iceberg-rest.auth.storage.accessKeyId": "<YOUR_ACCESS_KEY_ID>",
          "catalog.iceberg-rest.auth.storage.secretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
          "catalog.iceberg-rest.auth.storage.region": "<REGION>"
        }
      }
    },
    "tierConfigs": [],
    "isDimTable": false
  }
}

5. S3 Tables REST

What it is: Connects to AWS S3 Tables, a managed Iceberg-compatible table storage service. Uses the Iceberg REST protocol. Requires an additional tableBucketArn field that identifies the S3 Tables bucket. catalogType: "iceberg-rest" with "serviceType": "s3Tables"

5.1 Validate

POST /externalTable/catalog/validate

{
  "catalogType": "iceberg-rest",
  "catalogConfig": {
    "restUri": "https://s3tables.<REGION>.amazonaws.com",
    "serviceType": "s3Tables",
    "tableBucketArn": "arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "s3tables",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  }
}

5.2 List Namespaces

POST /externalTable/catalog/namespaces

{
  "catalogType": "iceberg-rest",
  "catalogConfig": {
    "restUri": "https://s3tables.<REGION>.amazonaws.com",
    "serviceType": "s3Tables",
    "tableBucketArn": "arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "s3tables",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  }
}

5.3 List Tables

POST /externalTable/catalog/tables/list

{
  "catalogType": "iceberg-rest",
  "catalogConfig": {
    "restUri": "https://s3tables.<REGION>.amazonaws.com",
    "serviceType": "s3Tables",
    "tableBucketArn": "arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "s3tables",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  },
  "namespace": "s3_namespace_wiki"
}

5.4 Create Pinot Table

{
  "catalogType": "iceberg-rest",
  "catalogConfig": {
    "namespace": "s3_namespace_wiki",
    "tableName": "s3_table_wiki",
    "restUri": "https://s3tables.<REGION>.amazonaws.com",
    "serviceType": "s3Tables",
    "tableBucketArn": "arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>",
    "restAuthType": "aws-sigv4",
    "restAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "restSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "restRegion": "<REGION>",
    "restService": "s3tables",
    "storageAuthType": "aws-sigv4",
    "storageAccessKeyId": "<YOUR_ACCESS_KEY_ID>",
    "storageSecretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
    "storageRegion": "<REGION>"
  },
  "schemaOptions": {
    "includePartitionColumns": true,
    "timestampColumn": null,
    "timeUnit": "MILLISECONDS",
    "schemaName": "s3tables_rest_table"
  },
  "tableConfig": {
    "tableName": "s3tables_rest_table_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "timeColumnName": null,
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "365",
      "replication": "1",
      "segmentPushType": "APPEND"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "rangeIndexVersion": 2,
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": true
    },
    "task": {
      "taskTypeConfigsMap": {
        "ExternalTableSyncTask": {
          "schedule": "0 */30 * * * ?",
          "executor": "controller",
          "inputFormat": "parquet",
          "catalogType": "iceberg-rest",
          "catalog.iceberg-rest.serviceType": "s3Tables",
          "catalog.iceberg-rest.table.namespace": "s3_namespace_wiki",
          "catalog.iceberg-rest.table.tableName": "s3_table_wiki",
          "catalog.iceberg-rest.restUri": "https://s3tables.<REGION>.amazonaws.com",
          "catalog.iceberg-rest.tableBucketArn": "arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>",
          "catalog.iceberg-rest.auth.rest.authType": "aws-sigv4",
          "catalog.iceberg-rest.auth.rest.accessKeyId": "<YOUR_ACCESS_KEY_ID>",
          "catalog.iceberg-rest.auth.rest.secretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
          "catalog.iceberg-rest.auth.rest.region": "<REGION>",
          "catalog.iceberg-rest.auth.rest.service": "s3tables",
          "catalog.iceberg-rest.auth.storage.authType": "aws-sigv4",
          "catalog.iceberg-rest.auth.storage.accessKeyId": "<YOUR_ACCESS_KEY_ID>",
          "catalog.iceberg-rest.auth.storage.secretAccessKey": "<YOUR_SECRET_ACCESS_KEY>",
          "catalog.iceberg-rest.auth.storage.region": "<REGION>"
        }
      }
    },
    "tierConfigs": [],
    "isDimTable": false
  }
}

Request Body Field Reference

`catalogConfig` — Common Fields

Field	Applies to	Description
`namespace`	All (steps 2–3)	Namespace (Glue database, Nessie namespace, etc.) containing the target table. Required for List Tables and Create.
`tableName`	Create	Name of the Iceberg table. Required for Create.
`restUri`	iceberg-rest, iceberg-rest-s3	URL of the Iceberg REST catalog endpoint.
`serviceType`	iceberg-rest, iceberg-rest-s3	Identifies the backing service: `"glue"`, `"nessie"`, or `"s3Tables"`.
`warehouse`	iceberg-rest (glue)	AWS account ID — used as the Glue warehouse identifier.
`tableBucketArn`	iceberg-rest (s3Tables)	Full ARN of the S3 Tables bucket.
`restAuthType`	iceberg-rest, iceberg-rest-s3, glue-catalog	Auth type for the catalog REST API: `"aws-sigv4"` or `"none"`.
`restAccessKeyId` / `restSecretAccessKey`	iceberg-rest, glue-catalog	AWS credentials for catalog API access.
`restRegion`	iceberg-rest, glue-catalog	AWS region for the catalog endpoint.
`restService`	iceberg-rest, glue-catalog	AWS service name for SigV4 signing: `"glue"` or `"s3tables"`.
`storageAuthType`	iceberg-rest, iceberg-rest-s3, glue-catalog	Auth type for S3 data access: always `"aws-sigv4"`.
`storageAccessKeyId` / `storageSecretAccessKey`	iceberg-rest, iceberg-rest-s3, glue-catalog	AWS credentials for reading Parquet data from S3.
`storageRegion`	iceberg-rest, iceberg-rest-s3, glue-catalog	AWS region for S3 data access.
`region` / `database`	glue-catalog	Region and default database for native Glue SDK access.
`bucketName` / `prefix`	s3-catalog	S3 bucket name and key prefix pointing to the Parquet files. The prefix is passed to the S3 `ListObjectsV2` API and will match all objects whose keys start with this string.
`accessKey` / `secretKey`	s3-catalog	AWS credentials (note: different field names from other providers).

`tableConfig` — Key Fields

Field	Description
`tableName`	Pinot table name. Convention: `<schemaName>_OFFLINE`.
`tableType`	Always `"OFFLINE"` for Iceberg ingestion.
`segmentsConfig.timeColumnName`	Time column for Pinot segments. Can be `null` if no time dimension.
`segmentsConfig.retentionTimeValue` / `retentionTimeUnit`	How long Pinot retains segments.
`segmentsConfig.segmentPushType`	Always `"APPEND"` — new Iceberg snapshots are appended as new segments.
`tableIndexConfig.nullHandlingEnabled`	Set to `true` to handle nullable columns from Iceberg schemas.
`task.taskTypeConfigsMap.ExternalTableSyncTask.schedule`	Cron expression for ingestion frequency. `"0 /30 * * ?"` runs every 30 minutes.
`task.taskTypeConfigsMap.ExternalTableSyncTask.inputFormat`	Always `"parquet"` — catalog data files use Parquet.

Frequently Asked Questions

Why isn’t my table ingesting data after creation? The ExternalTableSyncTask runs on a cron schedule (default every 30 minutes) and does not fire automatically on table creation. You must manually trigger the first run:

curl -X POST \
  "http://localhost:9000/tasks/schedule" \
  -H "Content-Type: application/json" \
  -d '{"taskType" : "ExternalTableSyncTask", "tableName":"<TABLE_NAME>"}'

See Observability → Trigger Ingestion Task for details.

Which catalog type should I use?

My setup	Use
Raw Parquet files directly on S3 (no catalog service)	`s3-catalog` (S3 Data Lake)
AWS Glue via the Iceberg REST protocol	`iceberg-rest` with `serviceType: "glue"` (Glue REST)
AWS Glue via the native Glue SDK	`glue-catalog` (Glue Native)
Project Nessie server	`iceberg-rest-s3` with `serviceType: "nessie"` (Nessie REST)
AWS S3 Tables	`iceberg-rest` with `serviceType: "s3Tables"` (S3 Tables REST)

What credentials do I need? For all AWS-backed catalog types (iceberg-rest, glue-catalog, iceberg-rest-s3), you need two sets of AWS credentials:

Catalog credentials (restAccessKeyId / restSecretAccessKey) — to authenticate against the catalog API (Glue, S3 Tables).
Storage credentials (storageAccessKeyId / storageSecretAccessKey) — to read Parquet data files from S3.

For the S3 Data Lake provider (s3-catalog), only one set is needed, using the field names accessKey / secretKey.

What file format does Iceberg ingestion support? Only Parquet. Set "inputFormat": "parquet" in the ExternalTableSyncTask config. Iceberg-managed catalog tables use Parquet data files by default.

Can I set a custom ingestion schedule? Yes. The schedule field in ExternalTableSyncTask accepts a standard cron expression. The default "0 */5 * * * ?" runs every 5 minutes. To run every hour, use "0 0 * * * ?". The schedule applies to all subsequent automatic runs; the first run must always be triggered manually.

How do I pause ingestion without deleting the table? Set "enabled": "false" in the ExternalTableSyncTask config and update the table. This stops the scheduler from creating new ingestion runs while preserving all existing segments and the last checkpoint. Re-enable by setting "enabled": "true". See Pausing Ingestion for the full example.

What happens if timeColumnName is null? Pinot creates the table without a time dimension. Segments are still ingested and queryable, but time-based retention and time-partition pruning are disabled. Set timeColumnName to a timestamp column in your Iceberg schema if you need those features.

My Validate call succeeds but List Namespaces returns nothing — why? The validate endpoint only confirms connectivity and credential validity. An empty namespace list typically means the credentials have access to the catalog service but the account or region contains no databases, or the warehouse / database field points to the wrong scope. Double-check that the region, warehouse, and database fields match your actual Glue or S3 Tables configuration.

Can I use IAM roles instead of access keys? Cross account IAM role based access is not supported yet.

How do I change the ingestion schedule after table creation? Update the ExternalTableSyncTask.schedule field in the table config.

Where can I monitor ingestion health? See the Observability page for the Watcher Status, Checkpoint, and File Count APIs.

​How It Works

​API Endpoints Quick Reference

​Recommended Workflow

​Pausing Ingestion

​Catalog Providers

​1. S3 Data Lake

​1.1 Validate

​1.2 Create Pinot Table

​2. Glue REST

​2.1 Validate

​2.2 List Namespaces

​2.3 List Tables

​2.4 Create Pinot Table

​3. Glue Native

​3.1 Validate

​3.2 List Namespaces

​3.3 List Tables

​3.4 Create Pinot Table

​4. Nessie REST

​4.1 Validate

​4.2 List Namespaces

​4.3 List Tables

​4.4 Create Pinot Table

​5. S3 Tables REST

​5.1 Validate

​5.2 List Namespaces

​5.3 List Tables

​5.4 Create Pinot Table

​Request Body Field Reference

​catalogConfig — Common Fields

​tableConfig — Key Fields

​Frequently Asked Questions

How It Works

API Endpoints Quick Reference

Recommended Workflow

Pausing Ingestion

Catalog Providers

1. S3 Data Lake

1.1 Validate

1.2 Create Pinot Table

2. Glue REST

2.1 Validate

2.2 List Namespaces

2.3 List Tables

2.4 Create Pinot Table

3. Glue Native

3.1 Validate

3.2 List Namespaces

3.3 List Tables

3.4 Create Pinot Table

4. Nessie REST

4.1 Validate

4.2 List Namespaces

4.3 List Tables

4.4 Create Pinot Table

5. S3 Tables REST

5.1 Validate

5.2 List Namespaces

5.3 List Tables

5.4 Create Pinot Table

Request Body Field Reference

`catalogConfig` — Common Fields

`tableConfig` — Key Fields

Frequently Asked Questions