This feature is available starting in StarTree release 0.14.0. It must be enabled on demand — contact your StarTree representative to have it activated for your environment.
This guide walks through connecting an external data source to StarTree using the Data Portal UI. No API calls or JSON configuration are required: Data Portal handles catalog connection, table selection, and ingestion setup through a point-and-click interface.
Looking for the API-based approach? See the API Onboarding Guide.
Supported Sources
Data Portal currently supports the following external catalog types in Beta:
| Source | Description |
|---|---|
| S3 Data Lake | Raw Parquet files stored directly in an S3 bucket — no catalog service required. |
| AWS Glue (Iceberg REST) | Iceberg tables managed by AWS Glue, accessed via the Iceberg REST protocol with SigV4 authentication. |
| AWS S3 Tables (Iceberg REST) | Iceberg-compatible tables in AWS S3 Tables buckets, accessed via the Iceberg REST protocol. |
Prerequisites
Before starting, ensure you have:
- StarTree 0.14.0 or later with the external table Beta feature enabled for your environment.
- AWS credentials (access key + secret key) with read permissions on both the catalog service and the underlying S3 data.
- For S3 Tables: the full ARN of your S3 Tables bucket.
- For Glue: your AWS account ID (used as the Glue warehouse identifier) and the target Glue database name.
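The permissions in the prerequisites above can be granted with a standard IAM policy. The sketch below builds one as a Python dict, using the actions named later in this guide's FAQ (glue:GetTable, glue:GetDatabase, s3:GetObject, plus s3:ListBucket for browsing); the wildcard resources are placeholders you should scope to your own account, database, and bucket:

```python
import json

# Illustrative read-only policy for catalog plus data access.
# Resource ARNs are placeholders -- scope them before use.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CatalogRead",
            "Effect": "Allow",
            "Action": ["glue:GetTable", "glue:GetDatabase"],
            "Resource": "*",
        },
        {
            "Sid": "DataRead",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": "*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```

For S3 Tables, substitute the equivalent s3tables:* read actions in the catalog statement.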
Step 1: Open the External Table Wizard
- Log in to Data Portal.
- In the left navigation, go to Tables.
- Click + Connect External Table.
The wizard opens with a connection configuration screen.
Step 2: Select a Catalog Provider
Choose the catalog type that matches your data source:
- S3 Data Lake — for raw Parquet files on S3.
- Iceberg REST — for Iceberg tables managed by AWS Glue and S3 Tables.
Fill in the credentials and connection details for your chosen catalog type. The fields for each of the three variants (S3 Data Lake, Glue REST, S3 Tables REST) are listed below.

S3 Data Lake
| Field | Description |
|---|---|
| S3 Bucket | Name of the S3 bucket containing your Parquet files. |
| Prefix | Key prefix (folder path) pointing to the Parquet data, e.g. path/to/parquet/data/. |
| Region | AWS region where the bucket is located. |
| Access Key | AWS access key ID. |
| Secret Key | AWS secret access key. |
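The S3 Bucket and Prefix fields together identify the data location. A minimal sketch of how the two combine (the helper and example values are illustrative; the prefix is a key path relative to the bucket root, as in the example above):

```python
def s3_location(bucket: str, prefix: str) -> str:
    """Combine the S3 Bucket and Prefix fields into the full data
    location. Strips any accidental leading slash from the prefix."""
    return f"s3://{bucket}/{prefix.lstrip('/')}"

print(s3_location("my-data-bucket", "path/to/parquet/data/"))
# s3://my-data-bucket/path/to/parquet/data/
```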
Glue REST

Details

| Field | Description |
|---|---|
| Catalog Connection Name | A unique name to identify this connection in Data Portal. |
Metastore — credentials for authenticating with the AWS Glue catalog API.

| Field | Description |
|---|---|
| REST Service | Set to Glue. |
| Warehouse | Your AWS account ID — used as the Glue warehouse identifier. |
| Access Key | AWS access key ID for Glue catalog API access. |
| Secret Key | AWS secret access key for Glue catalog API access. |
| Region | AWS region where your Glue catalog is located, e.g. us-east-1. |
Storage — credentials for reading the underlying Parquet data files from S3.

| Field | Description |
|---|---|
| Access Key | AWS access key ID for S3 data access. Can be the same as the Metastore key if the same principal has both permissions. |
| Secret Key | AWS secret access key for S3 data access. |
| Region | AWS region where the S3 data files are stored. |
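A common validation failure for Glue REST is using an account alias instead of the numeric account ID in the Warehouse field (see the FAQ below). A quick local sanity check, assuming the standard 12-digit AWS account ID format:

```python
import re

def looks_like_account_id(warehouse: str) -> bool:
    """Glue REST expects the numeric 12-digit AWS account ID as the
    warehouse identifier, not an account alias."""
    return re.fullmatch(r"\d{12}", warehouse) is not None

print(looks_like_account_id("123456789012"))  # True
print(looks_like_account_id("my-company"))    # False
```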
S3 Tables REST

Details

| Field | Description |
|---|---|
| Catalog Connection Name | A unique name to identify this connection in Data Portal. |
Metastore — credentials for authenticating with the S3 Tables REST catalog API.

| Field | Description |
|---|---|
| REST Service | Set to S3Tables. |
| Table Bucket ARN | Full ARN of the S3 Tables bucket, e.g. arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>. |
| Access Key | AWS access key ID for S3 Tables catalog API access. |
| Secret Key | AWS secret access key for S3 Tables catalog API access. |
| Region | AWS region where the S3 Tables bucket is located. |
Storage — credentials for reading the underlying Parquet data files.

| Field | Description |
|---|---|
| Access Key | AWS access key ID for S3 data access. Can be the same as the Metastore key if the same principal has both permissions. |
| Secret Key | AWS secret access key for S3 data access. |
| Region | AWS region where the S3 data files are stored. |
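Malformed Table Bucket ARNs are easy to catch before clicking Validate. This sketch parses the ARN format shown above (arn:aws:s3tables:&lt;region&gt;:&lt;account-id&gt;:bucket/&lt;bucket-name&gt;); it is an illustrative pre-flight check, not something Data Portal requires:

```python
import re

# Mirrors arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>
ARN_PATTERN = re.compile(
    r"^arn:aws:s3tables:(?P<region>[a-z0-9-]+)"
    r":(?P<account>\d{12}):bucket/(?P<bucket>[a-z0-9.-]+)$"
)

def parse_table_bucket_arn(arn: str) -> dict:
    """Split a Table Bucket ARN into its components, raising on
    anything that does not match the documented shape."""
    m = ARN_PATTERN.match(arn)
    if m is None:
        raise ValueError(f"not a valid S3 Tables bucket ARN: {arn!r}")
    return m.groupdict()

parts = parse_table_bucket_arn(
    "arn:aws:s3tables:us-east-1:123456789012:bucket/analytics-tables"
)
print(parts)
```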
Step 3: Validate the Connection
Click Validate Connection. Data Portal calls the catalog’s validate endpoint and confirms credentials and connectivity before proceeding.
Step 4: Browse and Select a Table
Once the connection is validated:
- Data Portal lists the available namespaces (Glue databases, S3 Tables namespaces, or S3 prefixes).
- Select a namespace to expand its tables.
- Click the table you want to onboard.
Data Portal reads the Iceberg schema and derives a Pinot schema automatically.
Step 5: Review the Schema
The auto-generated Pinot schema is displayed for review. You can:
- Set a time column — select the column to use as the Pinot time dimension (optional; leave blank for no time partitioning).
- Include or exclude partition columns — toggle whether Iceberg partition columns are added as Pinot dimension columns.
- Rename the schema — provide a custom schema name, or accept the default derived from the table name.
Click Next when the schema looks correct.
Step 6: Configure the Table
Review and adjust the table configuration:
| Setting | Default | Notes |
|---|---|---|
| Ingestion schedule | Every 5 minutes | Cron expression controlling how often new Iceberg snapshots are ingested. |
| Null handling | Enabled | Required for Iceberg schemas that include nullable columns. |
| Segment push type | Append | Each new Iceberg snapshot is ingested as a new set of Pinot segments. |
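The default schedule fires every 5 minutes. As a model of how that cadence behaves (the actual scheduling happens server-side from the cron expression you configure; this helper is only illustrative), the next runs land on the next 5-minute boundaries:

```python
from datetime import datetime, timedelta

def next_runs(now: datetime, every_min: int = 5, count: int = 3):
    """Model of an 'every N minutes' cadence: upcoming run times
    aligned to N-minute boundaries after `now`. Illustrative only."""
    base = now.replace(second=0, microsecond=0)
    # Round up to the next N-minute boundary (never the current minute).
    offset = (every_min - base.minute % every_min) % every_min or every_min
    first = base + timedelta(minutes=offset)
    return [first + timedelta(minutes=every_min * i) for i in range(count)]

runs = next_runs(datetime(2024, 1, 1, 12, 2), every_min=5, count=3)
print([r.strftime("%H:%M") for r in runs])  # ['12:05', '12:10', '12:15']
```

Each run checks for new Iceberg snapshots since the last checkpoint; if nothing changed, the run is a no-op.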
Click Create Table to register the schema and table with Pinot. Data Portal automatically triggers the first ingestion run immediately after creation — no manual step required.
Step 7: Monitor Ingestion
Once the table is created, ingestion starts automatically. The table detail view shows the status in real time:
- Running — the task is actively reading Iceberg snapshots and building Pinot segments.
- Completed — ingestion finished successfully. The last ingested snapshot ID is shown.
- Failed — ingestion encountered an error. The error message and the number of files discovered vs. segments generated are surfaced to help diagnose the issue.
For deeper observability — watcher status, checkpoint values, and per-snapshot file counts — see the Observability page.
Pausing Ingestion
To pause scheduled ingestion from Data Portal:
- Open the table in the Tables view.
- Click Pause Ingestion.
This sets "enabled": "false" on the IcebergIngestionTask. Any run currently in progress completes normally. Existing segments and the last checkpoint are preserved — when you re-enable, ingestion resumes from where it left off.
Frequently Asked Questions
The Validate step fails — what should I check?
- Confirm the access key has glue:GetTable, glue:GetDatabase, and s3:GetObject permissions (or equivalent for S3 Tables).
- Verify the region matches where your Glue database or S3 bucket lives.
- For Glue REST, ensure the warehouse value is your numeric AWS account ID, not an account alias.
Can I onboard multiple tables from the same catalog?
Yes. After creating the first table, start the wizard again and reuse the same connection credentials. Each table is registered as an independent Pinot table with its own ingestion schedule.
The table was created but ingestion hasn’t started — what should I check?
Data Portal triggers the first ingestion run automatically after table creation. If ingestion hasn’t started, check the table’s detail page for an error status and review the error message. You can also trigger a run manually via the trigger API.