# Iceberg: Onboarding via Data Portal

<Warning>
  This feature is available starting in **StarTree release 0.14.0**. It must be enabled on demand — contact your StarTree representative to have it activated for your environment.
</Warning>

This guide walks through connecting an Iceberg catalog to StarTree using the **Data Portal UI**. No API calls or JSON configuration are required: Data Portal handles catalog connection, table selection, and onboarding setup in a point-and-click interface.

> Looking for the API-based approach? See [Iceberg: Onboarding via API](./onboarding-api).

***

## Supported Sources

Data Portal supports the following Iceberg catalog types:

| Source                           | Description                                                                                           |
| -------------------------------- | ----------------------------------------------------------------------------------------------------- |
| **AWS Glue (Iceberg REST)**      | Iceberg tables managed by AWS Glue, accessed via the Iceberg REST protocol with SigV4 authentication. |
| **AWS S3 Tables (Iceberg REST)** | Iceberg-compatible tables in AWS S3 Tables buckets, accessed via the Iceberg REST protocol.           |

***

## Prerequisites

Before starting, ensure you have:

* **StarTree 0.14.0 or later** with the external table Beta feature enabled for your environment.
* AWS credentials (access key + secret key) with read permissions on both the catalog service and the underlying S3 data.
* For S3 Tables: the full ARN of your S3 Tables bucket.
* For Glue: your AWS account ID (used as the Glue warehouse identifier) and the target Glue database name.

***

## Step 1: Open the External Table Wizard

1. Log in to **Data Portal**.
2. In the left navigation, go to **Tables**.
3. Click **+ Connect External Table**.

The wizard opens with a connection configuration screen.

***

## Step 2: Select a Catalog Provider

Choose **Iceberg REST** as the catalog type, then select the specific service:

* **Glue REST** — for Iceberg tables managed by AWS Glue.
* **S3 Tables REST** — for Iceberg tables in AWS S3 Tables buckets.

Fill in the credentials and connection details for your chosen provider.

<Tabs>
  <Tab title="Glue REST">
    **Details**

    | Field                       | Description                                               |
    | --------------------------- | --------------------------------------------------------- |
    | **Catalog Connection Name** | A unique name to identify this connection in Data Portal. |

    **Metastore** — credentials for authenticating with the AWS Glue catalog API.

    | Field            | Description                                                      |
    | ---------------- | ---------------------------------------------------------------- |
    | **REST Service** | Set to `Glue`.                                                   |
    | **Warehouse**    | Your AWS account ID — used as the Glue warehouse identifier.     |
    | **Access Key**   | AWS access key ID for Glue catalog API access.                   |
    | **Secret Key**   | AWS secret access key for Glue catalog API access.               |
    | **Region**       | AWS region where your Glue catalog is located, e.g. `us-east-1`. |

    **Storage** — credentials for reading the underlying Parquet data files from S3.

    | Field          | Description                                                                                                            |
    | -------------- | ---------------------------------------------------------------------------------------------------------------------- |
    | **Access Key** | AWS access key ID for S3 data access. Can be the same as the Metastore key if the same principal has both permissions. |
    | **Secret Key** | AWS secret access key for S3 data access.                                                                              |
    | **Region**     | AWS region where the S3 data files are stored.                                                                         |
  </Tab>

  <Tab title="S3 Tables REST">
    **Details**

    | Field                       | Description                                               |
    | --------------------------- | --------------------------------------------------------- |
    | **Catalog Connection Name** | A unique name to identify this connection in Data Portal. |

    **Metastore** — credentials for authenticating with the S3 Tables REST catalog API.

    | Field                | Description                                                                                           |
    | -------------------- | ----------------------------------------------------------------------------------------------------- |
    | **REST Service**     | Set to `S3Tables`.                                                                                    |
    | **Table Bucket ARN** | Full ARN of the S3 Tables bucket, e.g. `arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>`. |
    | **Access Key**       | AWS access key ID for S3 Tables catalog API access.                                                   |
    | **Secret Key**       | AWS secret access key for S3 Tables catalog API access.                                               |
    | **Region**           | AWS region where the S3 Tables bucket is located.                                                     |

    **Storage** — credentials for reading the underlying Parquet data files.

    | Field          | Description                                                                                                            |
    | -------------- | ---------------------------------------------------------------------------------------------------------------------- |
    | **Access Key** | AWS access key ID for S3 data access. Can be the same as the Metastore key if the same principal has both permissions. |
    | **Secret Key** | AWS secret access key for S3 data access.                                                                              |
    | **Region**     | AWS region where the S3 data files are stored.                                                                         |
  </Tab>
</Tabs>

***

## Step 3: Validate the Connection

Click **Validate Connection**. Data Portal calls the catalog's validate endpoint and confirms credentials and connectivity before proceeding.
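
Some connection values can be sanity-checked locally before validating. The following is a minimal Python sketch, not part of Data Portal: it checks that a Glue **Warehouse** value looks like a 12-digit AWS account ID and that a **Table Bucket ARN** follows the shape shown in the field description. The function names, the character classes for region and bucket name, and the restriction to the standard `aws` partition are assumptions made for illustration.

```python
import re
from typing import Optional

# AWS account IDs are 12-digit numbers; an account alias will not match.
ACCOUNT_ID_RE = re.compile(r"[0-9]{12}")

# Follows the ARN shape from the Table Bucket ARN field description.
# The region and bucket-name character classes are assumptions.
TABLE_BUCKET_ARN_RE = re.compile(
    r"arn:aws:s3tables:(?P<region>[a-z0-9-]+):"
    r"(?P<account_id>[0-9]{12}):bucket/(?P<bucket>[a-z0-9][a-z0-9-]*)"
)


def is_valid_glue_warehouse(value: str) -> bool:
    """Return True if the value looks like a numeric AWS account ID."""
    return ACCOUNT_ID_RE.fullmatch(value) is not None


def parse_table_bucket_arn(arn: str) -> Optional[dict]:
    """Split a table bucket ARN into its parts, or return None if malformed."""
    match = TABLE_BUCKET_ARN_RE.fullmatch(arn)
    return match.groupdict() if match else None
```

For example, `is_valid_glue_warehouse("my-account-alias")` returns `False`, and `parse_table_bucket_arn` returns `None` for an ordinary S3 bucket ARN such as `arn:aws:s3:::analytics`.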

***

## Step 4: Browse and Select a Table

Once the connection is validated:

1. Data Portal lists the available **namespaces** (Glue databases or S3 Tables namespaces).
2. Select a namespace to expand its tables.
3. Click the **table** you want to onboard.

Data Portal reads the Iceberg schema and derives a Pinot schema automatically.

***

## Step 5: Review the Schema

The auto-generated Pinot schema is displayed for review. You can:

* **Set a time column** — select the column to use as the Pinot time dimension (optional; leave blank for no time partitioning).
* **Include or exclude partition columns** — toggle whether Iceberg partition columns are added as Pinot dimension columns.
* **Rename the schema** — provide a custom schema name, or accept the default derived from the table name.

Click **Next** when the schema looks correct.
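
To make the review options concrete, here is a rough Python sketch of how a time column choice and the partition-column toggle could shape the resulting schema. It is not Data Portal's implementation. The output uses the standard Pinot schema layout (`schemaName`, `dimensionFieldSpecs`, `dateTimeFieldSpecs`); the function name, its parameters, the epoch-milliseconds time format, and the choice to treat every non-time column as a dimension are assumptions.

```python
def build_pinot_schema(schema_name, columns, time_column=None,
                       partition_columns=(), include_partition_columns=True):
    """Build a Pinot schema dict from (column_name, pinot_data_type) pairs.

    Columns are assumed to be already mapped to Pinot data types; the
    Iceberg-to-Pinot type mapping itself is not shown here.
    """
    schema = {
        "schemaName": schema_name,
        "dimensionFieldSpecs": [],
        "dateTimeFieldSpecs": [],
    }
    for name, data_type in columns:
        if name == time_column:
            # Assumption: the time column holds epoch milliseconds.
            schema["dateTimeFieldSpecs"].append({
                "name": name,
                "dataType": data_type,
                "format": "1:MILLISECONDS:EPOCH",
                "granularity": "1:MILLISECONDS",
            })
        elif name in partition_columns and not include_partition_columns:
            continue  # partition column toggled off in the review step
        else:
            schema["dimensionFieldSpecs"].append(
                {"name": name, "dataType": data_type}
            )
    return schema
```

Calling it with `time_column=None` leaves `dateTimeFieldSpecs` empty, which corresponds to leaving the time column blank in the wizard.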

***

## Step 6: Configure the Table

Review and adjust the table configuration:

| Setting                 | Default         | Notes                                                                      |
| ----------------------- | --------------- | -------------------------------------------------------------------------- |
| **Onboarding schedule** | Every 5 minutes | Cron expression controlling how often new Iceberg snapshots are onboarded. |
| **Null handling**       | Enabled         | Required for Iceberg schemas that include nullable columns.                |
| **Segment push type**   | Append          | Each new Iceberg snapshot is ingested as a new set of Pinot segments.      |

Click **Create Table** to register the schema and table with Pinot. Data Portal automatically triggers the first onboarding run immediately after creation — no manual step required.
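
As an illustration of the default schedule, the Python sketch below computes when the next run would fire for a clock-aligned "every 5 minutes" schedule. It assumes the schedule behaves like a Quartz-style expression such as `0 */5 * * * ?`, which fires at second 0 of minutes 0, 5, 10, and so on; the exact cron syntax Data Portal accepts is not stated here, so treat both the expression and the helper as assumptions.

```python
from datetime import datetime, timedelta


def next_run(now: datetime, interval_minutes: int = 5) -> datetime:
    """Return the next clock-aligned run time strictly after `now`.

    Only meaningful for intervals that divide 60 evenly (5, 10, 15, ...).
    """
    # Drop seconds, then step back to the most recent interval boundary.
    floored = now.replace(second=0, microsecond=0)
    floored -= timedelta(minutes=floored.minute % interval_minutes)
    return floored + timedelta(minutes=interval_minutes)
```

Under these assumptions, a snapshot committed to the Iceberg table at 12:03:20 would be picked up by the run that starts at 12:05:00.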

***

## Step 7: Monitor Onboarding

Once the table is created, onboarding starts automatically. The table detail view shows the status in real time:

* **Running** — the task is actively reading Iceberg snapshots and building Pinot segments.
* **Completed** — onboarding finished successfully. The last ingested snapshot ID is shown.
* **Failed** — onboarding encountered an error. The error message and the number of files discovered vs. segments generated are surfaced to help diagnose the issue.

For deeper observability — watcher status, checkpoint values, and per-snapshot file counts — see the [Observability](../observability) page.
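
If you want to wait on these states from a script rather than watch the UI, a polling loop along the following lines is one option. This is a sketch only: `fetch_status` is a hypothetical callable that you would implement against whatever status source you have access to, and the helper name, poll interval, and poll limit are arbitrary choices. Only the three state names come from the list above.

```python
import time
from typing import Callable

# Terminal states, as shown in the table detail view.
TERMINAL_STATES = {"Completed", "Failed"}


def wait_for_onboarding(fetch_status: Callable[[], str],
                        poll_seconds: float = 30.0,
                        max_polls: int = 20) -> str:
    """Poll until the run reaches a terminal state or polls are exhausted.

    `fetch_status` is a hypothetical caller-supplied function returning
    "Running", "Completed", or "Failed".
    """
    status = fetch_status()
    for _ in range(max_polls - 1):
        if status in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
        status = fetch_status()
    return status
```

The function returns the last observed status, so a return value of `"Running"` means the poll limit was reached before the run finished.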

***

## Pausing Onboarding

To pause scheduled onboarding from Data Portal:

1. Open the table in the **Tables** view.
2. Click **Pause Sync**.

Pausing sets `"enabled": "false"` on the `ExternalTableSyncTask`. Any run currently in progress completes normally. Existing segments and the last checkpoint are preserved, so when you re-enable the task, onboarding resumes from where it left off.
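
For readers who want to see what this flag change amounts to, the Python sketch below toggles it on a table config held as a dictionary. It assumes the task settings live under `task.taskTypeConfigsMap.ExternalTableSyncTask`, which is the usual location for task configs in a Pinot table config but is not confirmed here for this task type. The helper name is hypothetical, and the sketch does not call any StarTree API.

```python
import copy

TASK_TYPE = "ExternalTableSyncTask"


def set_sync_enabled(table_config: dict, enabled: bool) -> dict:
    """Return a copy of the table config with the sync task toggled.

    Assumes the task config sits under task.taskTypeConfigsMap; other
    settings, such as the schedule, are left untouched.
    """
    updated = copy.deepcopy(table_config)
    task_configs = updated.setdefault("task", {}).setdefault(
        "taskTypeConfigsMap", {}
    )
    # Task config values are strings, matching the "false" shown above.
    task_configs.setdefault(TASK_TYPE, {})["enabled"] = (
        "true" if enabled else "false"
    )
    return updated
```

Because the helper works on a deep copy, the original config is unchanged and can be compared against the result before anything is applied.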

***

## Frequently Asked Questions

**The Validate step fails — what should I check?**

* Confirm the access key has `glue:GetTable`, `glue:GetDatabase`, and `s3:GetObject` permissions (or equivalent for S3 Tables).
* Verify the region matches where your Glue database or S3 bucket lives.
* For Glue REST, ensure the warehouse value is your numeric AWS account ID, not an account alias.
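
As a starting point for the permissions check, the Python sketch below builds a read-only IAM policy document containing the three actions named above. The policy structure is standard IAM JSON, but the function name, the statement IDs, the wildcard Glue resource, and the single-bucket S3 resource are assumptions. Your environment may need additional actions or tighter resource scoping.

```python
import json


def build_read_policy(bucket_name: str) -> dict:
    """Build a read-only IAM policy covering the permissions named above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueCatalogRead",
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable"],
                # Assumption: wildcard for brevity; scope this down in practice.
                "Resource": "*",
            },
            {
                "Sid": "IcebergDataRead",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-iceberg-data" is a placeholder bucket name.
    print(json.dumps(build_read_policy("my-iceberg-data"), indent=2))
```

Comparing this output with the policy actually attached to the access key's principal is a quick way to spot a missing action.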

***

**Can I onboard multiple tables from the same catalog?**

Yes. After creating the first table, start the wizard again and reuse the same connection credentials. Each table is registered as an independent Pinot table with its own onboarding schedule.

***

**The table was created but onboarding hasn't started — what should I check?**

Data Portal triggers the first onboarding run automatically after table creation. If onboarding hasn't started, check the table's detail page for an error status and review the error message. You can also trigger a run manually via the [trigger API](../observability#1-trigger-ingestion-task).
