# S3: Onboarding via Data Portal

<Warning>
  This feature is available starting in **StarTree release 0.14.0**. It must be enabled on demand — contact your StarTree representative to have it activated for your environment.
</Warning>

This guide walks through connecting an S3 Data Lake to StarTree using the **Data Portal UI**. No API calls or JSON configuration are required: Data Portal guides you through connection setup, table selection, and onboarding configuration in a point-and-click interface.

> Looking for the API-based approach? See [S3: Onboarding via API](./onboarding-api).

***

## Overview

**S3 Data Lake** connects to raw Parquet files stored directly in an S3 bucket — no catalog service required. StarTree scans the specified S3 prefix, discovers Parquet files, and makes them queryable through Pinot.

***

## Prerequisites

Before starting, ensure you have:

* **StarTree 0.14.0 or later** with the external table Beta feature enabled for your environment.
* AWS credentials (access key + secret key) with `s3:GetObject` and `s3:ListBucket` permissions on the source bucket and prefix.
* The bucket name, key prefix, and AWS region for your Parquet data.
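As a rough illustration of the permissions listed above, the following sketch builds a minimal read-only IAM policy document. The function name `build_read_policy` and the bucket and prefix values are placeholders rather than anything StarTree defines, and the `s3:prefix` condition is an optional tightening that you may need to drop if your environment lists objects differently.

```python
import json


def build_read_policy(bucket, prefix):
    """Build a minimal read-only IAM policy for one bucket and key prefix.

    Hypothetical helper for illustration; adapt to your own IAM conventions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                # Optional: only allow listing keys under the prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # s3:GetObject is an object-level action, so it targets object ARNs.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket and prefix; substitute your own values.
    print(json.dumps(build_read_policy("my-bucket", "path/to/parquet/data/"), indent=2))
```

Attach the resulting policy to the IAM user or role whose access key you plan to enter in the wizard.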

***

## Step 1: Open the External Table Wizard

1. Log in to **Data Portal**.
2. In the left navigation, go to **Tables**.
3. Click **+ Connect External Table**.

The wizard opens with a connection configuration screen.

***

## Step 2: Select S3 Data Lake

Choose **S3 Data Lake** as the catalog type.

Fill in the connection details:

| Field          | Description                                                                                                                                                                                           |
| -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **S3 Bucket**  | Name of the S3 bucket containing your Parquet files.                                                                                                                                                  |
| **Prefix**     | Key prefix (folder path) pointing to the Parquet data, e.g. `path/to/parquet/data/`. The prefix is passed to the S3 `ListObjectsV2` API and will match all objects whose keys start with this string. |
| **Region**     | AWS region where the bucket is located.                                                                                                                                                               |
| **Access Key** | AWS access key ID.                                                                                                                                                                                    |
| **Secret Key** | AWS secret access key.                                                                                                                                                                                |

Click **Validate Connection**. Data Portal calls the catalog's validate endpoint and confirms credentials and connectivity before proceeding.
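Because the **Prefix** field is matched as a plain string prefix, a value without a trailing `/` can pull in sibling folders that merely share the same leading characters. The sketch below imitates that matching locally with `str.startswith`; it does not call S3, and the object keys are made up for illustration.

```python
def keys_matching(prefix, keys):
    """Return the keys a prefix-based listing would match (plain string prefix)."""
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical object keys in a bucket.
keys = [
    "events/part-0000.parquet",
    "events/2024/part-0001.parquet",
    "events_archive/part-0000.parquet",
]

# Without a trailing slash, the sibling "events_archive/" folder also matches.
print(keys_matching("events", keys))

# With a trailing slash, only keys under "events/" match.
print(keys_matching("events/", keys))
```

If the broader match is not what you intend, end the prefix with `/`.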

***

## Step 3: Browse and Select a Table

Once the connection is validated:

1. Data Portal lists the available **S3 prefixes** (directories) under your configured prefix.
2. Select a prefix to view the Parquet files it contains.
3. Click the **table** (prefix) you want to onboard.

StarTree samples the Parquet files to derive a schema automatically.

***

## Step 4: Review the Schema

The auto-generated Pinot schema is displayed for review. You can:

* **Set a time column** — select the column to use as the Pinot time dimension (optional; leave blank for no time partitioning).
* **Rename the schema** — provide a custom schema name, or accept the default derived from the prefix.

Click **Next** when the schema looks correct.

***

## Step 5: Configure the Table

Review and adjust the table configuration:

| Setting                 | Default         | Notes                                                                       |
| ----------------------- | --------------- | --------------------------------------------------------------------------- |
| **Onboarding schedule** | Every 5 minutes | Cron expression controlling how often new Parquet files are onboarded.      |
| **Null handling**       | Enabled         | Required for schemas that include nullable columns.                         |
| **Segment push type**   | Append          | Each new batch of Parquet files is ingested as a new set of Pinot segments. |

Click **Create Table** to register the schema and table with Pinot. Data Portal automatically triggers the first onboarding run immediately after creation — no manual step required.
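To get a feel for what the default schedule means in practice, the sketch below computes the next few fire times for an every-five-minutes schedule aligned to the clock. This is an assumption about how such a schedule behaves (for example a Quartz-style expression like `0 */5 * * * ?`, as used for Pinot task schedules); the exact cron format Data Portal accepts is not specified here, and `next_runs` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_runs(now, count, every_minutes=5):
    """Return the next `count` fire times for a clock-aligned every-N-minutes schedule.

    Assumes `every_minutes` divides 60 evenly, mirroring a `*/N` minutes field.
    """
    # Drop seconds, then step forward to the next minute that is a multiple of N.
    base = now.replace(second=0, microsecond=0)
    minutes_to_next = every_minutes - (base.minute % every_minutes)
    first = base + timedelta(minutes=minutes_to_next)
    return [first + timedelta(minutes=every_minutes * i) for i in range(count)]


if __name__ == "__main__":
    for run in next_runs(datetime(2024, 1, 1, 10, 2, 30), 3):
        print(run.isoformat())
```

Under this assumption, a Parquet file that lands just after one run starts would be picked up by the following run, so expect up to one schedule interval of delay before new files are onboarded.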

***

## Step 6: Monitor Onboarding

Once the table is created, onboarding starts automatically. The table detail view shows the status in real time:

* **Running** — the task is actively reading Parquet files and building Pinot segments.
* **Completed** — onboarding finished successfully.
* **Failed** — onboarding encountered an error. The error message and the number of files discovered vs. segments generated are surfaced to help diagnose the issue.

For deeper observability — watcher status, checkpoint values, and per-file counts — see the [Observability](../observability) page.

***

## Pausing Onboarding

To pause scheduled onboarding from Data Portal:

1. Open the table in the **Tables** view.
2. Click **Pause Sync**. This sets `"enabled": "false"` on the table's `ExternalTableSyncTask`. Any run currently in progress completes normally. Existing segments and the last checkpoint are preserved, so when you re-enable sync, onboarding resumes from where it left off.
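Conceptually, pausing is a single flag flip on the task configuration. The sketch below shows that change on an in-memory dictionary. Only the `ExternalTableSyncTask` name and the string-valued `"enabled"` flag come from this page; the surrounding layout (a map keyed by task type, the `schedule` key and its value) is assumed for illustration, and `set_sync_enabled` is a hypothetical helper, not a StarTree API.

```python
import copy

TASK_TYPE = "ExternalTableSyncTask"


def set_sync_enabled(task_configs, enabled):
    """Return a copy of the task configs with the sync task's flag updated.

    The flag is stored as the string "true" or "false", matching the page above.
    """
    updated = copy.deepcopy(task_configs)
    updated.setdefault(TASK_TYPE, {})["enabled"] = "true" if enabled else "false"
    return updated


# Assumed layout: a map of task type to string-valued settings.
configs = {TASK_TYPE: {"enabled": "true", "schedule": "0 */5 * * * ?"}}

paused = set_sync_enabled(configs, enabled=False)
print(paused[TASK_TYPE]["enabled"])   # flag after pausing
print(configs[TASK_TYPE]["enabled"])  # original dictionary is left unchanged
```

Nothing else in the configuration changes, which is consistent with segments and the checkpoint being preserved across a pause.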

***

## Frequently Asked Questions

**The Validate Connection step fails. What should I check?**

* Confirm the access key has `s3:GetObject` and `s3:ListBucket` permissions on the bucket and prefix.
* Verify the region matches where your S3 bucket is located.
* Check that the prefix ends with `/` if it's a folder path.

***

**Can I onboard multiple prefixes from the same bucket?**

Yes. Each prefix is registered as an independent Pinot table with its own onboarding schedule. Start the wizard again and use the same bucket with a different prefix.

***

**The table was created but onboarding hasn't started — what should I check?**

Data Portal triggers the first onboarding run automatically after table creation. If onboarding hasn't started, check the table's detail page for an error status and review the error message. You can also trigger a run manually via the [trigger API](../observability#1-trigger-ingestion-task).
