This feature requires StarTree release 0.14.0 or later and is enabled on request: contact StarTree support to activate it.
An External Table is a Pinot table whose data stays in Parquet files in your object store (S3 Data Lake, AWS Glue, or AWS S3 Tables) instead of being copied into Pinot’s own segment format. Pinot reads the remote Parquet files at query time and exposes them through standard SQL, so there is no ETL pipeline and no data duplication, and onboarding takes minutes rather than hours.

A watcher on the controller checks the source for changes on a schedule and builds segment index files (bloom filters, inverted indexes, range indexes, and so on) alongside the Parquet data. Queries use three server-side caches (a Parquet data cache, an index cache, and a footer cache), so repeat reads avoid paying S3 round-trip latency. With the right indexes configured, queries on large datasets can drop from minutes to milliseconds.
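
The watcher’s change detection can be pictured as a diff between the file listing recorded at the last sync and a fresh listing of the source. The Python sketch below is illustrative only: it assumes each file is identified by its path plus a content fingerprint such as an S3 ETag, and `FileEntry`, `diff_listing`, `sync_once`, and the `build_indexes` callback are hypothetical names, not StarTree APIs. The real watcher may track changes differently, for example through catalog metadata for Iceberg sources.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """One Parquet object in a source listing (hypothetical shape)."""
    path: str
    etag: str  # content fingerprint, e.g. an S3 ETag


def diff_listing(previous: dict[str, str], current: list[FileEntry]):
    """Compare the last synced listing (path -> etag) with a fresh one.

    Returns (added, changed, removed) path lists, sorted for stable output.
    """
    current_map = {f.path: f.etag for f in current}
    added = sorted(p for p in current_map if p not in previous)
    changed = sorted(
        p for p in current_map if p in previous and previous[p] != current_map[p]
    )
    removed = sorted(p for p in previous if p not in current_map)
    return added, changed, removed


def sync_once(previous: dict[str, str], current: list[FileEntry], build_indexes):
    """One watcher tick: index new or changed files, then return the new checkpoint."""
    added, changed, removed = diff_listing(previous, current)
    for path in added + changed:
        build_indexes(path)  # hypothetical hook: build bloom/inverted/range index files
    checkpoint = {f.path: f.etag for f in current}
    return checkpoint, removed


prev = {"s3://bucket/t/a.parquet": "e1", "s3://bucket/t/b.parquet": "e2"}
now = [
    FileEntry("s3://bucket/t/a.parquet", "e1"),  # unchanged
    FileEntry("s3://bucket/t/b.parquet", "e3"),  # rewritten
    FileEntry("s3://bucket/t/c.parquet", "e4"),  # new
]
print(diff_listing(prev, now))
```

Comparing fingerprints rather than timestamps means a rewritten file is re-indexed even if its modification time is unreliable; only new or changed files trigger index builds.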

How it works

At query time, the server checks its local caches first. On a miss, it fetches the required Parquet column pages or index byte ranges from object storage and stores them for subsequent queries. Index files (built by the watcher at sync time) live in tiered storage and are also cached locally, so filters and aggregations avoid full column scans.
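
This read path follows the read-through cache pattern: look up locally, fetch from object storage on a miss, and keep the result for later queries. A minimal Python sketch, assuming byte ranges are keyed by (path, offset, length) and evicted in least-recently-used order. The page describes three separate caches (data, index, footer), while this sketch models one generic cache; `ByteRangeCache` and the `fetch` callback are hypothetical, and Pinot’s actual cache keys and eviction policy may differ.

```python
from collections import OrderedDict
from typing import Callable

# Cache key: (object path, start offset, length) of a column page or index byte range.
Key = tuple[str, int, int]


class ByteRangeCache:
    """A minimal LRU read-through cache for remote byte ranges (illustrative only)."""

    def __init__(self, fetch: Callable[[str, int, int], bytes], capacity_bytes: int):
        self._fetch = fetch  # hypothetical remote read, e.g. an S3 ranged GET
        self._capacity = capacity_bytes
        self._size = 0
        self._entries: "OrderedDict[Key, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path: str, start: int, length: int) -> bytes:
        key = (path, start, length)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = self._fetch(path, start, length)  # miss: go to object storage
        self._entries[key] = data
        self._size += len(data)
        while self._size > self._capacity and self._entries:
            _, evicted = self._entries.popitem(last=False)  # drop least recently used
            self._size -= len(evicted)
        return data


calls = []


def fake_fetch(path: str, start: int, length: int) -> bytes:
    """Stand-in for object storage that records every remote read."""
    calls.append((path, start, length))
    return b"x" * length


cache = ByteRangeCache(fake_fetch, capacity_bytes=1024)
cache.read("s3://bucket/t/a.parquet", 0, 100)  # miss: fetched remotely
cache.read("s3://bucket/t/a.parquet", 0, 100)  # hit: served locally
print(cache.hits, cache.misses, len(calls))  # 1 1 1
```

Only the second read avoids the remote call, which is the effect the local caches have on repeat queries.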

Supported sources

| Source | Protocol | catalogType |
|---|---|---|
| S3 Data Lake | Raw Parquet files under an S3 prefix | s3 |
| AWS Glue | Iceberg REST | iceberg-rest (serviceType=glue) |
| AWS S3 Tables | Iceberg REST | iceberg-rest (serviceType=s3Tables) |
catalogType=iceberg-rest works with any Iceberg REST–compliant catalog. Data files must be Parquet.
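
As a hedged illustration of the table above, the Python helper below encodes each source’s catalog settings so a script could pick the right catalogType (and serviceType for the Iceberg REST sources). The values come from the table; the helper names and the dictionary shape are hypothetical, and where these keys belong in an actual onboarding request is not assumed here. The Parquet check is only a file-extension heuristic, not a format validation.

```python
from enum import Enum


class Source(Enum):
    """The three supported External Table sources."""
    S3_DATA_LAKE = "S3 Data Lake"
    AWS_GLUE = "AWS Glue"
    AWS_S3_TABLES = "AWS S3 Tables"


# Values copied from the Supported sources table; the dict shape is hypothetical.
_CATALOG_SETTINGS: dict[Source, dict[str, str]] = {
    Source.S3_DATA_LAKE: {"catalogType": "s3"},
    Source.AWS_GLUE: {"catalogType": "iceberg-rest", "serviceType": "glue"},
    Source.AWS_S3_TABLES: {"catalogType": "iceberg-rest", "serviceType": "s3Tables"},
}


def catalog_settings(source: Source) -> dict[str, str]:
    """Return a fresh copy so callers can add their own keys without side effects."""
    return dict(_CATALOG_SETTINGS[source])


def non_parquet_files(paths: list[str]) -> list[str]:
    """Flag paths whose extension is not .parquet (heuristic only)."""
    return [p for p in paths if not p.lower().endswith(".parquet")]


print(catalog_settings(Source.AWS_S3_TABLES))
# {'catalogType': 'iceberg-rest', 'serviceType': 's3Tables'}
print(non_parquet_files(["s3://bucket/t/a.parquet", "s3://bucket/t/b.csv"]))
# ['s3://bucket/t/b.csv']
```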

Where to start

Choose your path based on what you’re trying to do:
  • New user, prefer point-and-click: Onboarding via Data Portal (wizard-based setup, no API calls required)
  • New user, prefer API / automation / IaC: Onboarding via API (4-step REST flow with bash examples and a copy-paste quickstart script)
  • Table created, monitoring sync progress: Observability (sync status, checkpoint watermark, and source file count APIs)
  • Queries are slow: see Indexes to add the right indexes, then Best Practices & Configs for caching and tuning
  • Something is broken: see Troubleshooting for symptom-based fixes, or the FAQ for common questions

Page map

| Page | What it covers |
|---|---|
| Onboarding via Data Portal | Point-and-click wizard: connect, browse, configure, monitor |
| Onboarding via API | REST API 4-step flow with bash examples and a self-contained quickstart script |
| Observability | Sync run status, ingestion checkpoint, source file count, and manual trigger APIs |
| Data Type Mapping | Parquet → Pinot and Iceberg → Pinot type mapping tables, plus time column detection |
| Indexes | Supported indexes, why columns are RAW, and per-index config examples |
| Data and Index Caching | Three caches (data, index, footer), eviction, restart behavior, and how to clear them |
| Best Practices & Configs | Full config reference: sync task, tier backend, server/cluster, query options, and OOM protection |
| FAQ | Common questions by category: general, onboarding, schema, indexes, performance, operations |
| Troubleshooting | Symptom-based diagnostic playbook with exact error strings and escalation guidance |