This feature requires StarTree release 0.14.0 or later, and must be enabled on demand — contact StarTree support to activate it.
Common issues when onboarding and querying External Tables, grouped by symptom. Each entry lists the likely cause and the fix. If an issue isn’t covered here, reach out to StarTree support.
Onboarding
Onboarding fails with a region error
Symptom: Validation or preview fails with an AWS region error, often after entering the bucket path.
Cause: The AWS region isn't set, the bucket field contains an s3://... URL instead of a bare bucket name, or the prefix ends with a trailing slash.
Fix:
- Enter only the bucket name in the bucket field, not s3://bucket/....
- Remove any trailing / from the prefix.
- Set the AWS region in the connection config (catalog.s3.region / the tier region). When the region is set in config, the AWS_REGION environment variable is only a fallback and isn't required.
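If you start from a full S3 URL, a small helper like the one below shows the split the form expects: the bare bucket name in the bucket field, and a prefix with no scheme and no trailing slash. `split_s3_location` is a hypothetical helper written for illustration, not part of StarTree.

```python
def split_s3_location(location: str) -> tuple[str, str]:
    """Hypothetical helper: turn 's3://bucket/some/prefix/' into ('bucket', 'some/prefix')."""
    value = location.strip()
    # The bucket field takes the bare name, so drop the URL scheme if present.
    if value.startswith("s3://"):
        value = value[len("s3://"):]
    # Everything before the first "/" is the bucket; the rest is the prefix.
    bucket, _, prefix = value.partition("/")
    if not bucket:
        raise ValueError(f"no bucket name in {location!r}")
    # A trailing slash on the prefix causes validation errors, so strip it.
    return bucket, prefix.rstrip("/")


print(split_s3_location("s3://my-bucket/events/2024/"))  # ('my-bucket', 'events/2024')
```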
Access Denied (HTTP 403) when creating the table
Symptom:
software.amazon.awssdk.services.s3.model.AccessDeniedException: Access Denied
(Service: S3, Status Code: 403, ...)
Cause: The cluster cannot read the source bucket. Credentials that work from your own machine are not necessarily the ones the cluster uses.
Fix:
- Confirm that the identity the table uses for access (an assumed IAM role via roleArn/externalId, the cluster's node IAM role, or static access keys) has s3:GetObject and s3:ListBucket on the source bucket and prefix.
- Verify from inside the cluster (e.g. a debug pod):
aws s3 ls s3://<bucket>/<prefix>/
- If listing fails there, fix the bucket policy / role before retrying onboarding.
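As a sketch of what "s3:GetObject and s3:ListBucket on the source bucket and prefix" looks like in an IAM policy, the helper below builds a minimal read-only policy document. `read_only_policy` is hypothetical; it assumes the standard `aws` ARN partition and that no explicit Deny elsewhere overrides these grants. Attach the result to whichever identity the table actually uses.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Hypothetical helper: minimal policy allowing list + read on one bucket prefix."""
    prefix = prefix.strip("/")
    # ListBucket is a bucket-level action, so its resource is the bucket ARN.
    list_statement = {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": f"arn:aws:s3:::{bucket}",
    }
    if prefix:
        # Scope listing to the source prefix with the s3:prefix condition key.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}/*"]}}
    # GetObject is an object-level action, so its resource is an object ARN pattern.
    object_arn = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    read_statement = {"Effect": "Allow", "Action": "s3:GetObject", "Resource": object_arn}
    return {"Version": "2012-10-17", "Statement": [list_statement, read_statement]}


print(json.dumps(read_only_policy("my-bucket", "events/2024"), indent=2))
```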
Preview can’t read the files / wrong path
Symptom: Validation succeeds but preview errors, or it looks for a default table.
Cause: The prefix points at the wrong level — for raw Parquet it should point at the folder that directly contains the .parquet files (or the table root for Iceberg).
Fix: Adjust the prefix to the correct level. Use aws s3 ls to confirm the path you give actually lists Parquet files (or an Iceberg metadata/ + data/ layout).
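The level check can be sketched as a function over the object keys you listed (for example from `aws s3 ls --recursive` or an SDK listing). `classify_listing` is a hypothetical heuristic: it reports "parquet" when .parquet files sit directly under the prefix, "iceberg" when the prefix contains both metadata/ and data/ folders, and "unknown" otherwise, which usually means the prefix is one level too high or too low.

```python
def classify_listing(keys: list[str], prefix: str) -> str:
    """Hypothetical heuristic: guess what a prefix holds from the keys listed under it."""
    base = prefix.rstrip("/") + "/" if prefix else ""
    top_level_dirs = set()
    has_direct_parquet = False
    for key in keys:
        if not key.startswith(base):
            continue  # outside the prefix we are checking
        rest = key[len(base):]
        head, sep, _ = rest.partition("/")
        if sep:
            top_level_dirs.add(head)  # key lives inside a sub-folder
        elif rest.endswith(".parquet"):
            has_direct_parquet = True
    if has_direct_parquet:
        return "parquet"
    if {"metadata", "data"} <= top_level_dirs:
        return "iceberg"
    return "unknown"


keys = ["events/2024/part-0.parquet", "events/2024/part-1.parquet"]
print(classify_listing(keys, "events/2024"))  # parquet
print(classify_listing(keys, "events"))       # unknown: prefix is one level too high
```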
Very large source (hundreds of thousands of files)
Symptom: Onboarding a path with hundreds of thousands or millions of files is slow or never produces segments.
Cause: Every file becomes work for the catalog scan and segment generation.
Fix: For an initial proof-of-concept, point at a smaller sub-prefix (for example, a single month’s partition). Scale the cluster for the full dataset. There is a per-table segment threshold; work with StarTree support for very large tables.
Schema & table creation
NumberFormatException on a Binary value
Symptom:
java.lang.NumberFormatException: For input string: "Binary{2 reused bytes [48 50]}"
Cause: A column’s data type was changed away from what preview inferred — for example a binary/string column was set to INT/LONG. The Parquet data no longer matches the declared Pinot type.
Fix: Use the data types the preview step produced. Don’t override inferred types in the schema. See Data Type Mapping.
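The value inside the error message is the raw Parquet bytes, and decoding them shows what the column really holds: bytes [48 50] are the characters "0" and "2", i.e. the text "02". The sketch below recovers that text from the message. `decode_binary_repr` is hypothetical, and the regex deliberately accepts optional commas because the exact Binary string format may vary between Parquet versions (an assumption).

```python
import re

# Matches e.g. "Binary{2 reused bytes [48 50]}"; commas are optional because the
# exact string form may differ between Parquet versions (assumption).
_BINARY_REPR = re.compile(r"Binary\{\d+ \w+ bytes,? \[([\d,\s]*)\]\}")


def decode_binary_repr(message: str) -> str:
    """Hypothetical helper: recover the text value hidden in a Parquet Binary's string form."""
    match = _BINARY_REPR.search(message)
    if match is None:
        raise ValueError("no Binary{...} value found")
    byte_values = [int(tok) for tok in re.split(r"[,\s]+", match.group(1).strip()) if tok]
    return bytes(byte_values).decode("utf-8")


value = decode_binary_repr('For input string: "Binary{2 reused bytes [48 50]}"')
print(value)  # 02: the Parquet column stores text, so keep the type preview inferred
```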
Cannot create inverted index on column ... without dictionary
Symptom: Table creation is rejected for an inverted (or FST/IFST) index on a RAW column.
Cause: Since release 2.164.0, dictionary-backed indexes require an explicit dictionary block — they no longer build an implicit dictionary.
Fix: Keep the forward index RAW and add a dictionary block alongside the index:
"indexes": {
"forward": { "encodingType": "RAW" },
"dictionary": {},
"inverted": { "disabled": false }
}
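Before submitting a hand-edited config, a quick lint can flag RAW columns that request a dictionary-backed index without a dictionary block. This is a minimal sketch assuming the Pinot-style layout of a "fieldConfigList" whose entries carry "name" and an "indexes" object like the one above; `columns_missing_dictionary` is hypothetical, and the index keys "fst" and "ifst" are assumed names to check against your own table config.

```python
# Index types that, per the entry above, need an explicit dictionary block.
# The key names "fst" and "ifst" are assumptions; match them to your table config.
DICTIONARY_BACKED = ("inverted", "fst", "ifst")


def _is_enabled(index_cfg) -> bool:
    """An index entry counts as enabled if present and not marked "disabled": true."""
    return isinstance(index_cfg, dict) and not index_cfg.get("disabled", False)


def columns_missing_dictionary(table_config: dict) -> list[str]:
    """Hypothetical lint: RAW columns with a dictionary-backed index but no dictionary block."""
    problems = []
    for field in table_config.get("fieldConfigList", []):
        indexes = field.get("indexes", {})
        forward = indexes.get("forward", {})
        is_raw = forward.get("encodingType") == "RAW" or field.get("encodingType") == "RAW"
        wants_dict_index = any(_is_enabled(indexes.get(name)) for name in DICTIONARY_BACKED)
        if is_raw and wants_dict_index and not _is_enabled(indexes.get("dictionary")):
            problems.append(field["name"])
    return problems


config = {
    "fieldConfigList": [
        {"name": "country",  # hypothetical column
         "indexes": {"forward": {"encodingType": "RAW"}, "inverted": {"disabled": False}}},
    ]
}
print(columns_missing_dictionary(config))  # ['country']
```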
Failed to create FieldIndexConfigs
Symptom: Table creation fails with this generic message.
Cause: A malformed or conflicting index configuration — often a hand-edited combination of forward, dictionary, and an index entry.
Fix: Start from the table config the preview step generates and add only supported indexes. Don’t mix incompatible options on one column.
Inverted index on a multi-value column fails to build
Symptom: Segment build fails on a multi-value column that has an inverted index (e.g. Cannot create inverted index for raw index column, or a “raw inverted index not supported for multi-value columns” message).
Cause: The inverted index needs a dictionary, and a raw (no-dictionary) inverted index isn’t supported — multi-value columns are especially likely to hit this.
Fix: Add a dictionary block to the column (as above). If the build still fails specifically on a multi-value column, remove the inverted index from it and reach out to support.
Queries
Query times out or returns servers not responded
Symptom:
427: N servers [...] not responded
or a group-by/aggregation that never returns within the timeout.
Causes & fixes — check in order:
- Missing or conflicting index config (most common). Aggregations and filters scanning remote data without the right index are slow. Add a supported index for your filter/group-by columns, and remove conflicting or leftover index configs.
- Caching not enabled. Turn on the page cache and preload so index data is local. See Best Practices and Configs (enable.prefetch.page.cache, preload.enable, preload.index.keys.override).
- Group-by on a derived/computed column or a $segmentName filter. These defeat pruning: group by a real column and drop debugging filters.
- Under-provisioned servers. A single small server against a large dataset will be CPU-bound. Scale out.
- Raise the query timeout while debugging:
SET "timeoutMs" = '60000';
First query on a column is slow, later queries are fast
Symptom: Cold query is slow; the same query is fast afterward.
Cause: The first read populates the cache from object storage.
Fix: Expected behavior. To pay this cost at load time instead of on the first query, enable pre-warm (pinot.parquet.prewarm.enabled) and preload.enable. See Data and Index Caching.
Tuning a large scan
For wide scans over big datasets, these query options help (see Best Practices and Configs):
SET "enable.prefetch.page.cache" = 'true';
SET "prefetch.projection.queue.size" = '10';
SET "readAhead.enable" = 'true';
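When queries are sent programmatically, the SET statements simply precede the query text. The sketch below assembles them from a dict; `with_query_options` is a hypothetical helper, the table and column names are placeholders, and doubling embedded single quotes assumes standard SQL string quoting.

```python
# Query options recommended above for wide scans over big datasets.
LARGE_SCAN_OPTIONS = {
    "enable.prefetch.page.cache": "true",
    "prefetch.projection.queue.size": "10",
    "readAhead.enable": "true",
}


def with_query_options(sql: str, options: dict[str, str]) -> str:
    """Hypothetical helper: prepend one SET statement per option to a query string."""
    statements = []
    for key, value in options.items():
        # Double-quote the option name, single-quote the value (doubling any embedded quote).
        escaped = value.replace("'", "''")
        statements.append(f"SET \"{key}\" = '{escaped}';")
    statements.append(sql)
    return "\n".join(statements)


# "events" and "country" are placeholder names; timeoutMs is the debugging timeout.
query = with_query_options(
    "SELECT country, COUNT(*) FROM events GROUP BY country",
    {**LARGE_SCAN_OPTIONS, "timeoutMs": "60000"},
)
print(query)
```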
Server OOM / pod restarts under query load
Symptom: A server runs out of native memory or is OOM-killed, often during a large scan (Native memory allocation (mmap) failed, or repeated pod restarts).
Causes & fixes:
- Too many memory maps. Wide tables create one memory map (mmap) per column index, which can exhaust the OS max_map_count limit. Enable index consolidation (preload.enable.index.consolidation=true) to pack each segment's indexes into one file.
- Cache / prefetch over-allocation. Cap the in-memory caches and prefetch buffer (pinot.parquet.page.cache.memory.*, ...prefetch.size.mb) relative to server heap.
- A heavy query. Enable query OOM protection so one query is killed instead of the server.
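A back-of-the-envelope check shows why consolidation helps. The sketch assumes one memory map per index file per column per segment, which is a simplification: the JVM and native libraries hold maps too. The table sizes are hypothetical, and `estimate_index_maps` is an illustrative helper. On Linux the limit is readable from /proc/sys/vm/max_map_count.

```python
from pathlib import Path

# Upstream Linux kernel default; some distributions ship a higher value.
DEFAULT_MAX_MAP_COUNT = 65530


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the kernel's per-process memory-map limit, or fall back to the default."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return DEFAULT_MAX_MAP_COUNT


def estimate_index_maps(segments: int, columns: int, index_files_per_column: int,
                        consolidated: bool = False) -> int:
    """Hypothetical estimate of index memory maps for one table on one server."""
    if consolidated:
        # With consolidation, each segment's indexes live in a single file.
        return segments
    return segments * columns * index_files_per_column


limit = read_max_map_count()
for consolidated in (False, True):
    # Hypothetical table: 2,000 segments, 40 columns, 2 index files per column.
    maps = estimate_index_maps(2000, 40, 2, consolidated=consolidated)
    print(f"consolidated={consolidated}: ~{maps} maps, limit {limit}, over={maps > limit}")
```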
Sync & operations
Could not acquire table level distributed lock ... ExternalTableSyncTask
Symptom:
Could not acquire table level distributed lock for scheduled task type:
ExternalTableSyncTask, table: <name>_OFFLINE. Another controller is likely
generating tasks for this table. Please try again later.
Cause: Another controller is already running a sync for the table.
Fix: Benign — retry later. The run in progress continues normally.
A sync run fails because of one unreadable file
Symptom: A sync run fails, and the failure traces back to a single problematic Parquet file.
Cause: By default a run fails if any file can’t be read, so the whole snapshot is rejected.
Fix: Set continueOnFileError=true in the ExternalTableSyncTask config to skip unreadable files and continue. The run still ends as status=COMPLETED; compare filesDiscovered vs segmentsUploaded in the status endpoint to spot skipped files, then investigate them separately.
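The comparison can be scripted once you have the status payload. This sketch assumes a flat JSON object carrying the status, filesDiscovered, and segmentsUploaded fields named above, and roughly one segment per source file; neither is guaranteed, so treat the result as a hint of how many files to investigate rather than an exact count. `skipped_file_estimate` is hypothetical.

```python
def skipped_file_estimate(status: dict) -> int:
    """Hypothetical helper: estimate files skipped by continueOnFileError.

    Assumes a flat status payload and roughly one segment per source file.
    """
    if status.get("status") != "COMPLETED":
        raise ValueError("only meaningful for a COMPLETED run")
    discovered = int(status["filesDiscovered"])
    uploaded = int(status["segmentsUploaded"])
    return max(0, discovered - uploaded)


print(skipped_file_estimate({"status": "COMPLETED", "filesDiscovered": 120, "segmentsUploaded": 117}))
```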
Table created, but no sync ever starts
Symptom: The table exists, but status stays IDLE and the observability endpoints return nothing useful.
Cause: The table isn’t on the controller-watcher path — usually executor=controller is missing from the ExternalTableSyncTask config, or the feature isn’t enabled on the cluster.
Fix: Confirm executor: controller is in the task config (the preview/onboarding flow sets it), and that the External Table feature is enabled on the cluster. Then trigger a run to start immediately: POST /tasks/schedule?taskType=ExternalTableSyncTask&tableName=<name>_OFFLINE.
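A minimal sketch of building that trigger request with the Python standard library. The controller URL is an assumption (9000 is the default Apache Pinot controller port; StarTree-hosted clusters expose their own endpoint and usually require an auth header, not shown), `events` is a placeholder table name, and `schedule_sync_request` is a hypothetical helper that appends the _OFFLINE suffix for you.

```python
from urllib.parse import urlencode
from urllib.request import Request


def schedule_sync_request(controller_url: str, table: str) -> Request:
    """Hypothetical helper: build (but do not send) the POST that triggers a sync run."""
    query = urlencode({"taskType": "ExternalTableSyncTask", "tableName": f"{table}_OFFLINE"})
    url = f"{controller_url.rstrip('/')}/tasks/schedule?{query}"
    # Add whatever auth header your cluster requires before sending (not shown).
    return Request(url, method="POST")


req = schedule_sync_request("http://localhost:9000", "events")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```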
Sync not progressing
Use the status endpoint to diagnose:
- IDLE and never advancing: check executor=controller, a valid schedule cron, and that the feature is enabled (see above).
- RUNNING for a long time: large source or under-provisioned servers; reduce the prefix or scale out.
- FAILED: read failurePhase:
  - FILE_LISTING → credentials/path issue
  - SEGMENT_GENERATION → data-type mismatch (see NumberFormatException above; escalate if the config is valid)
  - SEGMENT_COMPRESSION → server resources
  - SEGMENT_UPLOAD → deep-store permissions
  - CHECKPOINT_SAVE → controller/ZooKeeper issue
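The same triage can be encoded as a lookup for scripts or alerts. This sketch assumes the status payload is a flat JSON object with status and failurePhase fields; `diagnose` is a hypothetical helper, and because it cannot tell how long a run has been going, the IDLE and RUNNING messages are phrased conditionally.

```python
# Next step per failurePhase, transcribed from the triage list for sync status.
NEXT_STEP = {
    "FILE_LISTING": "Check credentials and the bucket/prefix path.",
    "SEGMENT_GENERATION": "Check for a data-type mismatch; escalate if the config is valid.",
    "SEGMENT_COMPRESSION": "Check server resources.",
    "SEGMENT_UPLOAD": "Check deep-store permissions.",
    "CHECKPOINT_SAVE": "Check the controller and ZooKeeper.",
}


def diagnose(status: dict) -> str:
    """Hypothetical helper: turn a sync status payload into the next thing to check."""
    state = status.get("status")
    if state == "IDLE":
        return "If it stays IDLE: check executor=controller, the schedule cron, and the feature flag."
    if state == "RUNNING":
        return "If it stays RUNNING: narrow the prefix or scale out servers."
    if state == "FAILED":
        phase = status.get("failurePhase", "")
        return NEXT_STEP.get(phase, f"Unrecognized failurePhase {phase!r}: contact support.")
    return f"No action suggested for status {state!r}."


print(diagnose({"status": "FAILED", "failurePhase": "SEGMENT_UPLOAD"}))
```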
Checking sync health
To see why ingestion isn’t progressing, use the observability endpoints: run status (and failurePhase on failure), the ingestion checkpoint, and the source file count.
When to escalate to engineering
Most issues are self-serviceable — auth/region/path, index and dictionary config, cron, and cache/OOM tuning are all covered above. Escalate to engineering when:
- SEGMENT_GENERATION or CHECKPOINT_SAVE keeps failing with a valid config.
- Servers still OOM after capping caches and enabling consolidation + OOM protection.
- status=COMPLETED but query results are wrong or missing.
- A specific Parquet type fails to read (e.g. a getBytes/decimal error), or you hit a very-large-table segment threshold.