Executive Summary
Release 0.15.0 is centered on one major milestone: StarTree Iceberg Tables are production-ready for broader customer adoption, with Iceberg catalog support and improved operational reliability.
- StarTree Iceberg Tables: Stronger validation, better status and error surfacing, and safer execution paths make Iceberg-based external tables easier to run in production at scale.
- Reliability first: Ingestion and task orchestration improvements reduce the blast radius of failures and improve recoverability for long-running jobs.
- Data correctness and type handling: Substantial work landed around Parquet readers, complex type mapping, and column metadata behavior to improve query correctness.
- Operational visibility and security: Better auth metrics, request-level tracing, and storage/query observability improve day-2 operations.
StarTree Cloud highlights
New Features
StarTree Iceberg Tables are now GA
StarTree Iceberg Tables take a major step forward in this release, with a strong focus on production readiness for Iceberg catalogs (including AWS Glue and S3 Tables) and in-place query workflows. Key improvements include:
- Expanded validation and guardrails for external table onboarding.
- Better operational controls when handling real-world catalog and table configuration edge cases.
- Improved ingestion task behavior and lifecycle handling for external table pipelines.
- IAM role-based onboarding support for external Iceberg tables.
- More robust task execution flow for external table ingestion runs.
- Clearer status and root-cause error reporting to speed up incident triage.
- Optional handling for file-level failures to keep ingestion progressing when appropriate.
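The optional file-level failure handling and root-cause reporting described above follow a common pattern: ingest each data file independently, record why any file failed, and either stop or continue according to a policy flag. The Python sketch below illustrates only that general pattern under stated assumptions; `ingest_files`, `skip_failed_files`, `IngestionStatus`, and the state names are hypothetical and are not StarTree APIs, configuration keys, or status values.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class IngestionStatus:
    """Hypothetical run summary used to surface root causes per file."""
    succeeded: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)  # file -> root-cause message
    aborted: bool = False

    @property
    def state(self) -> str:
        if self.aborted:
            return "FAILED"
        return "COMPLETED_WITH_ERRORS" if self.failed else "SUCCEEDED"


def ingest_files(
    files: Iterable[str],
    ingest_one: Callable[[str], None],
    skip_failed_files: bool = False,  # hypothetical policy flag
) -> IngestionStatus:
    """Ingest files one at a time, recording the root cause of every failure.

    Strict mode (the default) stops at the first failed file; lenient mode
    records the failure and continues with the remaining files.
    """
    status = IngestionStatus()
    for path in files:
        try:
            ingest_one(path)
        except Exception as exc:
            status.failed[path] = f"{type(exc).__name__}: {exc}"
            if not skip_failed_files:
                status.aborted = True
                break
        else:
            status.succeeded.append(path)
    return status


# Usage with a stand-in ingest function that rejects one corrupt file.
def fake_ingest(path: str) -> None:
    if path == "bad.parquet":
        raise ValueError("corrupt footer")


files = ["a.parquet", "bad.parquet", "c.parquet"]
lenient = ingest_files(files, fake_ingest, skip_failed_files=True)
strict = ingest_files(files, fake_ingest)
```

In lenient mode the run ends as COMPLETED_WITH_ERRORS with c.parquet still ingested; in strict mode it stops after a.parquet and ends as FAILED. In both cases the recorded message ("ValueError: corrupt footer") is the kind of per-file root cause that shortens triage.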
Query and type-system improvements for External Tables
This release improves correctness when querying external Parquet data, including complex schemas:
- Better handling for complex and nested type paths.
- Improvements in multi-value reads and metadata interpretation.
- More consistent behavior for column statistics and predicate handling.
Operational observability and auth telemetry
StarTree Cloud now surfaces better operational signals across query, cache, and security paths:
- New or expanded metrics for auth plugin behavior.
- Better reader and cache observability for Parquet query paths.
- Improved request-level tracing for external table ingestion components.
Improvements
- Default Kafka AZ-aware ingestion: Real-time tables now ingest from Kafka in an AZ-aware fashion by default, reducing cross-AZ traffic and its associated cost.
- Task and orchestration maturity: Better task metadata tracking, cleaner status transitions, and stronger scheduling/runtime controls.
- Cluster and ingestion safeguards: Additional checks and throttling paths help reduce overload during ingestion and segment operations.
- Parquet path performance tuning: Safer concurrent read behavior and improved cache/prefetch handling for heavy query workloads.
- Reload hardening: Reload operations can now be simulated with a dry run, and in-progress reloads can be canceled.
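AZ-aware consumption generally means a consumer reads each partition from a replica in its own availability zone when that replica is in sync, and falls back to the partition leader otherwise. Apache Kafka supports this as follower fetching (KIP-392), driven by the consumer's client.rack setting; these notes do not say how StarTree implements AZ-aware ingestion, so the Python sketch below only models the selection rule. `Replica`, `pick_replica`, and the zone names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    broker_id: int
    az: str          # availability zone hosting this broker
    in_sync: bool    # whether the replica is in the in-sync replica set (ISR)
    is_leader: bool


def pick_replica(replicas: list[Replica], consumer_az: str) -> Replica:
    """Prefer an in-sync replica in the consumer's AZ; otherwise use the leader."""
    for r in replicas:
        if r.az == consumer_az and r.in_sync:
            return r  # same-AZ fetch: no cross-AZ transfer for this partition
    leader = next((r for r in replicas if r.is_leader), None)
    if leader is None:
        raise ValueError("partition has no leader")
    return leader  # cross-AZ fetch only when no local in-sync replica exists


# Illustrative partition with one replica per zone; replica 3 has fallen behind.
partition = [
    Replica(broker_id=1, az="us-east-1a", in_sync=True, is_leader=True),
    Replica(broker_id=2, az="us-east-1b", in_sync=True, is_leader=False),
    Replica(broker_id=3, az="us-east-1c", in_sync=False, is_leader=False),
]
```

A consumer in us-east-1b reads from broker 2 in its own zone, while a consumer in us-east-1c falls back to the leader (broker 1) because the local replica is out of sync; only that fallback case incurs cross-AZ traffic.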
Bug Fixes
- Fixed multiple correctness issues in Parquet and complex-type readers that could affect query output under edge schemas.
- Fixed reliability issues in upsert/dedup and RocksDB lifecycle paths to reduce race conditions and memory leaks.
- Fixed several ingestion and controller edge cases, including null handling, status transitions, and flaky runtime behaviors.
- Fixed platform and packaging issues that impacted build stability and release workflows.
Apache Pinot Highlights
This section describes Apache Pinot (open source) changes in the baseline that ships with StarTree Cloud 0.15.0, compared with 0.14.0. It does not describe StarTree-only extensions.
New Features
- Vector search: Full multi-phase vector search, including IVF_FLAT, IVF_PQ compressed ANN, filtered ANN, SQL radius queries, HNSW efSearch, IVF_ON_DISK, and an adaptive planner.
- MSE enhancements: Native multi-stage engine (MSE) planning for SUM/AVG over multi-value (MV) columns, broker pruning for non-partitioned leaf paths, and lookup join support in the physical optimizer.
- Upsert derivations: Post-partial-upsert transforms support derived columns after partial upsert merge.
- Arrow batch ingestion: ArrowRecordReader supports ingesting Arrow IPC files. pinot-arrow is included in the standard Pinot binary bundle for Arrow-based columnar read paths.
- Distinct early termination: Support for early termination in the combine operator, for more predictable query latency.
- AI metadata on Schema/TableConfig: New description and tags fields on Schema, FieldSpec, and TableConfig capture enhanced user context.
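For readers new to the index types listed above, IVF_FLAT (inverted file with flat, uncompressed vectors) assigns each vector to the list of its nearest centroid, then at query time exhaustively scans only the few lists whose centroids are closest to the query. A radius query returns every scanned vector within a distance threshold instead of a fixed top-k. The pure-Python sketch below illustrates the general technique with hand-picked rather than trained centroids; it is not Pinot's implementation, and `build_ivf`, `ivf_radius_search`, and `nprobe` are illustrative names.

```python
import math

Vector = tuple[float, ...]


def build_ivf(vectors: dict[str, Vector], centroids: list[Vector]) -> list[list[str]]:
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists: list[list[str]] = [[] for _ in centroids]
    for vid, v in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        lists[nearest].append(vid)
    return lists


def ivf_radius_search(
    query: Vector,
    vectors: dict[str, Vector],
    centroids: list[Vector],
    lists: list[list[str]],
    radius: float,
    nprobe: int = 1,
) -> list[str]:
    """Approximate radius search: exhaustively scan only the nprobe closest lists."""
    probe = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))[:nprobe]
    hits: list[tuple[float, str]] = []
    for i in probe:
        for vid in lists[i]:
            d = math.dist(query, vectors[vid])  # "flat": exact distance, no compression
            if d <= radius:
                hits.append((d, vid))
    return [vid for _, vid in sorted(hits)]  # nearest first


# Illustrative data: two clusters with hand-picked (not trained) centroids.
vectors = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (5.0, 5.0), "d": (5.5, 5.0), "e": (1.5, 0.0)}
centroids = [(0.0, 0.0), (5.0, 5.0)]
lists = build_ivf(vectors, centroids)
```

With nprobe=1, a query near the first cluster scans only a, b, and e, so even a very large radius cannot return c or d; raising nprobe to 2 scans both lists. This recall-versus-cost trade-off is why ANN indexes expose search-width parameters such as HNSW efSearch.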
Improvements
- MSE observability: Upstream/downstream stage ID MDC fields aid debugging, and error propagation and logging are improved.
- Adaptive routing: Adaptive routing stats are exported as broker metrics.
- Auth ordering: Authentication runs before query validation so unauthorized requests fail earlier.
- Large response handling: Cursor-style response lifecycle cleanup moves toward the broker with a batch delete API (review upstream notes if you integrate with response stores).
- Minion observability: Minion task generation logs carry correlation context (MDC) keyed by task id.
- Dynamic server thread pool: The server thread pool size can be changed at runtime without a restart.
Bug Fixes
- Upsert snapshots: Bitmap optimizations during upsert snapshot and segment commit paths.
- Kafka consumption: The partition-level consumer no longer incorrectly re-seeks past read_committed-filtered batches.
- Record extraction: Map serialization is canonicalized, and record extractors preserve primitive types more reliably.
- FUNNEL_COUNT NPE: Fixed a NullPointerException when the WHERE clause filters out all rows.
Backwards Incompatible Changes
- Native text and FST indexes removed: Migrate to the Lucene-based text index.
- Inverted index always created during segment generation: Inverted index creation during segment generation was previously optional.

