
Executive Summary

This release focuses on making StarTree more lakehouse-native, faster, and more reliable in production.
  • Lakehouse Integration: Deeper support for Iceberg and Parquet (including complex types and improved readers) makes it easier to run analytics directly on data lakes.
  • Performance & Query Execution: Enhancements like replica-aware routing, broker-based execution, and improved caching reduce query latency and improve efficiency.
  • Streaming & CDC: Expanded Debezium support strengthens real-time ingestion pipelines.
  • Observability & Reliability: New metrics, retry mechanisms, and safeguards improve stability and operational visibility.
Bottom line: Faster queries, better lakehouse integration, and stronger production readiness.

StarTree Cloud highlights

New Features

External Tables: Iceberg (Glue & S3 Tables)

StarTree Cloud can now query Apache Iceberg tables in place — no re-ingestion required. Iceberg tables registered in Nessie, AWS Glue, or S3 Tables are queried directly using Pinot’s query engine. For more details, see the External Tables documentation.
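Once registered, an external Iceberg table is queried like any other Pinot table. A minimal sketch (the table and column names below are illustrative, not from the release):

```sql
-- "orders_iceberg" is a hypothetical external Iceberg table registered in
-- Glue or S3 Tables; the query runs in place, with no ingestion step.
SELECT customer_id, SUM(amount) AS total_spend
FROM orders_iceberg
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
ORDER BY total_spend DESC
LIMIT 10
```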

External Tables: S3 Remote Parquet Files

StarTree Cloud can now register an S3 location as a Pinot table and query Parquet files in place without ingestion. For more details, see the External Tables documentation.

Query Engine & Execution

Two new query engine features improve performance and isolation:
  • Replica-group-aware query routing, enabling better isolation and workload-aware query execution.
  • Support for brokers acting as intermediate-stage workers in distributed query execution. This improves elasticity for multi-stage engine queries, since brokers are simpler to scale than servers. The number of brokers used, and which brokers are selected, can be controlled via Helix tags and query options.
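In Pinot's multi-stage engine, query options are typically supplied via `SET` statements ahead of the query. A sketch of what enabling broker stage-workers could look like — the option names here are illustrative assumptions, since the release notes do not name them; consult the StarTree documentation for the exact options:

```sql
-- Hypothetical option names; see the release documentation for the real ones.
SET useBrokerAsIntermediateStageWorker = true;
SET numBrokersForIntermediateStages = 2;
SELECT o.customer_id, COUNT(*) AS cnt
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY o.customer_id
```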

Columnar Segment Processing Framework (CSPF)

A new columnar-first processing framework for segment operations that works column-by-column instead of row-by-row, significantly reducing CPU and memory overhead for large segments. Segment reload, File Ingestion Task, Segment Refresh Task, Alter Table Task, and Segment Import Task are all integrated with CSPF. Supports expression transformations, sorting, sanitization, and time column handling.
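The core idea can be illustrated with a toy sketch (this is not StarTree's CSPF implementation): applying a transform once per column avoids per-row dispatch overhead and keeps values of one type together in memory, which is why columnar processing is cheaper for large segments.

```python
# Toy contrast of row-by-row vs column-by-column segment transformation.
# Not StarTree's implementation — just the shape of the idea.

def transform_row_by_row(rows, transforms):
    """Apply per-column transforms while iterating row by row."""
    out = []
    for row in rows:
        out.append({col: transforms.get(col, lambda v: v)(val)
                    for col, val in row.items()})
    return out

def transform_columnar(columns, transforms):
    """Apply each transform to a whole column at once."""
    return {col: [transforms.get(col, lambda v: v)(v) for v in vals]
            for col, vals in columns.items()}

rows = [{"ts": 1, "price": 10.0}, {"ts": 2, "price": 12.5}]
columns = {"ts": [1, 2], "price": [10.0, 12.5]}
transforms = {"price": lambda v: round(v * 1.1, 2)}

assert transform_row_by_row(rows, transforms)[1]["price"] == 13.75
assert transform_columnar(columns, transforms)["price"] == [11.0, 13.75]
```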

Minion Task Orchestration Framework

A new plan-based orchestration framework for managing complex, multi-step Minion task workflows. Task plans define sequences of tasks with dependencies, enabling safer and more predictable execution. Includes REST APIs for managing task plans and support for ad-hoc (one-time) task triggers. The File Ingestion Task and the Segment Purge Task have been onboarded onto this framework.
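Conceptually, a task plan is a small DAG of task steps. A sketch of what such a plan could look like — the field names below are illustrative assumptions, not the actual schema:

```json
{
  "planName": "fileIngestionWithRefresh",
  "tasks": [
    { "id": "ingest", "taskType": "FileIngestionTask" },
    { "id": "refresh", "taskType": "SegmentRefreshTask", "dependsOn": ["ingest"] }
  ]
}
```

The `dependsOn` edges are what make execution predictable: the refresh step runs only after ingestion completes.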

Segment Purge Task Enhancements

The segment purge capability received major improvements across the board:
  • Flexible purge criteria — Supports column-level predicates and SQL query-based selectors for targeting rows to purge
  • Dry run mode — Preview which segments and rows are affected before executing, with verbose reporting of skipped segments
  • Ad-hoc API — Trigger purge tasks on demand without a scheduled task config
  • Performance — Replaced DISTINCT with GROUP BY in purge queries for significantly better performance on large tables
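Putting the first three items together, a purge task configuration could look roughly like the sketch below. The config keys are illustrative assumptions (the release notes do not spell them out); refer to the Segment Purge Task documentation for the real ones:

```json
{
  "task": {
    "taskTypeConfigsMap": {
      "PurgeTask": {
        "purgeCriteria.sqlSelector": "SELECT * FROM myTable WHERE is_deleted = true",
        "dryRun": "true"
      }
    }
  }
}
```

With `dryRun` enabled, the task reports the affected segments and rows (including skipped segments) without modifying any data.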

Composite JSON Index Enhancements

  • Added FST and Text index as sub-indexes within the Composite JSON index, with tiered storage support
  • Added partitioned inverted index — the dictionary and posting lists are split into N sub-indexes by JSON path, reducing per-partition memory pressure and enabling read parallelism
  • Added PromQL label query support via the JSON index
  • TEXT_MATCH predicates now work correctly on consuming (real-time) segments
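The partitioned inverted index idea can be shown with a toy sketch (not StarTree's implementation): posting lists are split into N sub-indexes keyed by a hash of the JSON path, so each partition holds a smaller dictionary and can be searched independently, and in parallel.

```python
# Toy partitioned inverted index: posting lists split into N sub-indexes
# by JSON path. Illustration only — not StarTree's implementation.
from collections import defaultdict

NUM_PARTITIONS = 4

def partition_of(json_path: str) -> int:
    return hash(json_path) % NUM_PARTITIONS

class PartitionedInvertedIndex:
    def __init__(self):
        # One dictionary of posting lists per partition.
        self.partitions = [defaultdict(list) for _ in range(NUM_PARTITIONS)]

    def add(self, doc_id: int, json_path: str, value: str):
        self.partitions[partition_of(json_path)][(json_path, value)].append(doc_id)

    def lookup(self, json_path: str, value: str):
        # Only the owning partition is touched, keeping memory pressure local.
        return self.partitions[partition_of(json_path)].get((json_path, value), [])

idx = PartitionedInvertedIndex()
idx.add(0, "$.user.name", "alice")
idx.add(1, "$.user.name", "bob")
idx.add(2, "$.user.name", "alice")
assert idx.lookup("$.user.name", "alice") == [0, 2]
assert idx.lookup("$.user.city", "paris") == []
```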

Query Analyzer

Query Analyzer is a new AI-powered tool embedded in the Data Portal Query Console that helps you understand and optimize Apache Pinot multi-stage engine (MSE) queries. It analyzes your SQL query alongside table metadata, explain plans, and execution statistics to produce prioritized, evidence-backed optimization recommendations — without requiring deep Pinot expertise. This is a beta feature, disabled by default. Contact your StarTree account team to have it enabled for your environment.

Ingestion: New Decoder Support

  • Debezium CDC — Added Debezium decoder support for all Confluent Schema Registry formats (JSON, Protobuf, Avro), enabling CDC pipelines to ingest directly into StarTree without custom transformation
  • AWS Glue Schema Registry — New stream decoder for Kafka topics encoded with the AWS Glue Schema Registry Avro wire format
  • Kafka 3.0 Confluent Consumer — Added ConfluentKafkaConsumerFactory for Kafka 3.x clients
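For context, decoders and consumer factories are wired into a real-time table through its `streamConfigs`. A sketch, with the fully qualified class names left as placeholders since the release notes only give the short names:

```json
{
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.topic.name": "orders",
    "stream.kafka.consumer.factory.class.name": "<fully-qualified ConfluentKafkaConsumerFactory>",
    "stream.kafka.decoder.class.name": "<fully-qualified Debezium JSON/Avro/Protobuf decoder>"
  }
}
```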

Additional New Features

  • Tiered Storage Caching Integration — Prefetched tiered storage segments are now stored in the Parquet disk cache, reducing repeated S3 access for hot segments
  • Table Storage Usage API — Added percentile-based segment size stats (p50, p90, p99) and verbose level controls
  • Deep Store Stale Segment Detection — New API GET /tables/{tableName}/deepstoreStaleSegmentInfo to estimate segments out of sync between servers and the deep store
  • Cluster Cloner Controls — Added options to conditionally skip deep store copying, table deletion, and schema/config change checks during cluster migration
  • Delta Ingestion Reliability — Upfront config validation, auto-default S3/GCS parameters, and correct failure propagation when files fail during segment generation
  • Batch File Listing Optimizations — Narrowed S3 listing scope using glob pattern prefixes; paginated PinotFS listing with filter push-down in the Preview API
  • gRPC Authentication & Authorization — RBAC enforcement for gRPC requests in broker access control
  • Introduced an Arrow-based Parquet column reader for improved performance
  • Added support for:
    • Null handling in remote Parquet tables
    • Complex types (Struct, List) in Parquet ingestion
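The percentile-based segment size stats mentioned above (p50, p90, p99) can be sketched with the nearest-rank method; the API's exact percentile method is an assumption here.

```python
import math

def percentile(sorted_values, p):
    """Nearest-rank percentile: smallest value with >= p% of data at or below it."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

# Hypothetical per-segment sizes in MB for one table.
segment_sizes_mb = sorted([120, 95, 240, 130, 110, 480, 125, 100, 118, 122])
stats = {f"p{p}": percentile(segment_sizes_mb, p) for p in (50, 90, 99)}
assert stats == {"p50": 120, "p90": 240, "p99": 480}
```

Note how p99 surfaces the outlier 480 MB segment that a plain average would hide — the motivation for reporting percentiles rather than a single mean.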

Improvements

  • OOM Protection — OOM resource accounting is now enabled by default, with query kill for queries that exceed resource limits
  • Upsert Stability — Fixed SIGSEGV errors on upsert table startup/shutdown; prevented RocksDB state reuse for partial upsert tables
  • Query Performance — Improved query performance with a prefetchable forward index reader
  • Parquet Read Efficiency — Enhanced Parquet reads with page-level caching and cache eviction mechanisms
  • Audit Identity — Added StarTreeTokenResolver for consistent identity attribution in audit logs
  • Observability — New metric for long segment replacement durations; Parquet page cache Prometheus metrics with per-layer hit/miss tracking; new preload cache size and buffer usage metrics
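The page-level caching and eviction mentioned above follow a familiar pattern; a toy LRU page cache sketches the mechanics (this is an illustration, not the actual StarTree implementation):

```python
# Toy LRU page cache with hit/miss counters, illustrating page-level
# caching with eviction for a Parquet-style read path. Sketch only.
from collections import OrderedDict

class PageCache:
    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # (file, page_index) -> bytes
        self.hits = self.misses = 0

    def get(self, file: str, page_index: int, load_page):
        key = (file, page_index)
        if key in self.pages:
            self.pages.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.pages[key]
        self.misses += 1
        data = load_page(file, page_index)  # e.g. fetch from S3 or local disk
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used page
        return data

cache = PageCache(capacity_pages=2)
loader = lambda f, i: f"page-{i}".encode()
cache.get("a.parquet", 0, loader)
cache.get("a.parquet", 1, loader)
cache.get("a.parquet", 0, loader)  # hit
cache.get("a.parquet", 2, loader)  # evicts page 1
assert cache.hits == 1 and cache.misses == 3
assert ("a.parquet", 1) not in cache.pages
```

The per-layer hit/miss counters are exactly the kind of signal exported to Prometheus in this release.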

Bug Fixes

  • Purge task no longer uploads empty segments or throws exceptions at the max segment limit
  • Fixed Iceberg segment name normalization and segment name conflict on partition collisions
  • Fixed Delta ingestion silent success on file processing errors; fixed preserveNullValues not being honored consistently
  • Fixed timestamp column min/max values being incorrectly converted to milliseconds
  • Fixed Parquet reader errors: multi-value string UnsupportedOperationException, INT96 timestamp ClassCastException, and incorrect cache lookups
  • Fixed gRPC connection failures — now falls back to HTTP gracefully
  • Fixed Parquet disk cache write position bug on reload; disabled snapshot recovery by default
  • Fixed GlobPrefixExtractor URI scheme restoration for S3 and GCS paths
  • Fixed filterColumnsForRow dropping incomplete and sanitized flags in dedup processing
  • Fixed InstancePoolsNReplicaGroupsCheck health check failure for tables using instancePartitionsMap
  • Fixed controller crash-loop on restart when schemas are missing

Apache Pinot Highlights

The following changes were made to the open source Apache Pinot project since the last release.

New Features

  • Enhancements to multi-stage query engine (v2), including better stage execution and scalability.
  • Improved query routing and server selection, enabling more efficient distributed execution.
  • Expanded JSON and text indexing capabilities for richer query support.
  • Added improvements to stream ingestion (Kafka/CDC) for better handling of real-time pipelines.
  • New/updated minion tasks and table management APIs for maintenance workflows (e.g., purge, dedup).

Improvements

  • Faster queries via improvements in filter pushdown, segment pruning, and index utilization.
  • Enhanced upsert and dedup performance, including better handling of edge cases.
  • Improved Parquet/deep storage integration, making lakehouse-style querying more efficient.
  • Better metrics and observability for query execution, segment lifecycle, and system health.
  • Improved broker/server memory and resource usage.
  • General stability and scalability improvements in distributed query execution.

Bug Fixes

  • Fixed query correctness issues in joins, aggregations, and edge-case filters.
  • Resolved upsert inconsistencies and state management issues.
  • Fixed stream ingestion edge cases (offset handling, schema mismatches).
  • Addressed segment loading, replacement, and metadata inconsistencies.
  • Fixed NPEs, race conditions, and build/dependency issues.