StarTree Cloud highlights
New Features
Composite JSON index enhancements
Added support for the following indexes within the Composite JSON index:
- FST index
- Text index
Segment Backfill & Purge
- Segment Backfill Dry Run Mode: A new capability to preview backfill actions before execution. It shows which segments will be purged and what data will be ingested, without making any actual changes, reducing the risk of data loss.
- Segment Purge Support for Upsert Tables: Segment purge tasks now work with upsert tables by marking rows as deleted rather than removing them.
Consolidated Preload for Tiered Storage
A new mode downloads all indexes into a single consolidated mmap file. Config: preload.enable.index.consolidation (set to true to enable, false to disable).
New Table Storage Usage API
Unexpected growth in table storage can increase infrastructure and object storage costs. Pinot stores table data across multiple locations, such as server disk, deep store, and the remote object stores used by tiered storage. The Table Storage Usage API reports table size with a breakdown by storage location and highlights mismatches between the expected size and the actual size. For more info, please see the doc.
Tiered Storage: Ease of Table Evolution
Evolving a table's schema and config requires running minion tasks (SegmentRefreshTask or AlterTableTask). This release makes it easy to run these tasks and update the remote tier directly (without going through Pinot servers), making such changes significantly faster and more efficient.
Embedded Schema Extraction for Parquet Files
While creating a table, Data Portal now automatically extracts the schema from Parquet file metadata, eliminating manual schema inference. It supports primitive types, logical type annotations, and timestamp handling, with a new priority system: Provided → Embedded → Inferred schemas.
Improvements
Minion Task Execution enhancements & Guardrails
- Adaptive Disk Usage: Stops the mapper phase when disk usage exceeds maxDiskUsagePercentage (default: 85%). This prevents disk-full failures during large ingestion jobs.
- Faster Reduce Phase Task Generation in Alter Table Task: Added support to parallelize metadata file downloads in the Alter Table Task reduce phase, which was previously single-threaded and caused a bottleneck. This is controlled by the table-level config flag numMetadataFileDownloadThreads (default: 4).
- Default Max Concurrent Tasks per Minion Based on Memory: Automatically derives safe concurrency levels from system memory, which is crucial for preventing out-of-memory errors.
- Soft File Count Limit for Delta Ingestion: Enforces a ~200k soft limit on file count to prevent excessive segment generation.
- Conflict Prevention: Prevents conflicting scenarios such as running a refresh or alter task during Delta ingestion, or running SIT/Delta/SRT during an ongoing backfill task.
- Improved File Ingestion Task (FIT) Documentation, Defaults, and Validations: Includes new defaults for consistent push retries and validation of critical config fields.
- Cluster-Level Max Subtask Limit Enforcement: Ensures subtask counts respect cluster safety thresholds.
- New Metrics: Added metrics to capture consistent push failures and tasks skipped due to conflicts, and improved task metric accuracy.
- Tenant Rebalance Cancellation: Added the ability to cancel a tenant rebalance.
- Manual Trigger of Controller Periodic Tasks: Added support for triggering the controller's periodic tasks on demand (one-time execution), rather than requiring them to be scheduled.
Performance & Stability
- Controller Thread Pool Defaults: Sets bounded defaults (general executor: 1000, rebalance: 200) to avoid stability issues.
- Tiered Storage Enhancements:
  - Cleanup of dangling deep store sessions and stale reduce outputs. New configs: bufferDaysToPurgeOutputSegments (default: 3 days) and cleanupDanglingIntermediateFiles (default: true).
  - Avoid prefetching the forward index when a JSON or text index exists. This reduces unnecessary I/O for JSON_EXTRACT_INDEX and text_match operations (see the example queries after this list).
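To illustrate the kinds of queries that benefit from skipping the forward-index prefetch, here is a hedged sketch; the table and column names (events, payload, logMessage) are hypothetical, and the function signatures follow the standard Pinot forms.

```sql
-- Hypothetical table "events" with a JSON-indexed column "payload"
-- and a text-indexed column "logMessage".
-- When these queries are served from the JSON/text index, the forward
-- index for those columns no longer needs to be prefetched from the tier.

-- Extract values directly from the JSON index:
SELECT JSON_EXTRACT_INDEX(payload, '$.user.country', 'STRING') AS country,
       COUNT(*) AS cnt
FROM events
GROUP BY JSON_EXTRACT_INDEX(payload, '$.user.country', 'STRING');

-- Filter using the text index:
SELECT COUNT(*)
FROM events
WHERE TEXT_MATCH(logMessage, '"connection timeout"');
```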
Bug Fixes
- Fixed IndexOutOfBounds in backfill for empty predicates
- Improved backwards compatibility in dangling file cleanup
- Fixes related to task limit validation, config overrides, and inconsistent retry configs
- **Segment Import Task:** Allows changing bucket duration (e.g., 1h → 6h) without data loss or duplicates.
Apache Pinot (OSS) highlights
New Features
- Robust OOM Protection: Unified lifecycle/metadata for all query execution threads, enabling safer cancellation, better resource tracking, and improved observability. For more information, please see the doc.
- Apache Arrow Decoder (Experimental): Adds initial Arrow-format ingestion support, intended to improve ingestion efficiency by reducing processing overhead.
- N-gram Filtering Index (Experimental): Added support for a realtime n-gram index to efficiently pre-filter non-matching strings.
- Kafka Client Default Upgraded to Kafka 3
- IP Address Functions: Added helper functions such as ipPrefix, ipSubnetMin, ipSubnetMax, etc. (see the example query after this list).
- Array Manipulation Functions: Added helper functions to push elements to the front/back of all primitive and string array types.
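A hedged sketch of how the new IP helper functions might be used; the table and column names (access_logs, clientIp) are hypothetical, and the signatures are assumed to follow the Trino-style convention of ipPrefix(ipAddress, prefixBits), ipSubnetMin(ipPrefix), and ipSubnetMax(ipPrefix); check the docs for the exact forms.

```sql
-- Hypothetical table "access_logs" with a STRING column "clientIp".
-- Group client IPs by their /24 subnet (signatures assumed, see lead-in).
SELECT ipPrefix(clientIp, 24) AS subnet,
       COUNT(*) AS requests
FROM access_logs
GROUP BY ipPrefix(clientIp, 24)
ORDER BY requests DESC
LIMIT 10;

-- Look up the address range covered by a given prefix:
SELECT ipSubnetMin('192.168.10.0/24') AS subnetStart,
       ipSubnetMax('192.168.10.0/24') AS subnetEnd
FROM access_logs
LIMIT 1;
```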
Improvements
- Support MAP Type for Derived Columns During Reload
- Partial Upserts Stability: Reload is disabled on consuming segments; a force commit is used instead to avoid corruption.
- Segment Reload Failure Tracking: This release adds in-memory tracking for failed reloads.
- Automatic Rewrite of MIN/MAX/SUM on Long/String Types: Rewrites these aggregations to type-correct variants to avoid precision loss (see the example queries after this list).
- Star-tree Index Build Robustness:
  - Skip star-tree creation if the index build fails
  - Roll back to the existing index when updates fail
  - New metrics to track failures
- Star-tree MV Aggregation Support: The functions SUMMV, COUNTMV, and AVGMV are now supported on multi-value columns (see the example queries after this list).
- Async Segment Refresh Message Processing: Enabled by default in StarTree Cloud.
- Audit Logging Filtering Improvements: URL-based filtering support while collecting audit logs.
- Perf Improvements in MSE (Join Optimization): Improved query performance by optimizing hash function usage in the query planner (https://github.com/apache/pinot/pull/16830).
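As a quick illustration of the precision issue behind the MIN/MAX/SUM rewrite: a DOUBLE has a 53-bit mantissa, so LONG values above 2^53 cannot be represented exactly when aggregated through a DOUBLE accumulator. The query below is a hedged sketch; the table and column names (orders, orderId) are hypothetical.

```sql
-- Hypothetical table "orders" with a LONG column "orderId".
-- LONG values above 2^53 (~9 * 10^15) lose precision if the aggregation
-- runs through a DOUBLE accumulator; with this release MIN/MAX/SUM on
-- such columns are rewritten to type-correct variants, so the results
-- below keep full precision.
SELECT MIN(orderId) AS minOrderId,
       MAX(orderId) AS maxOrderId
FROM orders;
```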
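And a hedged sketch of the multi-value aggregations now eligible for star-tree serving; the table and column names (pageViews, country, latenciesMs) are hypothetical.

```sql
-- Hypothetical table "pageViews" with a single-value column "country"
-- and a multi-value LONG column "latenciesMs".
-- If these aggregations are included in the table's star-tree index config,
-- the query can now be answered from the pre-aggregated star-tree documents.
SELECT country,
       COUNTMV(latenciesMs) AS valueCount,
       SUMMV(latenciesMs)   AS totalLatencyMs,
       AVGMV(latenciesMs)   AS avgLatencyMs
FROM pageViews
GROUP BY country;
```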

