Release Version 0.11.0: July 2025
Summary
This release brings critical improvements across cost savings, query performance, system observability, and operational resilience for both StarTree Data Platform and StarTree Cloud.
Highlights
- Graviton GA: Full support for AWS Graviton CPUs is now available for Pinot components. These SKUs are typically 15-20% cheaper than comparable x86 instances.
- AZ-aware Kafka consumer: Users can now configure an AZ-aware Kafka consumer to reduce cross-AZ data transfer costs. Depending on usage, this can cut those costs by roughly 50-75%.
- Calcite Upgrade (v1.39.0): Improves Multi-Stage Engine (MSE) query performance, especially with large IN clauses.
- Alter Table Task (ATT) GA: Now production-ready, replacing the Segment Refresh Task (SRT) for efficient table updates; includes deep store sync (Alpha) and direct tiered storage uploads.
- Rebalance Enhancements: Smarter defaults, risk checks, reduced data movement, and improved UI and observability.
- Snowflake Connector: Adds secure key-pair authentication for ingestion.
Key Improvements
- MSE Logging & Stats: Better error diagnostics, reduced noise, and stats on failed queries.
- Pinot Proxy Optimization: Lower CPU usage with improved broker routing.
- Segment Import Task (SIT): Fixes data loss edge cases, optimizes memory use.
- GRPC & Memory Handling: Adds throttling and auto-recovery to prevent OOM crashes.
- Faster Cleanup & Validations: Batched table deletions, better error messaging, and safer ingestion setups.
Stability Fixes
- Fixes around Kinesis ingestion, stream partition handling, table config updates without minions, and correct JSON responses for always-false queries.
- Improved MSE thread management reduces contention and enhances throughput.
- Additional validation (e.g., bloom filter on boolean types) and safety checks across ingestion and query paths.
Observability & Monitoring
- New Dashboards: For upserts, rebalances, SRT/ATT tasks, and MSE query load.
- Expanded Metrics & Alerts: More visibility into ingestion errors, ZK node size, broker failures, and query health.
- Alert Refinements: Better thresholds for CPU, GC, and system anomalies.
The following sections cover these changes in detail.
StarTree Data
New Features
- AZ aware Kafka consumer support (Beta)
Users can now configure AZ-aware Kafka consumers in their environment. This requires server pools to be set up in the environment. To enable it, specify the following additional property in the ingestion section of the table config:
Here, CLOUD_AZ is an environment variable that is automatically configured on each Pinot server: on startup, each server sets it to the zone it is running in.
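As a minimal sketch of how a per-server placeholder like `${CLOUD_AZ}` can resolve to each server's zone, the snippet below substitutes it from the environment. It assumes the property maps onto Kafka's standard `client.rack` consumer setting (the client side of fetching from the closest replica); that key and the `resolve_placeholders` helper are illustrative assumptions, not StarTree's documented configuration.

```python
import os
from string import Template

# Hypothetical stream config fragment. "client.rack" is Kafka's standard
# rack-awareness consumer setting and is ASSUMED here, not taken from StarTree docs.
STREAM_CONFIG = {
    "streamType": "kafka",
    "client.rack": "${CLOUD_AZ}",
}


def resolve_placeholders(config: dict, env: dict) -> dict:
    """Substitute ${VAR} placeholders in string values using the given environment.

    Template.substitute raises KeyError if a referenced variable is missing,
    so a misconfigured server fails loudly instead of using an empty zone.
    """
    return {
        key: Template(value).substitute(env) if isinstance(value, str) else value
        for key, value in config.items()
    }


if __name__ == "__main__":
    # On a real server CLOUD_AZ is set automatically; fall back to a sample zone here.
    env = dict(os.environ)
    env.setdefault("CLOUD_AZ", "us-east-1a")
    print(resolve_placeholders(STREAM_CONFIG, env))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a server without `CLOUD_AZ` raises an error instead of consuming with an unresolved zone.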
- Calcite upgrade to 1.39.0
Upgraded the internal Calcite dependency to version 1.39.0. The major Calcite changes are documented as part of this PR. The primary motivation is to address a performance issue in the Multi-Stage Engine (MSE) with large IN clauses (details in this issue).
- Correlation ID for Query Tracking
Introduced support for correlation IDs, enabling better tracking and observability across multi-stage engine (MSE) queries. This makes MSE issues easier to debug by collecting all relevant broker and server logs for a given query in one place. Previously, it was difficult to track down the logs for a specific query, which made it harder to identify the root cause of failures and timeouts.
By default, Pinot assigns a random correlation ID to a query as soon as the broker receives it, but clients can provide their own ID using the clientQueryId query option, as shown below:
For more details, please refer to this doc.
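A minimal client-side sketch of supplying your own ID, assuming the standard Pinot `SET option = value;` syntax for query options and a JSON body of the form `{"sql": ...}` as accepted by the broker's `/query/sql` endpoint. The `build_query_payload` helper and the `app-` ID prefix are hypothetical.

```python
import json
import uuid
from typing import Optional


def build_query_payload(sql: str, client_query_id: Optional[str] = None) -> str:
    """Build a JSON request body that carries a caller-chosen clientQueryId.

    The option is prepended as a SET statement (assumed Pinot syntax) so the
    broker uses this ID as the correlation ID instead of a random one.
    """
    if client_query_id is None:
        client_query_id = f"app-{uuid.uuid4()}"  # hypothetical naming scheme
    # Escape single quotes so the ID is a valid SQL string literal.
    safe_id = client_query_id.replace("'", "''")
    statement = f"SET clientQueryId = '{safe_id}'; {sql}"
    return json.dumps({"sql": statement})


if __name__ == "__main__":
    print(build_query_payload("SELECT COUNT(*) FROM myTable", "order-42"))
```

The same ID can then be used to search broker and server logs for that query.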
- Deep Store Sync via Alter Table Task (Alpha Release)
Introduced a new Alter Table Task (ATT) mode that rebuilds any deep store segments that are out of sync with the corresponding local segments on Pinot servers after a table config or schema change. Because this runs as a background task, Pinot servers no longer have to do this work during a restart or upgrade, which reduces overall turnaround time. Note: This background task is optional and intended for scenarios where you want to reduce overhead on Pinot servers. To enable it, add the ATT task config below to the table config:
This mode is experimental and not yet GA.
For more details, refer to this doc.
- Rebalance Phase 2 Enhancements
- Added a minimizeDataMovement rebalance parameter to minimize network overhead (doc).
- UI enhancements
- Basic and advanced options
- Highlight risky rebalance parameters
- “Dry Run” and status-tracking buttons
- New defaults:
- reassignInstances: true,
- includeConsuming: true,
- minAvailableReplicas: -1
- Pre-check improvements
- Display status of PASS / WARN / FAIL and a message
- Check Disk utilization during and after a rebalance and flag servers that might be at risk (default threshold of 90%)
- Flag risky & missed rebalance parameters
- needReload pre-check now checks all servers
- Summary improvements:
- Always on
- Add tag level summary
- Add a summary of consuming segments, tracking the top 10 by most offsets to catch up and by oldest segment creation time
- Observability improvements
- Always include the jobId and table name for easier debuggability
- New segment throttle metrics for threshold and current count (doc)
- Improve rebalance progress stats tracking in ZK for better observability into how the rebalance is progressing
- Updated the state model to prioritize DROPPED over other states, followed by OFFLINE, and to allow direct transitions to DROPPED from ONLINE / CONSUMING instead of the previous two-step process (this also reduces the total number of state transitions for DROPPED). This is rolled out only in STP; in OSS, users need to set a config to update the state model.
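The disk utilization pre-check can be pictured with the minimal sketch below. It assumes segments are added to their new servers before being dropped from the old ones, so utilization peaks during the rebalance at current usage plus incoming data, and settles afterwards at current plus incoming minus outgoing. The `ServerDiskPlan` type, the byte figures, and reporting flagged servers as WARN are illustrative assumptions; only the 90% default threshold comes from the notes above.

```python
from dataclasses import dataclass
from typing import List, Tuple

DEFAULT_THRESHOLD = 0.90  # default disk utilization threshold from the release notes


@dataclass
class ServerDiskPlan:
    """Hypothetical per-server projection for one rebalance."""
    name: str
    capacity_bytes: int
    used_bytes: int
    incoming_bytes: int  # segments this server will receive
    outgoing_bytes: int  # segments this server will drop


def disk_precheck(plans: List[ServerDiskPlan],
                  threshold: float = DEFAULT_THRESHOLD) -> Tuple[str, str, list]:
    """Return (status, message, at_risk) where at_risk lists (name, during, after).

    Assumes make-before-break: new copies land before old ones are dropped,
    so the peak during the rebalance is used + incoming.
    """
    at_risk = []
    for p in plans:
        during = (p.used_bytes + p.incoming_bytes) / p.capacity_bytes
        after = (p.used_bytes + p.incoming_bytes - p.outgoing_bytes) / p.capacity_bytes
        if during > threshold or after > threshold:
            at_risk.append((p.name, round(during, 3), round(after, 3)))
    if not at_risk:
        return "PASS", "All servers stay within the disk threshold", []
    names = ", ".join(name for name, _, _ in at_risk)
    return "WARN", f"Servers at risk of exceeding {threshold:.0%} disk usage: {names}", at_risk
```

Checking both projections matters because a server that ends up fine after the rebalance can still run out of disk while it temporarily holds both old and new segment copies.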
- Key-based authentication support for Snowflake connector
Added key-pair authentication support when ingesting data from Snowflake (previously, only username and password were supported). This provides a more secure way to connect to and ingest data from Snowflake. Here’s a sample config to use this new feature:
- Alter Table Task (ATT) - GA
With this release, the Alter Table Task (ATT) is generally available (GA) for production use. ATT will replace the older Segment Refresh Task (SRT) going forward and is designed to be more efficient and scalable. More information on ATT can be found in this doc.
- Support for efficiently altering table config with cloud tiered storage
Users can now efficiently change tables that use cloud tiered storage (e.g., adding or removing indexes) using SRT/ATT jobs. Previously, these jobs generated new segments reflecting the change and uploaded them to the deep store, and the Pinot server then uploaded them to the remote tier. Now the minion job uploads them to the remote tier directly, reducing server overhead.
Required configs to enable this feature:
Cluster Config:
Task Config:
This should be added to the table's task configs list; it defaults to false.
For more information, please refer to this doc.
Improvements
- Add stats on errors in MSE
MSE now shows detailed query stats even when a query fails, which was not the case previously. This helps debug MSE query errors and timeouts and pinpoint where a failure occurred. Please see this doc for more details.
- Reducing noisy logs in MSE on error
MSE now generates less log output during query errors by reclassifying certain errors and adjusting how query errors are logged, minimizing redundant stack traces. Previously, query issues generated a large volume of logs, leading to excessive disk usage.
- Pinot Proxy CPU optimization
Pinot Proxy allows users to send queries easily to multiple broker tenants. The proxy determines the relevant set of brokers for the requested table(s) and routes accordingly. More information on Pinot proxy can be found here. Previously, routing through the proxy added some CPU overhead to each query.
In this release, we’ve added several optimizations to reduce the CPU overhead when queries are routed via the Pinot proxy. These include the ability to pass a list of table names in a header (previously the proxy parsed the query to find them) and preferring the local broker, among others.
- Always-on upsert-related background tasks
Background tasks are now scheduled automatically for upsert-enabled tables, with built-in support for:
- UpsertSnapshotCreationTask
- Improved Logging for catching bad queries
Queries are now logged when they begin execution, helping ops teams detect and cancel problematic queries in real time before they degrade the cluster. Previously, queries were logged only on completion, which was too late for this purpose.
- SegmentImportTask (SIT) Improvements
In StarTree Cloud, users can use the SIT task to move data from the realtime table to the offline table (hybrid mode) once it reaches a certain age. This release includes several SIT improvements:
- Data loss bug fix: Fixed a scenario in which data in the realtime table could be deleted (due to retention) before it was moved to the offline table.
- Segment size improvements: Improved the algorithm to create appropriately sized segments in the offline table.
- Minion OOM issue: Minion tasks now initialize record readers lazily, so segments are no longer all loaded at once; loading them all at once caused memory overhead and could lead to OOM failures.
- Improved runbook related to SIT.
SIT docs can be found here.
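The lazy record reader change follows a common pattern: open each segment's reader only when its records are actually needed, rather than opening all of them up front. The sketch below illustrates that pattern with Python generators; `open_reader` and the segment names are hypothetical stand-ins, not Minion APIs.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical signature: open_reader(segment) returns an iterable of records.
OpenReader = Callable[[str], Iterable[dict]]


def eager_records(segments: Iterable[str], open_reader: OpenReader) -> Iterator[dict]:
    """Eager pattern: every reader is opened before the first record is produced."""
    readers = [open_reader(s) for s in segments]  # all segments held at once
    for reader in readers:
        yield from reader


def lazy_records(segments: Iterable[str], open_reader: OpenReader) -> Iterator[dict]:
    """Lazy pattern: a segment's reader is opened only when its records are needed."""
    for segment in segments:
        reader = open_reader(segment)  # opened on demand
        yield from reader
        del reader  # drop our reference before the next segment is opened
```

With the lazy version, peak memory is bounded by roughly one open segment at a time instead of the sum of all segments in the task.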
- GRPC & Trino Error Handling Enhancements
Added better error feedback when segments are unavailable during GRPC or Trino-based query execution.
- Improved Error Messages for Groovy Transform Failures
Enhanced visibility into Groovy-based transform failures during table creation flows with clearer error logging.
- Optimized Table Deletion (including deep store)
Table deletion now removes segment files in batches instead of sequentially.
- Create an alert for missing upsert snapshots
Added an alert for tables that are missing the upsert snapshot config. This helps prevent scenarios in which a server goes into a crash loop because its snapshots were stale and there was not enough disk for the table to generate new ones. This is tracked using a new metric:
CLUSTER_HEALTH_TASK_UPSERT_PREBUILT_SNAPSHOT_DISABLED_TABLES_COUNT
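Returning to the Optimized Table Deletion item above, the difference between sequential and batched deletion can be sketched as follows. The batch size of 1000 mirrors the per-request key limit of S3's DeleteObjects API and is only an assumption about how a deep store might be batched; `delete_batch` is a hypothetical callback, not a Pinot API.

```python
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 1000  # assumption: e.g. S3 DeleteObjects accepts up to 1000 keys per request


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def delete_segments(paths: Sequence[str],
                    delete_batch: Callable[[List[str]], None],
                    batch_size: int = BATCH_SIZE) -> int:
    """Delete segment files with one call per batch instead of one call per file.

    Returns the number of delete calls issued.
    """
    calls = 0
    for batch in chunked(paths, batch_size):
        delete_batch(batch)  # hypothetical bulk-delete callback
        calls += 1
    return calls
```

For a table with 2,500 segment files, this issues 3 delete calls instead of 2,500.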
Bug Fixes & Stability
- Fix for Table Config Updates in the absence of minion pod
Table config updates no longer fail when no minion pod is up, because config changes unrelated to minions no longer depend on a minion.
- Comprehensive cleanup of historical segments in deep store
Fixed an issue where .tar.gz segments created by SRT tasks were skipped by the Retention Manager.
- Better validation for FileIngestion mode
Switching between sync and append ingestion modes after table creation is now disallowed, to ensure state consistency.
- Direct Memory OOM Auto-Recovery
Pinot servers can now recover from direct memory out-of-memory (OOM) situations without manual pod restarts.
- Upsert Snapshot Task Monitoring
Added success/failure tracking for snapshot creation jobs and an alert when a job has not run within 72 hours.
- Query OOM Risk Visibility
Queries that generate more groups than a configurable threshold are now logged as soon as the threshold is crossed, instead of when the query completes. This helps triage failures caused by such expensive queries.
- Bloom filter validation
Adding a bloom filter on a boolean column is now rejected at table creation time (previously this caused failures during segment load).
- Improved handling of stream partition unavailability
Fixed an issue where trying to create a consuming segment for a stream partition that is no longer available (has reached end of life) could cause a NullPointerException (NPE). Previously, this halted ingestion from Kinesis when the corresponding partition/shard was unavailable.
- Fix for ingestion issues from Kinesis during resharding + pause and resume
Fixed an issue where Kinesis ingestion stopped completely when pause/resume was invoked after the Kinesis stream was resharded.
- Fix the JSON response payload for always-false queries
Always-false queries (e.g., select … where 1=0) now return the correct empty JSON response, something like this:
Previously, no response was returned, which caused issues in the calling application.
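With an empty response now returned, a client can treat zero rows as a normal result. The sketch below assumes the standard Apache Pinot broker response layout (`resultTable.dataSchema.columnNames` and `resultTable.rows`); the `rows_as_dicts` helper and the sample payload are illustrative and not the exact response StarTree returns.

```python
import json
from typing import Any, Dict, List


def rows_as_dicts(response_body: str) -> List[Dict[str, Any]]:
    """Convert a Pinot-style broker response into a list of row dicts.

    An always-false query yields an empty list rather than an error.
    """
    payload = json.loads(response_body)
    table = payload.get("resultTable")
    if table is None:
        # Surface broker-side errors instead of pretending the result is empty.
        raise ValueError(f"no resultTable in response: {payload.get('exceptions')}")
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table.get("rows", [])]


if __name__ == "__main__":
    sample = json.dumps({
        "resultTable": {
            "dataSchema": {"columnNames": ["id"], "columnDataTypes": ["INT"]},
            "rows": [],
        },
        "exceptions": [],
    })
    print(rows_as_dicts(sample))  # → []
```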
- Improved Thread Management in MSE
Improved thread handling by eliminating unnecessary blocking in query submission and GRPC execution paths. The updated design simplifies the code and reduces reliance on intermediate thread pools, resulting in better performance and lower thread contention in production environments. With this, MSE servers can now use GRPC in direct mode, which is more efficient.
- Prevent server OOM due to GRPC queries
Introduced a memory-based throttling mechanism for GRPC requests. When a GRPC query server’s direct memory usage exceeds a certain threshold, it rejects further requests with the RESOURCE_EXHAUSTED status code until memory usage drops below the threshold.
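The throttling rule reduces to an admission check in front of each request. The sketch below is a simplified model, assuming a hypothetical `direct_memory_used` probe and a hypothetical threshold ratio. It returns the gRPC status name `RESOURCE_EXHAUSTED` as a string so it runs without the grpc package; a real gRPC service would use `grpc.StatusCode.RESOURCE_EXHAUSTED`. The notes above do not say whether the threshold uses hysteresis, so this sketch admits requests again as soon as usage is back at or below the threshold.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DirectMemoryThrottle:
    """Reject new requests while direct memory usage is above a threshold."""
    max_direct_bytes: int
    threshold_ratio: float  # hypothetical, e.g. 0.8
    direct_memory_used: Callable[[], int]  # hypothetical probe of current usage

    def admit(self) -> Tuple[bool, str]:
        """Return (admitted, status) for one incoming request."""
        used = self.direct_memory_used()
        if used > self.max_direct_bytes * self.threshold_ratio:
            return False, "RESOURCE_EXHAUSTED"
        return True, "OK"
```

A client that receives RESOURCE_EXHAUSTED should back off and retry later rather than immediately resubmitting, since the server rejects requests until its memory usage recovers.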
Breaking Changes
- New server instances will now be “untagged”
In the past, new servers added to an existing cluster were automatically tagged with DefaultTenant, which caused issues in multi-tenant configurations. Starting with 0.11.0, new servers must be tagged manually to assign them to a particular tenant.
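A minimal sketch of building the tagging request, assuming the Apache Pinot controller's `PUT /instances/{instanceName}/updateTags` endpoint and the usual `<tenant>_OFFLINE` / `<tenant>_REALTIME` server tag convention; check the StarTree docs for the recommended way to tag servers in your environment. The tenant name `acme`, the controller address, and the instance name used in the test are made up.

```python
from typing import List
from urllib.parse import quote, urlencode


def server_tags(tenant: str, offline: bool = True, realtime: bool = True) -> List[str]:
    """Server tags following Pinot's <tenant>_OFFLINE / <tenant>_REALTIME convention."""
    tags = []
    if offline:
        tags.append(f"{tenant}_OFFLINE")
    if realtime:
        tags.append(f"{tenant}_REALTIME")
    return tags


def update_tags_url(controller: str, instance: str, tags: List[str]) -> str:
    """Build the controller URL for tagging an instance (send it as an HTTP PUT)."""
    query = urlencode({"tags": ",".join(tags), "updateBrokerResource": "false"})
    return f"{controller.rstrip('/')}/instances/{quote(instance, safe='')}/updateTags?{query}"
```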
- Removed pinot-orc module
The pinot-orc module was removed from the default packaging due to vulnerability concerns. This affects anyone ingesting ORC-format data; the module now needs to be added manually.
- Enable logical type in Avro schema by default
This makes decoding Avro data easier (e.g., for BigDecimal values, which are encoded as byte arrays). However, it could cause issues for users who encode custom data in the raw underlying format.
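For context on the BigDecimal case: with logical types enabled, a `decimal` field's bytes are interpreted according to the Avro specification (a big-endian two's-complement unscaled integer, with the scale taken from the schema) rather than passed through as raw bytes. The sketch below decodes such a value by hand to show that encoding; it illustrates the spec, not Pinot's decoder, and `decode_avro_decimal` is a hypothetical helper.

```python
from decimal import Decimal


def decode_avro_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode an Avro `decimal` logical value per the Avro specification.

    The bytes hold the unscaled value as a big-endian two's-complement integer;
    the scale comes from the field's schema.
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


if __name__ == "__main__":
    raw = (12345).to_bytes(2, "big", signed=True)  # b"\x30\x39"
    print(decode_avro_decimal(raw, scale=2))  # → 123.45
```

This is also why custom data stored in the raw bytes of such a field can be misread once logical types are enabled: the bytes are now always interpreted as a scaled integer.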
StarTree Cloud Platform
New Features
- Graviton GA
We can now provision Graviton instances for Pinot components (Server, Broker, Controller, Minion). Graviton instances are roughly 10-20% more cost-effective.
- First-class support for workspaces (Private Preview)
We now support workspaces in a given StarTree Cloud environment (SaaS or BYOC). This is currently in private preview and needs manual configuration to enable. Workspaces provide logical isolation within the same environment: users of one workspace can see and query only the tables in that workspace. This is a good way to separate your data across business units or workloads.
Improvements
- JWT Handling: Added expiry checks for JWTs. Previously, token expiry was not honored, so sessions were not terminated after the token expired.
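As an illustration of what an expiry check involves, the sketch below reads the standard `exp` claim (RFC 7519, seconds since the epoch) from a token's payload. It is a generic example, not StarTree's implementation, and it deliberately skips signature verification, which a real check must perform first with a proper JWT library. The `jwt_is_expired` name and the `leeway_s` parameter are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_is_expired(token: str, now: Optional[float] = None, leeway_s: int = 0) -> bool:
    """Return True if the JWT's `exp` claim is not in the future.

    Only the payload is decoded; the signature is NOT verified here, so this
    must not be used as the sole authentication check.
    """
    try:
        payload_b64 = token.split(".")[1]
    except IndexError:
        raise ValueError("not a JWT: expected header.payload.signature") from None
    # JWTs use unpadded base64url; restore padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    exp = claims.get("exp")
    if exp is None:
        return False  # no expiry claim; how to treat this is a policy decision
    current = time.time() if now is None else now
    # RFC 7519: the token is valid only while the current time is before `exp`.
    return current >= exp + leeway_s
```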
Dashboards & Metrics
- Grafana Updates: Added new dashboards and visual panels, including:
- Dashboard for capturing various broker level query errors
- New upsert/dedup dashboard
- New dashboard for capturing rebalance related metrics
- Segment Refresh Task (SRT), Alter Table Task (ATT) metrics dashboard
- Estimated total server MSE query threads on broker dashboard
- Pauseless GC dashboards
- Improved CPU usage panels for brokers, servers, controllers
- Metrics Enhancements:
- Added comprehensive metrics for broker-level query errors (e.g. queryErrorBrokerSegmentUnavailable, queryErrorAccessDenied, queryErrorJsonParsing, etc.), improving diagnosis and alerting.
- Ingestion metrics
- Rows filtered due to filter functions
- Rows dropped due to transform or decode exceptions
- Znode size monitoring
- Introduced metrics and alerts for ZNode sizes under
- /IDEALSTATES/
- /EXTERNALVIEW/
- /PROPERTYSTORE/SEGMENTS/.
- Fixed Prometheus metric name for Pinot server Netty metrics
- Improved metrics related to Authentication performance
Alerts & Monitoring
- New Alerts:
- Snapshot creation failure
- Pinot Znode size near ZK buffer limit
- Query error code rate
- Alert Improvements: Updated CPU usage alert definitions
- Increased duration for HighGCPercent and refined threshold
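The ZNode size alert compares each ZNode against ZooKeeper's `jute.maxbuffer` limit, whose default is 0xfffff bytes (just under 1 MB); requests carrying more data than that are rejected. The sketch below assumes that default limit and an illustrative 80% warning ratio. The actual alert threshold is not stated in these notes, and the paths in the test are only examples under the monitored prefixes.

```python
from typing import Dict, List, Tuple

JUTE_MAXBUFFER_DEFAULT = 0xFFFFF  # ZooKeeper's default jute.maxbuffer, just under 1 MiB


def znodes_near_limit(sizes: Dict[str, int],
                      limit: int = JUTE_MAXBUFFER_DEFAULT,
                      warn_ratio: float = 0.8) -> List[Tuple[str, float]]:
    """Return (path, ratio) for ZNodes above warn_ratio of the limit, largest first.

    warn_ratio is a hypothetical alert threshold, not StarTree's actual value.
    """
    flagged = [(path, size / limit) for path, size in sizes.items()
               if size / limit > warn_ratio]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```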