Apache Pinot
Multi-stage enhancements
The multi-stage query engine now supports the HAVING, ORDER BY, IN, and NOT IN clauses. It also supports left joins, semi joins, and inequality joins. All functions have now been registered with the Calcite catalog reader.
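For illustration, here is a query shape that exercises the newly supported clauses (join, HAVING, ORDER BY) on the multi-stage engine; the table and column names are hypothetical:

```sql
-- Count orders per customer, keeping only customers with more than 10 orders.
-- Uses a join, HAVING, and ORDER BY, all now supported by the multi-stage engine.
SELECT o.customerId, COUNT(*) AS orderCount
FROM orders o
JOIN customers c ON o.customerId = c.customerId
GROUP BY o.customerId
HAVING COUNT(*) > 10
ORDER BY orderCount DESC
```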
V1 query engine enhancements
Added support for isDistinctFrom and isNotDistinctFrom. More details in the pull request.
Multi volume
Added the capability to tier storage locally by attaching multiple disks (e.g., SSD + HDD). More details in the Pinot documentation.
Segment level debug
Added a segment-level Debug API and UI. More details in the pull request.
Consumer record lag
Added a new API for exposing record-based lag during real-time consumption. Currently, this is only supported for the Kafka data source. More details in the pull request.
Custom time boundary for hybrid tables
Added new APIs for configuring a custom time boundary for hybrid Pinot tables, as well as a validation API to ensure all segments have finished uploading during an offline push (checked against the ideal state). More details in the pull request.
Upserts enhancements
Added an enableSnapshot flag in upsertConfig to use snapshots for upsert metadata recovery. This will help achieve TTL (Time To Live) support for primary keys. More details in the Pinot documentation and pull request.
Extract metadata from stream event header
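A minimal table-config sketch showing where the flag sits; the table name is illustrative, and the surrounding fields are assumed from the standard Pinot upsert configuration:

```json
{
  "tableName": "orders_REALTIME",
  "upsertConfig": {
    "mode": "FULL",
    "enableSnapshot": true
  }
}
```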
Added support for using fields within the message envelope as columns in the Pinot schema (e.g., the key within a Kafka message envelope). More details in the pull request.
Segment Reload enhancements
Added the ability to change the compression type during segment reload. More details in this PR. In addition, added the capability within the Pinot UI to track reload progress. More details in PR-9521 and PR-9700.
Adaptive Server Selection
Warning: This feature is experimental, under development, and turned off by default. We recommend using this feature for testing purposes only. When a query is received, the broker can use one of the implemented Adaptive Selectors (NumInFlightRequests, Latency, Hybrid) to route the query to the best server instead of using a naive round-robin approach. More details can be found in the Pinot documentation.
Frictionless Ingestion
- Automatically infer the Parquet reader type based on file metadata during offline ingestion. More details in this PR.
- Added a Spark Job Launcher utility for offline table ingestion within the Pinot admin tool for ease of use. More details in this PR.
- Added a continueOnError flag within the Pinot table config. If set to true, any errors from data type or expression transformations are ignored and null/default values are used instead. This is useful when users don’t want ingestion to stop because of a few bad records. More details in PR-9320 and PR-9376.
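A sketch of where the flag goes, assuming it lives under ingestionConfig in the table config as in the Pinot documentation; the table, column, and transform names are illustrative:

```json
{
  "tableName": "events_OFFLINE",
  "ingestionConfig": {
    "continueOnError": true,
    "transformConfigs": [
      {
        "columnName": "amountUsd",
        "transformFunction": "times(amount, exchangeRate)"
      }
    ]
  }
}
```

With continueOnError set to true, a record whose transform fails (e.g., a non-numeric amount) gets a null/default value instead of failing the whole ingestion job.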
MergeRollup task on real-time tables
Added support for merging/rolling up segments of a real-time Pinot table. More details in this PR.
Force commit
Added a new resetConsumption API in the controller to force the currently consuming segment in a real-time Pinot table to be committed. More details in this PR.
Seamless stream change
Added the capability to modify stream properties (e.g., start consuming from a different Kafka topic) without disabling the table. More details in this PR.
Logging enhancements
- Added a /loggers API endpoint to change the logging level at runtime. More details in this PR.
- Added a new API to allow downloading logs from individual components (broker/server), as well as a new controller API to download any remote log. More details in this PR.
StarTree Extensions for Apache Pinot: Available only in StarTree Cloud
RocksDB backed Upsert (BETA)
Added the capability to configure a RocksDB backend for managing upsert metadata on a Pinot server. This enables the server to handle far more primary keys than before (previously, upsert metadata was held entirely in memory).
Databricks Delta Ingestion (ALPHA)
Added support for ingesting data from a Databricks Delta table into an offline Pinot table.
SegmentRefresh for RT tables (ALPHA)
Added support for performing segment refresh operations on all completed segments within a real-time Pinot table. This enables users to ensure real-time table segments adhere to the latest Pinot table config.
Debezium connector for MySQL (ALPHA)
Added support for ingesting MySQL Debezium CDC format messages from a real-time stream.
Tiered storage (BETA)
- Improved server restart time when cloud tiered storage is enabled by persistently caching certain portions of all column indexes needed during restart.
- Query performance improvements using selective columnar fetch based on query patterns and block-level reads.
File size based task planner in ingestion (GA)
Added the capability to configure minion tasks to ingest in a size-based manner in addition to count-based. This allows the user to ingest all files from a data source in a single round of tasks based on the total size.
StarTree Cloud - includes BYOC (Bring Your Own Cloud) and SaaS
Disaster Recovery for data plane (ALPHA)
StarTree Admin is able to recover a given workspace from a region failure by recovering the StarTree cluster state (RTO: 24 hours).
Release decoupling (ALPHA)
StarTree admins can now release individual components like Pinot or Data Manager without requiring a full release.
Authentication service (GA)
The authentication service for secure access to StarTree environments is now GA.
Token generation for Try environments (GA)
Token generation for secure access to the Pinot cluster in trial environments is now GA.
Data Manager: Self-Service Ingestion tool
Improved AWS IAM Role based onboarding experience (BETA)
Users can now check the AWS account ID directly in DM instead of asking the StarTree customer support team. More enhancements to come in the next release.
Dimension Table support (GA)
Users can now create dimension tables from DM.
Enhanced Datetime inference logic (GA)
More accurate datetime column inference during data modeling in DM.
Support enhanced security mechanisms for Kafka SASL authentication (GA)
Support added in DM for the following SASL mechanisms:
- PLAINTEXT
- SCRAM-SHA-256
- SCRAM-SHA-512
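As a sketch of how SASL settings typically appear in a Pinot real-time table's streamConfigs (the topic name and credentials are placeholders, and the exact property keys DM generates are an assumption based on standard Kafka client settings):

```json
{
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.topic.name": "events",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-512",
    "sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"myUser\" password=\"mySecret\";"
  }
}
```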
Support for GZ file ingestion (GA)
Gzipped files can be ingested directly via DM. Other supported formats: Avro, JSON, CSV, Parquet, ORC.
Enhanced Data ingestion (GA)
A couple of improvements for a more robust data ingestion experience:
- Improved dictionary inference logic
- Data size configuration for Minion

