Older Releases
Release Version 0.10.0: November 2024
Apache Pinot updates since the last StarTree release
- Enhanced support for backfilling data in an upsert-enabled table. Users can now upload externally partitioned segments to an upsert-enabled table. [link]
- Improved UI load times, especially in environments with a large number of tables and/or very large tables. [link]
- Added OOM protection to the multi-stage query engine. This feature kills expensive queries before they can crash a server, so that other applications are not disrupted. [link] [link]
- Improved scalability by allowing multiple segments to be uploaded together instead of one by one. This is especially useful in cases where a large number of segments are added during initial ingestion or data backfill. [link] [link]
- Improved observability and stability by providing an API for checking the segment state. Users can now determine if some segments need to be reloaded to ensure all replicas are in the correct state. [link]
- Introduced a query rate limiter at the database level. The limit applies to the aggregate query rate across all tables in the database. [link] [link]
- Added MAP type support with string keys and typed values, along with the MapItem function, which extracts a map value by key. MAP type support has also been added to the Pinot UI. [link] [link]
- Added implementations of comparison (=, !=, >, >=, <, <=, BETWEEN) and binary arithmetic scalar functions for the multi-stage query engine. This resolves issues such as string comparison failures caused by the lack of polymorphism support and incorrect result types for numeric arithmetic. [link]
- Added parameter support for aggregation functions such as DISTINCTCOUNTHLL (log2m) in the star-tree index. [link] [link]
- Added the Lookup Join strategy as a query hint to improve performance when the right table in the JOIN is a dimension table. [link] [link]
- Introduced a more detailed query execution plan that also describes the physical operators used in the multi-stage query engine. [link] [link]
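
The database-level query rate limiter noted above can be pictured as a single quota shared by every table in a database. The following is a minimal conceptual sketch, not Pinot's implementation: the class names `DatabaseRateLimiter` and `QueryGate`, the token-bucket approach, and the table and database names are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict


class DatabaseRateLimiter:
    """Token bucket shared by all tables in one database (illustrative only)."""

    def __init__(self, max_qps: float, clock: Callable[[], float] = time.monotonic):
        self.max_qps = max_qps
        self.clock = clock
        self.tokens = max_qps            # start with a full bucket
        self.last_refill = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at one second of quota.
        refill = (now - self.last_refill) * self.max_qps
        self.tokens = min(self.max_qps, self.tokens + refill)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class QueryGate:
    """Routes each query to the limiter of the database that owns its table."""

    def __init__(self, table_to_database: Dict[str, str],
                 limiters: Dict[str, DatabaseRateLimiter]):
        self.table_to_database = table_to_database
        self.limiters = limiters

    def admit(self, table: str) -> bool:
        database = self.table_to_database[table]
        return self.limiters[database].try_acquire()


# Usage with a manual clock so the behaviour is deterministic.
now = [0.0]
sales_limiter = DatabaseRateLimiter(max_qps=2.0, clock=lambda: now[0])
gate = QueryGate({"orders": "sales", "clicks": "sales"}, {"sales": sales_limiter})

first = gate.admit("orders")     # consumes one of the two tokens
second = gate.admit("clicks")    # consumes the second token
third = gate.admit("orders")     # rejected: the quota is shared by both tables
now[0] += 0.5                    # half a second refills one token at 2 QPS
fourth = gate.admit("clicks")
```

The point of the sketch is that a query against either table draws from the same budget, so one busy table can exhaust the quota for the whole database.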
StarTree Cloud
StarTree extensions for Apache Pinot
- Users can now configure the number of retries on failure when atomic sync is configured. This also allows users to upgrade their StarTree Cloud environment while data is being ingested.
- Improved performance when ingesting data from Delta Lake or using SegmentImportTask by changing the default value of the parameter “push.mode” to “metadata”.
- Added several improvements to the Delta Lake 3.0 connector to support Delta Protocol Reader version 3 and Writer version 7.
- Added support for ingesting data from DynamoDB CDC streams using the DynamoDB message decoder. [link]
- Added native support for ingesting Prometheus-formatted metrics data into tables in StarTree Cloud. Users can now leverage the price/performance of StarTree Cloud for their metrics solution built on Prometheus. [link]
- Added the ability to merge smaller segments into larger segments using SegmentRefreshTask, improving performance in upsert-enabled tables. [link]
- Added TTL for metadata and deleted keys in upsert-enabled tables that use off-heap upsert. This improves scalability and manageability by reducing the size of managed keys and metadata.
- Added data consistency guarantees for queries that run while upserts are being processed. Without this guarantee, result sets could be inconsistent. [link]
- Reduced server restart time by preloading a snapshot of primary keys for upsert-enabled tables. Without this feature, the primary keys are rebuilt during the server restart, resulting in long restart times. [link]
- Improved scalability and reliability for Dedup by moving the metadata from an on-heap implementation to an off-heap implementation, similar to off-heap upsert.
- Added several health checks to ensure tables in StarTree Cloud are always optimized for best performance. The list of health checks includes a check to ensure no table in production is running with a single replica of data. [link]
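
As an illustration of the single-replica health check mentioned above, the sketch below scans table configurations and flags any table with fewer than two replicas. It is not StarTree's health check implementation. The function name `single_replica_tables` is hypothetical, and the sketch assumes the replica count is stored as a string under `segmentsConfig.replication` (falling back to `replicasPerPartition`) in each table config.

```python
from typing import Any, Dict, List


def single_replica_tables(table_configs: List[Dict[str, Any]]) -> List[str]:
    """Return the names of tables configured with fewer than two replicas."""
    flagged = []
    for config in table_configs:
        segments = config.get("segmentsConfig", {})
        # Assumption: replica count lives in one of these two fields.
        raw = segments.get("replication", segments.get("replicasPerPartition", "1"))
        if int(raw) < 2:
            flagged.append(config.get("tableName", "<unnamed>"))
    return flagged


# Hypothetical table configs, trimmed to the fields the check reads.
configs = [
    {"tableName": "orders_REALTIME", "segmentsConfig": {"replication": "3"}},
    {"tableName": "clicks_OFFLINE", "segmentsConfig": {"replication": "1"}},
    {"tableName": "events_REALTIME", "segmentsConfig": {"replicasPerPartition": "2"}},
]
at_risk = single_replica_tables(configs)
```

A table with no replica setting at all is treated as single-replica here, which errs on the side of flagging it.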
Data Manager
- Added the ability for users to modify the schema and table configuration even after a table has been created, enabling greater flexibility. Users can optimize their table for better performance using Data Manager.
- Added enhanced validation to ensure accurate field type and data type configurations during table creation, reducing errors and improving data integrity.
ThirdEye
- Improved onboarding with the new alert creation flow. Creating alerts is now simpler and faster, and dimension exploration alerts can now be created without code.
- Added a new Impact dashboard. This dashboard gives managers and alert owners a clear and intuitive understanding of the health and performance of all monitored metrics.