Apache Pinot
Multi-stage enhancements
The multi-stage query engine now supports the HAVING, ORDER BY, IN, and NOT IN clauses. It also supports left joins, semi joins, and inequality joins. All functions have now been registered with the Calcite catalog reader.
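For illustration, here is a query shape that exercises the newly supported clauses (join, HAVING, ORDER BY) on the multi-stage engine; the table and column names are hypothetical:

```sql
-- Count orders per customer, keeping only customers with more than 10 orders.
-- Uses a join, HAVING, and ORDER BY, all now supported by the multi-stage engine.
SELECT o.customerId, COUNT(*) AS orderCount
FROM orders o
JOIN customers c ON o.customerId = c.customerId
GROUP BY o.customerId
HAVING COUNT(*) > 10
ORDER BY orderCount DESC
```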
V1 query engine enhancements
Added support for isDistinctFrom and isNotDistinctFrom. More details in the pull request.
Multi volume
Added the capability to tier storage locally by attaching multiple disks (e.g., SSD + HDD). More details in the Pinot documentation.
Segment level debug
Added a segment-level Debug API and UI. More details in the pull request.
Consumer record lag
Added a new API for exposing record-based lag during real-time consumption. Currently, this is only supported for the Kafka data source. More details in the pull request.
Custom time boundary for hybrid tables
Added new APIs for configuring a custom time boundary for hybrid Pinot tables, as well as a validation API to ensure all segments have finished uploading during an offline push (checked against the ideal state). More details in the pull request.
Upserts enhancements
Added an enableSnapshot flag in upsertConfig to use snapshots for upsert metadata recovery. This will help achieve TTL (Time To Live) support for primary keys. More details in the Pinot documentation and pull request.
Extract metadata from stream event header
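A minimal table-config sketch showing where the flag sits; the table name is illustrative, and the surrounding fields are assumed from the standard Pinot upsert configuration:

```json
{
  "tableName": "orders_REALTIME",
  "upsertConfig": {
    "mode": "FULL",
    "enableSnapshot": true
  }
}
```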
Added support for using fields within the message envelope as columns in the Pinot schema (e.g., the key within a Kafka message envelope). More details in the pull request.
Segment Reload enhancements
Added the ability to change the compression type during segment reload. More details in this PR. In addition, added the capability within the Pinot UI to track reload progress. More details in PR-9521 and PR-9700.
Adaptive Server Selection
Warning: This feature is experimental, under development, and turned off by default. We recommend using this feature for testing purposes only. When a query is received, the broker can use one of the implemented Adaptive Selectors (NumInFlightRequests, Latency, Hybrid) to route the query to the best server instead of using a naive round-robin approach. More details can be found in the Pinot documentation.
Frictionless Ingestion
- Automatically infer the Parquet reader type based on file metadata during offline ingestion. More details in this PR.
- Added a Spark Job Launcher utility for offline table ingestion within the Pinot admin tool for ease of use. More details in this PR.
- Added a continueOnError flag within the Pinot table config. If set to true, any errors from data type or expression transformations are ignored and null/default values are used instead. This is useful when users don’t want ingestion to stop because of a few bad records. More details in PR-9320 and PR-9376.
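A sketch of where the flag goes, assuming it lives under ingestionConfig in the table config as in the Pinot documentation; the table, column, and transform names are illustrative:

```json
{
  "tableName": "events_OFFLINE",
  "ingestionConfig": {
    "continueOnError": true,
    "transformConfigs": [
      {
        "columnName": "amountUsd",
        "transformFunction": "times(amount, exchangeRate)"
      }
    ]
  }
}
```

With continueOnError set to true, a record whose transform fails (e.g., a non-numeric amount) gets a null/default value instead of failing the whole ingestion job.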
MergeRollup task on real-time tables
Added support for merging/rolling up segments of a real-time Pinot table. More details in this PR.
Force commit
Added a new resetConsumption API in the controller to force the currently consuming segment in a real-time Pinot table to be committed. More details in this PR.
Seamless stream change
Added the capability to modify stream properties (e.g., start consuming from a different Kafka topic) without disabling the table. More details in this PR.
Logging enhancements
- Added a /loggers API endpoint to change the logging level at runtime. More details in this PR.
- Added a new API to allow downloading logs from individual components (broker/server), as well as a new controller API to download any remote log. More details in this PR.
StarTree Extensions for Apache Pinot: Available only in StarTree Cloud
RocksDB backed Upsert (BETA)
Added the capability to configure a RocksDB backend for managing upsert metadata on a Pinot server. This enables the server to handle far more primary keys than before (previously, upsert metadata was held entirely in memory).
Databricks Delta Ingestion (ALPHA)
Added support for ingesting data from a Databricks Delta table into an offline Pinot table.
SegmentRefresh for RT tables (ALPHA)
Added support for performing segment refresh operations on all completed segments within a real-time Pinot table. This enables users to ensure real-time table segments adhere to the latest Pinot table config.
Debezium connector for MySQL (ALPHA)
Added support for ingesting MySQL Debezium CDC format messages from a real-time stream.
Tiered storage (BETA)
- Improved server restart time when cloud tiered storage is enabled by persistently caching certain portions of all column indexes needed during restart.
- Query performance improvements using selective columnar fetch based on query patterns and block-level reads.
File size based task planner in ingestion (GA)
Added the capability to configure minion tasks to ingest in a size-based manner in addition to count-based. This allows the user to ingest all files from a data source in a single round of tasks based on the total size.
StarTree Cloud - includes BYOC (Bring Your Own Cloud) and SaaS
Disaster Recovery for data plane (ALPHA)
StarTree Admin is able to recover a given workspace from a region failure by recovering the StarTree cluster state (RTO: 24 hours).
Release decoupling (ALPHA)
StarTree admins can now release individual components like Pinot or Data Manager without requiring a full release.
Authentication service (GA)
The authentication service for secure access to StarTree environments is now GA.
Token generation for Try environments (GA)
Token generation for secure access to the Pinot cluster in trial environments is now GA.
Data Manager: Self-Service Ingestion tool
Improved AWS IAM Role based onboarding experience (BETA)
Users can now check the AWS account ID directly in DM instead of asking the StarTree customer support team. More enhancements to come in the next release.
Dimension Table support (GA)
Users can now create dimension tables from DM.
Enhanced Datetime inference logic (GA)
More accurate datetime column inference during data modeling in DM.
Support enhanced security mechanisms for Kafka SASL authentication (GA)
Support added in DM for the following SASL mechanisms:
- PLAINTEXT
- SCRAM-SHA-256
- SCRAM-SHA-512
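As a sketch of how SASL settings typically appear in a Pinot real-time table's streamConfigs (the topic name and credentials are placeholders, and the exact property keys DM generates are an assumption based on standard Kafka client settings):

```json
{
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.topic.name": "events",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-512",
    "sasl.jaas.config": "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"myUser\" password=\"mySecret\";"
  }
}
```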
Support for GZ file ingestion (GA)
Gzipped files can be ingested directly via DM. Other supported formats: Avro, JSON, CSV, Parquet, ORC.
Enhanced Data ingestion (GA)
A couple of improvements for a more robust data ingestion experience:
- Improved dictionary inference logic
- Data size configuration for Minion

