The star-tree index boosts aggregation and group-by query performance. It pre-aggregates data across multiple dimensions to reduce query latency while using less storage than full pre-aggregation.
A star-tree index is an advanced multi-dimensional indexing structure that significantly accelerates aggregation and group-by queries by using pre-computed aggregation results. Unlike single-column indexes, star-tree indexes work across multiple dimensions to dramatically reduce the number of records that need to be processed at query time.
One of the biggest challenges in real-time analytics is maintaining low query latencies on large datasets while efficiently using storage resources. Traditional indexes improve query performance but are still limited by the number of records that must be processed. Pure pre-aggregation strategies guarantee fast query responses but can lead to storage explosion.
Star-tree indexes in StarTree Cloud provide an optimal balance between these approaches by selectively pre-aggregating data based on common query patterns, offering:
Predictable, low query latencies for aggregation operations
Efficient use of storage space compared to full pre-aggregation
Significant performance improvements for multi-dimensional queries
Configurable trade-offs between query speed and storage requirements
Star-tree indexes are particularly valuable for analytical workloads with high query volumes on large datasets where aggregation queries are common.
The star-tree index creates a hierarchical tree structure that organizes data based on multiple dimensions. The key elements of this structure include:
Dimension-based Organization: Data is organized hierarchically based on an ordered list of dimensions, with each level in the tree representing a particular dimension.
Pre-aggregation: For each node in the tree, metrics are pre-aggregated, allowing the system to directly use these pre-computed results when possible.
Star Nodes: Special nodes that contain pre-aggregated results after removing a specific dimension, enabling efficient handling of queries that don’t filter on certain dimensions.
Configurable Depth: The tree depth and node size can be configured to balance between storage requirements and query performance.
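The elements above can be illustrated with a toy in-memory tree. This is a minimal sketch for intuition only, not how StarTree Cloud stores the index: the function name build_star_tree, the "*" label for star nodes, the max_leaf_records parameter, and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict

STAR = "*"  # hypothetical label for a star node in this sketch


def build_star_tree(records, dimensions, metric, max_leaf_records=1):
    """Build a toy star-tree as nested dicts, one tree level per dimension."""
    # Pre-aggregation: every node stores the metric sum of the records below it.
    node = {"sum": sum(r[metric] for r in records), "children": {}}

    # Stop splitting when no dimensions remain or the node is small enough.
    if not dimensions or len(records) <= max_leaf_records:
        node["records"] = records  # leaves keep their records for fallback scans
        return node

    dim, rest = dimensions[0], dimensions[1:]

    # Dimension-based organization: one child per distinct value of this dimension.
    groups = defaultdict(list)
    for r in records:
        groups[r[dim]].append(r)
    for value, group in groups.items():
        node["children"][value] = build_star_tree(group, rest, metric, max_leaf_records)

    # Star node: the same records with this dimension ignored.
    node["children"][STAR] = build_star_tree(records, rest, metric, max_leaf_records)
    return node


# Made-up sample rows for illustration.
rows = [
    {"Country": "USA", "Browser": "Chrome", "Impressions": 10},
    {"Country": "USA", "Browser": "Firefox", "Impressions": 5},
    {"Country": "CA", "Browser": "Chrome", "Impressions": 7},
]
tree = build_star_tree(rows, ["Country", "Browser"], "Impressions")
```

In this sketch, tree["children"]["*"]["children"]["Chrome"]["sum"] holds the Chrome total across all countries, so a query that does not filter on Country can read one pre-computed value. Raising max_leaf_records produces a shallower tree with fewer nodes, at the cost of scanning more records inside each leaf.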
Consider a dataset with dimensions Country, Browser, and Locale, and a metric Impressions:
Traditional Approach: To find the sum of Impressions for Country=USA and Browser=Chrome across all Locales, the system would need to scan all matching records and compute the sum at query time.
Star-tree Approach:
The data is organized hierarchically by Country, then Browser, then Locale
At the Country=USA, Browser=Chrome level, a star node pre-computes the sum across all Locales
The query directly uses this pre-computed result without scanning individual records
This approach significantly reduces query time by eliminating the need to process individual records when aggregate values can be used instead.
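The difference between the two approaches can be shown with a few lines of code. This is a simplified sketch under stated assumptions: the rows are invented, the helper names scan_sum and star_sum are hypothetical, and the star node over Locale is flattened into a dictionary entry instead of a real tree node.

```python
from collections import defaultdict

# Invented rows: (Country, Browser, Locale, Impressions)
rows = [
    ("USA", "Chrome", "en", 100),
    ("USA", "Chrome", "es", 40),
    ("USA", "Firefox", "en", 30),
    ("CA", "Chrome", "en", 20),
    ("CA", "Chrome", "fr", 10),
]


def scan_sum(rows, country, browser):
    """Traditional approach: visit every row and add up the matching ones."""
    total = 0
    scanned = 0
    for c, b, _locale, impressions in rows:
        scanned += 1
        if c == country and b == browser:
            total += impressions
    return total, scanned


# Index build time: one pre-aggregated entry per (Country, Browser),
# standing in for the star node that removes the Locale dimension.
star_over_locale = defaultdict(int)
for c, b, _locale, impressions in rows:
    star_over_locale[(c, b, "*")] += impressions


def star_sum(country, browser):
    """Star-tree style approach: read the pre-computed sum, no row scan."""
    return star_over_locale.get((country, browser, "*"), 0)


scan_total, rows_scanned = scan_sum(rows, "USA", "Chrome")
pre_total = star_sum("USA", "Chrome")
print(scan_total, rows_scanned, pre_total)  # prints "140 5 140"
```

Both paths return the same sum, but the scan touches all five rows while the pre-aggregated path performs a single lookup. The gap widens as the number of rows per (Country, Browser) combination grows.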