Overview
An overview of the advanced indexing techniques available in StarTree Cloud.
Apache Pinot is built for scale, handling massive datasets and high query throughput. Much of its performance and flexibility comes from its advanced indexing capabilities, which keep analytics fast even at petabyte scale. With a comprehensive set of indexing techniques, you can select the indexes best suited to your data characteristics and evolving query patterns.
Why Use Indexes?
- Accelerated Query Performance: Indexes drastically enhance query speed, efficiently pinpointing relevant data segments even at massive scale.
- Optimized Resource Usage: Strategic indexing reduces unnecessary data scans, effectively lowering resource consumption and operational costs.
- Flexible Analytics: A variety of index types allows Pinot to accommodate diverse analytical workloads, ranging from straightforward lookups to complex analytics and sophisticated similarity searches.
Supported Index Types
Apache Pinot supports a wide range of indexes tailored to optimize various query scenarios:
Inverted Index
Maps each value directly to its rows for fast lookups.
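To make the idea concrete, here is a minimal Python sketch of an inverted index over a single column. It is a toy illustration of the value-to-rows mapping described above, not Pinot's actual on-disk format; the function name and sample data are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(column):
    """Map each distinct value to the ascending list of row IDs containing it."""
    index = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)
    return dict(index)


# Hypothetical sample column; the row ID is the position in the list.
country = ["US", "IN", "US", "DE", "IN", "US"]
index = build_inverted_index(country)

# An equality filter such as country = 'US' becomes a single lookup
# instead of a scan over every row.
us_rows = index.get("US", [])
```

A lookup for a value that never occurs simply returns an empty list, so no rows need to be read at all.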
Star-tree Index
Speeds up aggregation queries on large, high-cardinality datasets by pre-aggregating data.
Range Index
Handles numeric range queries efficiently without requiring data sorting.
Sorted Index
Enhances range queries by maintaining data in a sorted sequence.
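The benefit of sorted data for range queries can be sketched with a binary search: when a column is stored in sorted order, the matching rows form one contiguous block. This is a simplified illustration with hypothetical names and data, not Pinot's implementation.

```python
from bisect import bisect_left, bisect_right


def range_rows(sorted_column, low, high):
    """Return the half-open row range [start, end) whose values lie in [low, high]."""
    start = bisect_left(sorted_column, low)
    end = bisect_right(sorted_column, high)
    return start, end


# Hypothetical column that is already stored in sorted order.
ages = [18, 21, 21, 25, 30, 30, 42, 57]

# A filter such as age BETWEEN 21 AND 30 needs two binary searches,
# after which the matching rows are one contiguous slice.
start, end = range_rows(ages, 21, 30)
matching = ages[start:end]
```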
JSON Index
Enables fast queries on JSON-structured data.
Geospatial Index
Powers geographic queries, enabling proximity searches and spatial analytics.
Text Index (Lucene)
Provides rapid search capabilities for unstructured text fields through full-text indexing.
Text Index (Native)
Provides rapid search capabilities for unstructured text fields through full-text indexing, using Pinot's native text index implementation rather than Lucene.
Timestamp Index
Enables fast filtering on timestamp columns by indexing at a defined time granularity.
Vector Index
Supports fast similarity searches on vector embeddings, ideal for Gen AI and recommendation workloads.
Sparse Index
Optimizes high-cardinality equality filters using chunked partitioning.
Bloom Filter
Prunes segments quickly for equality queries while using minimal memory.
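The pruning idea can be sketched as follows: each segment keeps a small bit array, and a segment is skipped when the filter reports that the queried value is definitely absent. This is a minimal toy Bloom filter, assuming SHA-256 based hashing and hypothetical segment names; Pinot's own implementation and parameters differ.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting the value with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value)
        )


def segments_to_scan(segment_filters, value):
    """Keep only the segments whose filter says the value might be present."""
    return [name for name, bf in segment_filters.items() if bf.might_contain(value)]


# Hypothetical segments, each with a filter over its user_id column.
seg_a, seg_b = BloomFilter(), BloomFilter()
for user in ("user_1", "user_2"):
    seg_a.add(user)
seg_b.add("user_3")

candidates = segments_to_scan({"seg_a": seg_a, "seg_b": seg_b}, "user_1")
```

Because a Bloom filter can return false positives, a segment in `candidates` still has to be checked; the saving comes from the segments that are ruled out without being read.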
FST Index
Provides a compact structure for regex search on dictionary-encoded text columns.
Dictionary Index
Replaces repeated values with integer IDs for storage efficiency.
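Dictionary encoding can be illustrated in a few lines: the distinct values are stored once, and each row stores only a small integer ID. In this sketch the dictionary is kept sorted, which is an assumption made for the example; the function names and data are hypothetical.

```python
def dictionary_encode(column):
    """Return (dictionary, forward_index) for a column of repeated values."""
    dictionary = sorted(set(column))  # each distinct value stored once
    value_to_id = {value: i for i, value in enumerate(dictionary)}
    forward_index = [value_to_id[value] for value in column]
    return dictionary, forward_index


def dictionary_decode(dictionary, forward_index):
    """Rebuild the original column from the dictionary and the integer IDs."""
    return [dictionary[i] for i in forward_index]


# Hypothetical column with many repeated string values.
country = ["US", "IN", "US", "DE", "IN", "US"]
dictionary, forward_index = dictionary_encode(country)
```

The longer and more repetitive the original values are, the more storage the integer IDs save.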
Composite JSON Index
An enhanced version of the JSON Index that reduces index size and improves performance.
When selecting the right index for your use case, consider the following:
- Query Patterns: Assess the types of queries you run—point lookups, range queries, aggregations, or similarity searches.
- Data Type and Cardinality: Evaluate column uniqueness, data distribution, and characteristics.
- Performance vs. Storage Trade-offs: Understand that some indexes enhance performance substantially but may require additional storage.
By selecting indexes based on your data and query requirements, you can use Apache Pinot to deliver fast analytics at scale and keep complex data exploration responsive.
Updating Indexes
Updating indexes involves the following high-level steps:
- Assess the Right Indexes: Determine the appropriate indexes based on your query needs and data characteristics.
- Apply Index Configurations: Configure indexes in your table configuration, referring to each index’s dedicated documentation page for specific configuration options.
- Apply Changes and Reload: Trigger a table reload using the reload API. The reload runs without downtime and is transparent to active queries.
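The steps above can be sketched in Python as follows. This is a hedged illustration, not an official client: the controller address, the helper names, and the sample table are hypothetical. The `tableIndexConfig` keys and the `PUT /tables/{tableName}` and `POST /segments/{tableName}/reload` paths reflect the open-source Apache Pinot controller REST API as best understood here, so confirm them against each index's documentation page and your StarTree Cloud environment, which may also require authentication headers. The requests are only built, not sent.

```python
import json
import urllib.request
from urllib.parse import quote

CONTROLLER_URL = "http://localhost:9000"  # assumption: replace with your controller


def with_indexes(table_config, inverted=(), range_=(), bloom=()):
    """Return a copy of table_config with the given index column lists merged in."""
    updated = json.loads(json.dumps(table_config))  # deep copy via JSON round trip
    index_config = updated.setdefault("tableIndexConfig", {})
    for key, columns in (
        ("invertedIndexColumns", inverted),
        ("rangeIndexColumns", range_),
        ("bloomFilterColumns", bloom),
    ):
        if columns:
            existing = index_config.get(key) or []
            index_config[key] = sorted(set(existing) | set(columns))
    return updated


def update_request(table_name, table_config):
    """Build (but do not send) the request that applies the new table config."""
    return urllib.request.Request(
        f"{CONTROLLER_URL}/tables/{quote(table_name)}",
        data=json.dumps(table_config).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def reload_request(table_name, table_type="OFFLINE"):
    """Build (but do not send) the request that triggers a table reload."""
    return urllib.request.Request(
        f"{CONTROLLER_URL}/segments/{quote(table_name)}/reload?type={table_type}",
        method="POST",
    )


# Hypothetical table that already has one inverted index column.
current = {"tableName": "events", "tableIndexConfig": {"invertedIndexColumns": ["country"]}}
desired = with_indexes(current, inverted=["device"], bloom=["user_id"])
put_req = update_request("events", desired)
reload_req = reload_request("events")
```

To apply the change for real, each request would be passed to `urllib.request.urlopen`, first the config update and then the reload.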