Why Use Indexes?
- Accelerated Query Performance: Indexes let Pinot locate the relevant rows directly instead of scanning every row, which keeps queries fast even at massive scale.
- Optimized Resource Usage: Well-chosen indexes cut unnecessary data scans, lowering resource consumption and operational costs.
- Flexible Analytics: A variety of index types allows Pinot to accommodate diverse analytical workloads, ranging from straightforward lookups to complex analytics and sophisticated similarity searches.
Supported Index Types
Apache Pinot supports a wide range of indexes tailored to optimize various query scenarios:
Inverted Index
Maps each value directly to its rows for fast lookups.
Star-tree Index
Delivers superior aggregation performance on large, high-cardinality datasets.
Range Index
Handles numeric range queries efficiently without requiring data sorting.
Forward Index
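Stores the value of every row for a column, mapping each document ID to its value. The sorted variant also speeds up range queries by keeping the data in sorted order. Types: Dictionary-Encoded, Sorted, and Raw Value.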
JSON Index
Enables fast queries on JSON-structured data.
Geospatial Index
Powers geographic queries, enabling proximity searches and spatial analytics.
Text Index (Lucene)
Provides rapid search capabilities for unstructured text fields through full-text indexing.
Text Index (Native)
Provides rapid search capabilities for unstructured text fields through full-text indexing.
Timestamp Index
Enables fast filtering on timestamp columns by indexing at a defined time granularity.
Vector Index
Supports fast similarity searches on vector embeddings, ideal for Gen AI and recommendation workloads.
Sparse Index
Optimizes high-cardinality equality filters using chunked partitioning.
Bloom Filter
Fast segment pruning for equality queries with minimal memory.
FST Index
Compact regex search on dictionary-encoded text columns.
Dictionary Index
Replaces repeated values with integer IDs for storage efficiency.
Composite JSON Index
An enhanced version of the JSON Index to reduce index size and improve performance.
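To make a few of the entries above concrete, here is a toy Python sketch of how a dictionary, a dictionary-encoded forward index, and an inverted index relate to one another. It is an illustration of the general idea only, not Pinot's implementation; every function name and the sample column are hypothetical.

```python
from collections import defaultdict

def build_dictionary(values):
    """Assign each distinct value an integer ID (sorted for determinism)."""
    return {value: dict_id for dict_id, value in enumerate(sorted(set(values)))}

def build_forward_index(values, dictionary):
    """Row number -> dictionary ID, in row order."""
    return [dictionary[value] for value in values]

def build_inverted_index(forward_index):
    """Dictionary ID -> list of row numbers holding that value."""
    inverted = defaultdict(list)
    for row, dict_id in enumerate(forward_index):
        inverted[dict_id].append(row)
    return dict(inverted)

def lookup(value, dictionary, inverted):
    """Return the rows matching an equality filter without scanning the column."""
    dict_id = dictionary.get(value)
    if dict_id is None:
        return []  # value never appears in the column
    return inverted.get(dict_id, [])

# Hypothetical sample column
cities = ["NYC", "SF", "NYC", "LA", "SF", "NYC"]
dictionary = build_dictionary(cities)
forward = build_forward_index(cities, dictionary)
inverted = build_inverted_index(forward)
matching_rows = lookup("NYC", dictionary, inverted)
```

The dictionary shrinks storage by replacing repeated strings with small integers, the forward index answers "what is the value in row N", and the inverted index answers "which rows hold value V".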
Choosing the Right Index
Selecting an index depends on the following factors:
- Query Patterns: Assess the types of queries you run: point lookups, range queries, aggregations, or similarity searches.
- Data Type and Cardinality: Evaluate column uniqueness, data distribution, and characteristics.
- Performance vs. Storage Trade-offs: Understand that some indexes enhance performance substantially but may require additional storage.
Updating Indexes
Updating indexes involves the following abstracted steps:
- Assess the Right Indexes: Determine the appropriate indexes based on your query needs and data characteristics.
- Apply Index Configurations: Configure indexes in your table configuration, referring to each index’s dedicated documentation page for specific configuration options.
- Apply Changes and Reload: Trigger a table reload through the reload API so that existing segments pick up the new index configuration. The reload runs without downtime and is transparent to active queries.
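The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official client: the controller address, table name, and helper names are hypothetical, and it assumes the controller exposes `PUT /tables/{tableName}` and `POST /segments/{tableName}/reload` and that the table config uses the `tableIndexConfig` column lists shown. Check each index's documentation page and your controller's API listing before relying on it. The code only builds the requests; it does not send them.

```python
import json
import urllib.request

CONTROLLER = "http://localhost:9000"  # assumption: local controller address

def with_indexes(table_config, inverted=(), range_=(), bloom=()):
    """Return a copy of the table config with extra index columns merged in."""
    updated = json.loads(json.dumps(table_config))  # cheap deep copy
    index_config = updated.setdefault("tableIndexConfig", {})
    additions = (
        ("invertedIndexColumns", inverted),
        ("rangeIndexColumns", range_),
        ("bloomFilterColumns", bloom),
    )
    for key, columns in additions:
        if columns:
            index_config[key] = sorted(set(index_config.get(key, [])) | set(columns))
    return updated

def update_request(table_name, table_config):
    """Build (but do not send) the request that stores the new table config."""
    return urllib.request.Request(
        f"{CONTROLLER}/tables/{table_name}",
        data=json.dumps(table_config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def reload_request(table_name):
    """Build (but do not send) the request that reloads the table's segments."""
    return urllib.request.Request(
        f"{CONTROLLER}/segments/{table_name}/reload", method="POST"
    )

# Hypothetical table and columns
current = {"tableName": "myTable", "tableIndexConfig": {"invertedIndexColumns": ["country"]}}
desired = with_indexes(current, inverted=["city"], range_=["ts"])
requests_to_send = [update_request("myTable", desired), reload_request("myTable")]
```

To apply the change, each request in `requests_to_send` would be passed to `urllib.request.urlopen` in order, updating the config first and reloading second.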