Overview and Purpose
An inverted index is a data structure that maps content (like words or values) to their locations in a database table. Think of it as a book’s index that tells you exactly which pages contain specific words - but for your data. The primary purpose of an inverted index is to dramatically speed up query performance when you’re filtering data. Without an index, the system would need to scan every row to find matching values. With an inverted index, it can jump directly to the relevant rows. Inverted indexes are especially valuable for:- Equality filters (column = value)
- Membership checks (column IN [value1, value2])
- Range queries (column > value)
- Any scenario where you frequently filter on specific columns
How the Index Works
Core Concepts
In traditional “forward” indexes, the system maps row IDs to their corresponding values:- Bitmap Inverted Index: Creates a bitmap (a compact way to represent a set of rows) for each unique value in a column. This makes value lookups extremely fast with constant time complexity.
- Sorted Inverted Index: Creates an index leveraging sorted column data structure for efficient binary search operations. This offers logarithmic lookup time and benefits of data locality.
Example Illustration
For aproductCategory
column with values:
Types of Inverted Indexes
Bitmap Inverted Index
A bitmap inverted index maintains a mapping from each value to a bitmap of rows. This design makes value lookups extremely fast with constant time complexity. When to Use:- For columns with low to medium cardinality (number of unique values)
- When the column is frequently used in filter conditions
- When the column is not sorted
- Constant time lookup performance
- Works well with equality filters and IN clauses
- Can be created on multi-value columns
Sorted Inverted Index
When a column is both sorted and has a dictionary enabled, StarTree Cloud automatically leverages the sorted structure to implement an efficient inverted index. When to Use:- For columns that are already sorted (like timestamps or sequential IDs)
- When the column is frequently used in filter conditions
- Logarithmic lookup time (log(n))
- Better data locality (related values are stored close together)
- Generally better performance than bitmap indexes
- No additional storage overhead (uses the same structure as the forward index)
Configuration
Configuring a Bitmap Inverted Index
To enable a bitmap inverted index on a column in your StarTree Cloud table, add the following configuration to your table definition:Alternative Configuration (Legacy Method)
You can also use the older configuration format, though it’s not recommended for new deployments:Configuring a Sorted Inverted Index
To set up a sorted inverted index:- First, configure the column as a sorted column in your table configuration:
- Ensure dictionary encoding is enabled for this column (it is enabled by default)
Important Considerations
- When the Index is Created: By default, bitmap inverted indexes are created when segments are loaded by Pinot, not during initial segment generation. If you want to create them during segment generation, set
indexingConfig.createInvertedIndexDuringSegmentGeneration
totrue
in your table configuration. - Dictionary Requirement: Both types of inverted indexes require dictionary encoding to be enabled for the column.
- Multi-Value Support: You can create a bitmap inverted index on a multi-value column.