When a predicate of the form `column = constant` is used, the sparse index looks up the partition to which the constant belongs. It then uses the pseudo-inverted index to find the chunks that contain this partition. These are the only groups of rows where the predicate may evaluate to true, so Pinot uses this information to reduce I/O.
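
The lookup described above can be illustrated with a small sketch. It is a simplified model, not Pinot's implementation: it assumes a single partition mapper, uses CRC32 as a stand-in hash, and the function names (`partition_of`, `build_sparse_index`, `candidate_chunks`) are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_of(value, partitions):
    """Map a value to a partition id. CRC32 is a stand-in hash, not Pinot's."""
    return zlib.crc32(str(value).encode("utf-8")) % partitions


def build_sparse_index(column, chunk_size, partitions):
    """Build a pseudo-inverted index: partition id -> ids of chunks that
    contain at least one row whose value falls in that partition."""
    partition_to_chunks = defaultdict(set)
    for row, value in enumerate(column):
        chunk_id = row // chunk_size
        partition_to_chunks[partition_of(value, partitions)].add(chunk_id)
    return partition_to_chunks


def candidate_chunks(index, constant, partitions):
    """Chunks that may contain rows where column = constant.
    Hash collisions can add false positives, never false negatives."""
    return sorted(index.get(partition_of(constant, partitions), set()))


column = ["dev-1", "dev-2", "dev-1", "dev-3", "dev-4", "dev-4", "dev-5", "dev-2"]
index = build_sparse_index(column, chunk_size=2, partitions=16)
chunks = candidate_chunks(index, "dev-4", partitions=16)
print(chunks)  # only these chunks need to be read to evaluate the predicate
```

Every chunk that really contains the constant is always returned; chunks that do not appear in the result can be skipped without being read.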
This method is useful when the segment is stored locally, but it is particularly valuable with tiered storage. In that case, instead of downloading the whole forward index, Pinot downloads only the chunks that may contain matching rows. This significantly reduces the number of bytes downloaded from cloud object storage.
To use sparse indexes at query time, set `pinot.server.instance.index.sparse.enabled` to `true` in the server configuration. One way to do this is to call the `POST /cluster/configs` API from Swagger.
After adding new configurations, restart existing servers.
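
As an alternative to the Swagger UI, the same call can be made from a script. The sketch below only builds the request; the controller address `http://localhost:9000` and the helper name `build_enable_sparse_request` are assumptions, and the body is assumed to be a JSON object mapping the property name to its value. Add whatever authentication your cluster requires.

```python
import json
import urllib.request

CONTROLLER_URL = "http://localhost:9000"  # assumption: replace with your controller address


def build_enable_sparse_request(controller_url):
    """Build (but do not send) a POST /cluster/configs request that sets
    pinot.server.instance.index.sparse.enabled to true."""
    body = json.dumps(
        {"pinot.server.instance.index.sparse.enabled": "true"}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{controller_url}/cluster/configs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_enable_sparse_request(CONTROLLER_URL)

# To send it (needs a reachable controller):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```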
If the `pinot.server.instance.index.sparse.enabled` property is not set to `true`, you can create sparse indexes, but you won't be able to use them during the query phase.

Sparse indexes are declared in the `fieldConfigList` section of the table configuration. For example, the following JSON defines a sparse index in the `deviceId` column:
Property | Type | Default | Recommended | Effect on index size | Description |
---|---|---|---|---|---|
`chunkSize` | int | 8192 | 1 to 16384 | Inverse, sublinear | The chunk size. It must be a power of 2. |
`partitions` | int | 1000 | Depends on the number of unique values | Linear | The number of partitions used. |
`hashFunctionCount` | int | 10 | 1 to 20 | Linear | The number of partition mappers used. |
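
The constraints in the table can be checked before a configuration is submitted. The sketch below validates the three properties and prints a `fieldConfigList` entry for `deviceId`. The helper `sparse_field_config` is hypothetical, and the `sparse` key under `indexes` is an assumption about how the index is named in the configuration; only the property names and defaults come from the table.

```python
import json


def sparse_field_config(column, chunk_size=8192, partitions=1000, hash_function_count=10):
    """Return a fieldConfigList entry for a sparse index on `column`.
    Defaults mirror the table above."""
    # chunkSize must be a power of 2: exactly one bit set.
    if chunk_size < 1 or (chunk_size & (chunk_size - 1)) != 0:
        raise ValueError(f"chunkSize must be a power of 2, got {chunk_size}")
    if partitions < 1:
        raise ValueError(f"partitions must be positive, got {partitions}")
    if hash_function_count < 1:
        raise ValueError(f"hashFunctionCount must be positive, got {hash_function_count}")
    return {
        "name": column,
        "indexes": {
            # Assumption: "sparse" is a placeholder key name for this index type.
            "sparse": {
                "chunkSize": chunk_size,
                "partitions": partitions,
                "hashFunctionCount": hash_function_count,
            }
        },
    }


table_config_fragment = {"fieldConfigList": [sparse_field_config("deviceId")]}
print(json.dumps(table_config_fragment, indent=2))
```

Validating locally catches an invalid `chunkSize` (for example 1000) before the table configuration is uploaded.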
chunkSize
partitions
hashFunctionCount
seedGenerator
mapperId
murmur
.