Overview and Purpose
The Composite JSON Index is an enhanced version of the JSON Index.The Composite JSON Index is available from Startree version 0.11.0.
- Indexing select path(s) with internal range and/or text index.
- Controlling which paths are included in inverted index (to make the index smaller, speed up index build).
- Applying a filter to the number of matching flattened documents (per JSON document).
- Controlling how often to flush with the off-heap index creator and also limiting temporary memory usage.
- Disabling the indexing of array positions (to make the index smaller).
Creating both the JSON Index and the Composite JSON Index on the same column is not recommended. When evaluating the JSON_MATCH function, Pinot first checks for the regular JSON Index, ignoring the Composite JSON Index.
Example Illustration
Benchmark testing shows a significant response time reduction for range expressions. Consider the query:Configuration
Enabling Composite JSON Index
To enable a composite JSON index on a column in your StarTree Cloud table, add the following configuration to your table definition:Configuration Parameters
The composite JSON index configuration supports the JSON index options, as well as:- flushThreshold: Flush off-heap posting list every n documents to limit memory usage.
- enablePositionalIndexing: Indicates whether to include array indexes or not.
- Optional
- Default: true
- When set to false, then Pinot won’t include array indexes in the inverted index.
- invertedIndexConfigs: A list of paths to add to the inverted index.
- Default: empty
- Can be a single element with
"includeAllPaths":true
, to include everything
- rangeIndexConfigs: A list of paths to include in internal range index(es), along with settings.
Supports the following parameters:- path: JSON path of field to index
- Required
- name: The name for the field.
- Required and must be unique.
- dataType: Data type of the JSON field.
- Required
- When
createDictionary=true
, this parameter accepts accepts INT, LONG, FLOAT, DOUBLE, STRING, BIG_DECIMAL. - Otherwise this parameter accepts INT, LONG, FLOAT, DOUBLE.
- createDictionary: Indicates whether to create the dictionary for the field.
- Optional
- Default: false
- dictionaryType: Indicates the type, fixedLength or variableLength.
- Optional
- Default: variableLength
- Due to dictionary limitations, Pinot currently supports only variableLength for BIG_DECIMAL.
- defaultValue: A value for flattened records that don’t include the field, or that contain a badly formed value.
- Optional
- path: JSON path of field to index
- textIndexConfigs: A list of paths to include in the internal Lucene text index(es).
- Accepts path and name parameters.
- The name parameter must be unique.
Example Queries
Range-index-based range query against a JSON fieldRange Index Default Values
The range index contains value for each flattened row. If value is missing or unparseable then either user-set or fixed - ‘0’ - default is used. That default might cause unexpected results to appear when range-querying without upper or lower bound. For example, with nodefaultValue
set, the following query will include documents without a value
field:
Array Position Indexing
A query that specifies an array index in the JSON path will require theenablePositionalIndexing=true
parameter configuration, even when path is range-indexed. Otherwise, the query will return an empty result. For example, the following query:
Range Queries against Mutable Segments
A range index is used for immutable/commited segments, while mutable segments rely on the inverted index. When indexing hybrid or real-time tables, you must include paths used for range queries in the inverted index. Otherwise queries will return empty results for the mutable segments. Consider the following example configuration:Text Index Configuration
Apart from the path and name, the textIndexConfig can contain fields allowed in regular text index configuration: “rawValue”, “queryCache”, “useANDForMultiTermQueries”, “stopWordsInclude”, “stopWordsExclude”,“luceneUseCompoundFile”, “luceneMaxBufferSizeMB”, “luceneAnalyzerClass”, “luceneAnalyzerClassArgs”,
“luceneAnalyzerClassArgTypes”, “luceneQueryParserClass”, “enablePrefixSuffixMatchingInPhraseQueries”,
“reuseMutableIndex”, “luceneNRTCachingDirectoryMaxBufferSizeMB” For more information on the text index and configuration, see the Pinot documentation.