Context

Text index support for data stored in remote tiered storage is crucial for accelerating text search queries. Previously, text indexes were not supported in StarTree tiered storage for several reasons, including that the text index was stored separately from the Pinot segment and was not integrated into columns.psf. Starting with StarTree release 0.12.0, text indexes can be used together with a StarTree tiered storage configuration.
Key changes made to enable this support:
  • File Consolidation: Text index directories are now consolidated into the columns.psf file.
  • Buffer Integration: LuceneTextIndexReader operates directly on PinotBuffer, utilizing the consolidated columns.psf file.
  • Tiered Storage Enabled: The inclusion of text indexes within columns.psf automatically enables tiered storage support.
Operational Mechanism
  • Default Mode: Text indexes continue to be stored in separate directories and therefore do not support tiered storage.
  • Consolidated Mode: When storeInSegmentFile: "true" is configured, text indexes are stored within columns.psf, enabling tiered storage support.
  • Backward Compatibility: Existing segments that contain text indexes in separate directories remain fully functional without any changes.

Sample Configuration

{
  "fieldConfigList": [
    {
      "name": "your_text_column",
      "indexes": {
        "text": {
          ...
          "storeInSegmentFile": "true"
        }
      },
      ...
With this configuration, text indexes can be used on tables that have tiered storage enabled.
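To check which text-indexed columns of a table would be eligible for tiered storage, the two modes described above can be told apart by inspecting storeInSegmentFile. The following is a minimal Python sketch, not part of any StarTree or Pinot tooling: the helper names and the legacy_text_column entry are hypothetical, and it assumes that a missing storeInSegmentFile means the default mode and that the value may appear either as the string "true" or as a JSON boolean.

```python
def is_consolidated(text_index_config: dict) -> bool:
    """Return True when storeInSegmentFile is set to true.

    Assumption: the value may be the string "true" (as in the sample
    configuration) or a boolean; a missing key means the default mode.
    """
    value = text_index_config.get("storeInSegmentFile", "false")
    return str(value).lower() == "true"


def text_columns_by_mode(table_config: dict) -> dict:
    """Group text-indexed columns by where their text index is stored."""
    modes = {"consolidated": [], "separate": []}
    for field in table_config.get("fieldConfigList", []):
        text_cfg = field.get("indexes", {}).get("text")
        if text_cfg is None:
            continue  # this column has no text index
        key = "consolidated" if is_consolidated(text_cfg) else "separate"
        modes[key].append(field["name"])
    return modes


# Hypothetical table config fragment; only the fields used here are shown.
example_table_config = {
    "fieldConfigList": [
        {
            "name": "your_text_column",
            "indexes": {"text": {"storeInSegmentFile": "true"}},
        },
        {
            "name": "legacy_text_column",  # hypothetical column in default mode
            "indexes": {"text": {}},
        },
    ]
}

print(text_columns_by_mode(example_table_config))
# {'consolidated': ['your_text_column'], 'separate': ['legacy_text_column']}
```

Columns listed under "separate" keep their text index in separate directories and would need storeInSegmentFile set to "true" before they can benefit from tiered storage.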

Tier Overwrites for Existing Tables

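It is also possible to override the storeInSegmentFile setting per tier. For instance, the following configuration keeps the text index in separate directories by default and stores it within columns.psf for segments on the myS3Tier tier: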
{
  "fieldConfigList": [
    {
      "name": "your_text_column",
      "indexes": {
        "text": {
          "deriveNumDocsPerChunk": false,
          "rawIndexWriterVersion": 4,
          "storeInSegmentFile": "false"
        }
      },
      "properties": {
        "luceneUseCompoundFile": "false",
        "luceneMaxBufferSizeMB": "128",
        "useANDForMultiTermQueries": "true",
        "luceneAnalyzerClass": "org.apache.lucene.analysis.standard.StandardAnalyzer",
        "luceneQueryParserClass": "org.apache.lucene.queryparser.classic.QueryParser"
      },
      "tierOverwrites": {
        "myS3Tier": {
          "indexes": {
            "text": {
              "deriveNumDocsPerChunk": false,
              "rawIndexWriterVersion": 4,
              "storeInSegmentFile": "true"
            }
          }
        }
      }
    }
  ]
}
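The effect of the tier overwrite above can be sketched in Python as a lookup of the text index settings that apply on a given tier. This is only an illustration: effective_text_index is a hypothetical helper, and it assumes that the text block under tierOverwrites replaces the top-level text block as a whole for the named tier. The source does not state whether the server merges individual keys or replaces the block; for the configuration above both readings give the same result, because the overwrite repeats every key.

```python
import copy
from typing import Optional


def effective_text_index(field_config: dict, tier: Optional[str] = None) -> dict:
    """Return the text index settings that would apply on the given tier.

    Assumption: a tier's "text" block, when present, replaces the
    top-level "text" block entirely. With tier=None, or for a tier
    without an overwrite, the top-level settings are returned.
    """
    base = field_config.get("indexes", {}).get("text", {})
    if tier is not None:
        override = (
            field_config.get("tierOverwrites", {})
            .get(tier, {})
            .get("indexes", {})
            .get("text")
        )
        if override is not None:
            return copy.deepcopy(override)
    return copy.deepcopy(base)


# Field config from the example above, reduced to the keys used here.
field_config = {
    "name": "your_text_column",
    "indexes": {
        "text": {
            "deriveNumDocsPerChunk": False,
            "rawIndexWriterVersion": 4,
            "storeInSegmentFile": "false",
        }
    },
    "tierOverwrites": {
        "myS3Tier": {
            "indexes": {
                "text": {
                    "deriveNumDocsPerChunk": False,
                    "rawIndexWriterVersion": 4,
                    "storeInSegmentFile": "true",
                }
            }
        }
    },
}

for tier in (None, "myS3Tier"):
    settings = effective_text_index(field_config, tier)
    print(tier, settings["storeInSegmentFile"])
# None false
# myS3Tier true
```

Under this reading, segments outside myS3Tier keep the text index in separate directories, while segments on myS3Tier store it within columns.psf.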