1. Overview
This document describes how the Parquet Data Cache and the Pinot Index Cache work for external tables and for regular tables configured with remote tiered storage. External tables read their data from object storage on every query. The Parquet Data Cache keeps recently read Parquet pages on the server so that subsequent queries do not pay the round trip. Indexes for both external tables and Pinot tiered storage tables also live on object storage. The Index Cache keeps recently read index byte ranges on the server.

| Cache | What it stores | Used by | On-disk location | When it is populated |
|---|---|---|---|---|
| Parquet Data Cache | Compressed Parquet column data and dictionary pages | External tables (Iceberg/Delta Lake) | {dataDir}/remote_data_cache | On every Parquet column read (always on) |
| Index Cache | Raw byte ranges of segment index files (columns.psf, inverted_index) | External tables and Pinot tiered storage tables | {dataDir}/index/index_cache | On query opt-in |

- Parquet Data Cache populates both tiers: an in-memory prefetch tier (decompressed, decoded values on heap) sits in front of an mmap-backed disk tier.
- Index Cache populates only the disk tier. Index byte ranges are stored as raw bytes; there is no decoded form to keep in memory.
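
The tier layout described above can be pictured as a read-through lookup. The Python sketch below is illustrative only: the class and function names, the lookup order, and the promotion of disk hits into memory are assumptions rather than StarTree's implementation, and plain dictionaries stand in for the heap buffer and the mmap'd files.

```python
import zlib


class TwoTierParquetCache:
    """Toy model: decoded values in memory, compressed bytes on 'disk'."""

    def __init__(self, fetch_remote):
        self.memory_tier = {}  # key -> decompressed bytes (stands in for the heap tier)
        self.disk_tier = {}    # key -> compressed bytes (stands in for mmap'd files)
        self.fetch_remote = fetch_remote  # hypothetical object-storage reader
        self.remote_reads = 0

    def read(self, key):
        # 1. Fastest path: the decoded value is already in memory.
        if key in self.memory_tier:
            return self.memory_tier[key]
        # 2. Next: compressed bytes held in the local disk tier.
        compressed = self.disk_tier.get(key)
        if compressed is None:
            # 3. Miss in both tiers: read from object storage, fill the disk tier.
            compressed = self.fetch_remote(key)
            self.remote_reads += 1
            self.disk_tier[key] = compressed
        # Decode once and keep the decoded form in the memory tier.
        value = zlib.decompress(compressed)
        self.memory_tier[key] = value
        return value


def fake_object_store(key):
    # Pretend remote read that returns a compressed page.
    return zlib.compress(f"page:{key}".encode())


cache = TwoTierParquetCache(fake_object_store)
first = cache.read("col_a/0")   # goes to the fake object store
second = cache.read("col_a/0")  # served from the memory tier
```

An Index Cache equivalent under the same assumptions would drop `memory_tier` entirely, since index byte ranges have no decoded form to keep.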


2. What’s enabled out of the box
2.1 Parquet Data Cache: always on
This cache does not need to be explicitly enabled. It is created at server startup and serves every read against a remote Parquet column.

| Tier | Default size | Backing store |
|---|---|---|
| Prefetch (in-memory) | 1 GB | Java heap (deserialized values) |
| Disk | 30% of total disk | mmap’d files at {dataDir}/remote_data_cache |

2.2 Index Cache: opt-in per query
The Index Cache is populated only when a query is preceded by the per-query opt-in statement listed in section 3.2.

| Tier | Default size | Backing store |
|---|---|---|
| Disk | 30% of total disk | mmap’d files at {dataDir}/index_cache |

3. Advanced operations: Enable, disable, skip
3.1 Parquet Data Cache
The Parquet Data Cache is always on at the server level and cannot be globally disabled. However, individual queries can bypass it with a per-query option.
3.2 Index Cache
The Index Cache has no global enable flag; enablement is a per-query decision. To enable it for a specific query, use the following setting:

| Action | How |
|---|---|
| Enable for one query | SET enable.prefetch.page.cache = 'true' |
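
As a minimal sketch of the per-query opt-in, a client could prepend the setting to the query text before submitting it. The helper name and the table `my_external_table` are hypothetical, and the trailing semicolon assumes Pinot's usual `SET key = value;` query-option form.

```python
# Opt-in statement taken from the table above; the semicolon is an assumption.
INDEX_CACHE_OPT_IN = "SET enable.prefetch.page.cache = 'true';"


def with_index_cache(sql):
    """Prefix a query with the per-query Index Cache opt-in (hypothetical helper)."""
    return f"{INDEX_CACHE_OPT_IN}\n{sql.strip()}"


# `my_external_table` is a made-up table name used only for illustration.
query = with_index_cache(
    "SELECT COUNT(*) FROM my_external_table WHERE country = 'US'"
)
print(query)
```

Queries submitted without the prefix leave the Index Cache unpopulated, which matches the opt-in behavior described in section 2.2.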

4. Behavior on restart
The on-disk tier is wiped on server startup by default. Cached data can be kept across restarts by setting a server JVM option.
5. Eviction policies

| Tier | Cache | Policy | Defaults |
|---|---|---|---|
| Prefetch (in-memory) | Parquet Data Cache only | Circular buffer on heap; oldest slot is overwritten on insert | 1 GB capacity, no TTL |
| Disk | Both | LRU over fixed-size mmap fragment files. Time-based eviction is coming soon. | Parquet Data Cache: 32 MB fragments. Index Cache: 512 MB fragments. Background eviction sweep every 60s; immediate non-blocking pass when fragments are 90% full. |
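
The two policies in the table can be contrasted with a small Python sketch. It is a simplified model under assumptions: the class names are hypothetical, slots and fragments are counted rather than sized in bytes, and the background sweep and the 90% trigger are not modeled.

```python
from collections import OrderedDict


class CircularBuffer:
    """Fixed number of slots; an insert overwrites the oldest slot."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count
        self.next_slot = 0

    def insert(self, item):
        evicted = self.slots[self.next_slot]  # None until the buffer wraps
        self.slots[self.next_slot] = item
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        return evicted


class LruFragments:
    """Tracks fragment ids; evicts the least recently used one when full."""

    def __init__(self, max_fragments):
        self.max_fragments = max_fragments
        self.fragments = OrderedDict()  # oldest access first

    def access(self, fragment_id):
        if fragment_id in self.fragments:
            # A hit refreshes recency, unlike the circular buffer.
            self.fragments.move_to_end(fragment_id)
            return None
        evicted = None
        if len(self.fragments) >= self.max_fragments:
            evicted, _ = self.fragments.popitem(last=False)
        self.fragments[fragment_id] = True
        return evicted


ring = CircularBuffer(2)
ring_evictions = [ring.insert(page) for page in ("a", "b", "c")]

lru = LruFragments(2)
lru_evictions = [lru.access(frag) for frag in (1, 2, 1, 3)]
```

The difference shows in the last step of each run: the circular buffer drops `"a"` purely because it is oldest, while the LRU keeps fragment 1 (recently re-read) and drops fragment 2.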

6. Clearing caches
HTTP endpoints
In some scenarios you may want to clear the caches. StarTree provides cluster-wide APIs for this (the controller fans out the request to every server):

| Endpoint | Effect |
|---|---|
| DELETE /pageCache?consumerName=PARQUET_INDEX | Clear Parquet Data Cache (all tiers) |
| DELETE /pageCache?consumerName=SEGMENT_INDEX | Clear Index Cache (all tiers) |
| DELETE /pageCache | Clear both |
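
As a rough sketch, the three calls can be built with Python's standard library. The controller address `http://localhost:9000` and the helper name are assumptions, and any authentication headers a cluster requires are omitted. The requests are only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

CONTROLLER_URL = "http://localhost:9000"  # hypothetical controller address


def build_clear_request(consumer_name=None, base_url=CONTROLLER_URL):
    """Build a DELETE /pageCache request; omit consumer_name to clear both caches."""
    url = f"{base_url}/pageCache"
    if consumer_name is not None:
        url += "?" + urlencode({"consumerName": consumer_name})
    return Request(url, method="DELETE")


clear_parquet = build_clear_request("PARQUET_INDEX")  # Parquet Data Cache
clear_index = build_clear_request("SEGMENT_INDEX")    # Index Cache
clear_all = build_clear_request()                     # both caches

# To send a request, pass it to urllib.request.urlopen(...), for example:
#   urllib.request.urlopen(clear_all)
```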

