> ## Documentation Index
> Fetch the complete documentation index at: https://docs.startree.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Offline Upserts

## Introduction

Offline upserts allow an OFFLINE Pinot table to upsert records by primary key across multiple segments, presenting query results as if only the most-recent version of each record exists. In a standard OFFLINE table every segment is independent — pushing two segments that each contain a row for the same primary key means queries will return both rows. Offline upserts fix this: when a new segment is pushed, the server resolves conflicts with all previously loaded segments and surfaces only the winning record per primary key.

### Record precedence ordering

When no explicit comparison column is configured, **segment push time** determines which record wins: the record from the segment pushed most recently always wins. This is the simplest setup for offline tables where each push represents a more-complete or corrected snapshot of the data.

You can also configure an explicit comparison column (e.g., an `eventTime` timestamp). In that case the row with the highest value in the comparison column wins, regardless of push order.

***

## Use Cases

| Scenario                      | Why offline upserts help                                                                                                                                                                              |
| ----------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Incremental corrections       | A small fraction of rows in the table has changed (e.g., late-arriving corrections, status updates) and you want to push only the changed rows as a new segment without re-ingesting the whole table. |
| Snapshot uploads with overlap | An upstream job exports daily snapshots that may overlap (the same primary key appears in yesterday's and today's export). Offline upserts deduplicate automatically.                                 |

Offline upserts are a good fit when the **number of rows being updated is a small-to-moderate fraction of the total table size** (roughly less than \~40%). For larger fractions — where you are effectively replacing the whole table — consider Atomic Sync (see below).

***

## How to Configure Offline Upserts

Enabling offline upserts requires changes to three places: the schema, the table config, and (optionally) the ingestion task config.

### 1. Schema — declare primary key columns

```json theme={null}
{
  "schemaName": "my_table",
  "primaryKeyColumns": ["userId"],
  "dimensionFieldSpecs": [
    { "name": "userId",  "dataType": "INT" },
    { "name": "status",  "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "score", "dataType": "FLOAT" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "eventTime",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```

`primaryKeyColumns` can contain multiple columns — Pinot hashes them together as a composite key.

### 2. Table config

Three sections of the table config must be set together: `upsertConfig`, `segmentPartitionConfig`, and `routing`.

```json theme={null}
{
  "tableName": "my_table",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "timeColumnName": "eventTime"
  },
  "upsertConfig": {
    "mode": "FULL",
    "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager"
  },
  "indexingConfig": {
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "userId": {
          "functionName": "Murmur",
          "numPartitions": 4
        }
      }
    }
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  }
}
```

**Key points:**

* `upsertConfig.mode` must be `"FULL"`. Partial upsert mode is not supported for offline tables (see [Limitations](#limitations)).
* `upsertConfig.metadataManagerClass` must point to the StarTree RocksDB implementation. If omitted the in-heap OSS default is used, which does not persist metadata across server restarts and is not recommended for production.
* `segmentPartitionConfig` on the **primary key column** is recommended. The partition function and count define how the data is partitioned and also governs the overall scalability. Every segment for a given primary key must land on the same server — without this the server cannot see all versions of a key and deduplication will be incorrect. In general, the recommendation is to use a high partition count (eg: 128) to account for organic growth. Note that you cannot change this post table creation.
* `instanceSelectorType: strictReplicaGroup` ensures that a query is routed to exactly one replica group. This is required so that the server's local RocksDB view (which is per-partition) is authoritative for the keys it owns.

**Optional: explicit comparison column**

```json theme={null}
"upsertConfig": {
  "mode": "FULL",
  "metadataManagerClass": "ai.startree.pinot.upsert.rocksdb.RocksDBTableUpsertMetadataManager",
  "comparisonColumns": ["eventTime"]
}
```

When `comparisonColumns` is set, the row with the highest value in that column wins. When it is omitted, the row from the most-recently-pushed segment wins.

### 3. Partitioning upstream data with FileIngestionTask

For offline upserts to work correctly, **every segment must be partitioned on the primary key**: all rows with the same primary key value must reside in the same server.

`FileIngestionTask` handles repartitioning automatically, even when the upstream S3 data is not pre-partitioned by primary key. On each execution the task executor reads `segmentPartitionConfig` from the table config and passes it to `SegmentProcessorFramework`, which physically sorts all ingested rows into per-partition buckets by applying the configured partition function (e.g., Murmur) across the configured number of partitions. Each output segment contains only rows that hash to a single partition bucket, satisfying the upsert requirement without any extra config.

The `segmentPartitionConfig` in the table config acts as the contract: the task executor reads it to know how many partitions to create and which hash function to use, producing pre-partitioned segments that satisfy the upsert requirement automatically.

***

## Offline Upserts vs. Atomic Sync

StarTree also offers **Atomic Sync** (via `consistentPushSwapEnabled=true` on `FileIngestionTask`), which replaces the entire table atomically. Both features update an offline table from S3, but they are designed for different scenarios.

| Dimension                         | Offline Upserts                                                                                                                                                        | Atomic Sync                                                                                                                    |
| --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| **What it does**                  | Merges new/updated rows into existing segments. Only changed rows need to be pushed.                                                                                   | Rebuilds and atomically swaps **all** segments in the table from the input source.                                             |
| **Granularity**                   | Row-level upsert/deduplication by primary key.                                                                                                                         | File / Partition / Table-level replacement.                                                                                    |
| **Data volume per push**          | Small-to-moderate fraction of the table (works well under \~40% of total rows being upserted).                                                                         | Optimised for full partition or table refreshes (essentially 100% of rows).                                                    |
| **Query consistency during push** | New segments become visible as they are pushed; there is a brief window where both old and new versions of a row are in the table and the upsert filter resolves them. | Old and new sets of segments are swapped atomically — queries see either the old or the new version of the table, never a mix. |
| **Index support**                 | No StarTree index support (see [Limitations](#limitations)).                                                                                                           | StarTree indexes are supported which significantly improves query performance                                                  |
| **Partial upsert**                | Not supported (FULL mode only).                                                                                                                                        | Not applicable — full re-ingestion replaces all data.                                                                          |
| **When to choose**                | You are pushing incremental corrections, late data, or a small update batch. The primary key is known and stable.                                                      | You are regenerating the entire table (e.g., reprocessing pipeline, full restatement), or you require atomic cutover.          |

**Rule of thumb:** if less than \~40% of the table's rows are changing in a given push cycle, offline upserts are the better fit — they avoid re-ingesting unchanged data and are cheaper to operate. Above \~40%, or when you need atomic visibility guarantees, use Atomic Sync.

***

## Limitations

**No partial upsert support.** Only `mode: FULL` is supported for offline tables. FULL mode means the entire winning row replaces the entire losing row. Column-level merge strategies available in PARTIAL mode for realtime upsert tables are not available. Note that this is generally ok since column level update pattern is typically found in CDC/streaming use cases not batch.

**No StarTree index.** Columns in an offline upsert table cannot have a StarTree (multi-dimensional pre-aggregation) index configured. The controller will reject such a table config at creation time.

**Segment push time ordering is implicit and irreversible.** When `comparisonColumns` is not set, the server uses segment push time to break ties. Pushing an older segment after a newer one will cause the older data to win. Use an explicit `comparisonColumns` (e.g., a business event timestamp) if push order cannot be guaranteed.