Schema Evolution

Apache Pinot provides great support for evolving a schema after the table has been created. This is frequently needed as business requirements evolve, and data formats or structures need to change. At a high level schema evolution works as follows:

Create the new schema based on the existing schema.
Update the schema using the Controller API
Invoker the Table Reload operation to ensure the new schema is picked up by all Pinot segments and hence in the query results.

Details of each of these steps can be found in the Apache Pinot docs: Schema Evolution Segment reload is the primary mechanism that enables schema evolution in Pinot by applying new schema changes to existing segments. When you update a schema with backwards compatible changes, the segments need to be reloaded to incorporate these changes into the queryable data.

Schema Update and Reload Integration

When a schema is updated, Pinot provides two options for applying the changes to existing segments

Immediate reload (reload=true) - Triggers reloadAllSegments() for all tables using the schema
Deferred reload - default (reload=false) - Sends schema refresh messages to servers without immediate reload

By default, Pinot does not do a full reload operation but instead only instructs the servers to refresh the schema.

Backwards compatible changes

Following changes are considered backwards compatible:

Adding new columns - You can add new fields to the schema
Changing default values - Modifying default values for existing columns is allowed
Time column granularity changes - Modifications to time field granularity specifications are permitted
DateTime column format changes - Changes to datetime field formats are backwards compatible

Incompatible Changes

Generally speaking, following changes are deemed as incompatible:

Removing existing columns
Changing column data types

In StarTree Cloud, you can still make such incompatible changes by running the SegmentRefreshTask (SRT). Note that queries may not succeed until the SRT task has finished.

Query Semantics

It is important to ensure that queries don’t fail in the midst of a schema evolution operation. In order to make this easier, Pinot supports the notion of a virtual column which is used in lieu of the actual newly added column. Queries touching this virtual column will pick up the default values instead of failing (which was the old behavior). Here is the general guidance:

Scenario	Before	After
Rolling addition of a new column (newCol)	Queries fail with schema mismatch	Queries succeed; newCol is exposed as a virtual column
Add a new column without segment reload	Empty results due to server‑side pruning	Queries succeed and return all columns, with newCol filled with default values
Queries over existing columns only	No change	No change
Queries referencing a partially loaded column	Merge errors or empty responses	Queries succeed with consistent results, including the new virtual column

Full details of this new functionality can be found in this PR.

Data Backfill

Once a new column is added, often times we need to populate it with real values - typically from a batch data source like data lake. StarTree cloud provides a convenient way to accomplish this using the SegmentBackfillTask. For more details, please refer to: https://docs.startree.ai/corecapabilities/manage-data/segment-backfill-task

Get Started

Ingestion

Query Data

Manage Data

Visualize Data

Manage Security

Release Notes

Reference

Schema Update and Reload Integration

Backwards compatible changes

Incompatible Changes

Query Semantics

Data Backfill

Get Started

Ingestion

Query Data

Manage Data

Visualize Data

Manage Security

Release Notes

Reference

​Schema Update and Reload Integration

​Backwards compatible changes

​Incompatible Changes

​Query Semantics

​Data Backfill

Schema Update and Reload Integration

Backwards compatible changes

Incompatible Changes

Query Semantics

Data Backfill