Pinot is most commonly used to provide real-time analytics on streaming data, which it does using a real-time table. However, after running such a system for a while, we may want to update data that has already been ingested into that table. Perhaps a value in a column has been renamed, or we want to remove some duplicate records. Segments in real-time tables can't be replaced, but segments in offline tables can. Managed offline flow is Pinot's mechanism for moving data from real-time tables to offline tables. In this recipe we'll learn how to use Pinot's managed offline flow.
| Pinot Version | 0.9.3 |
|---|---|
| Code | startreedata/pinot-recipes/managed-offline-flow |
Prerequisites
To follow the code examples in this guide, you must install Docker locally and download the recipes. Clone the startreedata/pinot-recipes repository and navigate to the managed-offline-flow recipe.
Managed Offline Flow
Running `make manage_offline_flow` performs the following steps:
- Sets the properties in the Pinot Controller that the managed offline flow task needs: `RealtimeToOfflineSegmentsTask.timeoutMs` and `RealtimeToOfflineSegmentsTask.numConcurrentTasksPerInstance`.
- Schedules the task to run.
- Prints logs related to the task.
- Updates the hybrid table's time boundary so that you can see records that have been moved to the offline table.
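The first two steps above are calls to the Pinot Controller. The following is a minimal sketch of what those calls could look like, assuming the controller's `POST /cluster/configs` endpoint (to set the two task properties) and `POST /tasks/schedule` endpoint (to schedule `RealtimeToOfflineSegmentsTask`). The controller address `http://localhost:9000` and the config values are illustrative assumptions, not values taken from the recipe. The sketch only builds the requests so you can inspect them before sending anything.

```python
"""Sketch: build Pinot Controller requests for the managed offline flow steps.

Assumptions (not taken from the recipe): the controller listens on
http://localhost:9000, the config values are illustrative, and the
endpoints are POST /cluster/configs and POST /tasks/schedule.
Requests are only built here; send() is provided but never called.
"""
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

CONTROLLER = "http://localhost:9000"  # assumed controller address

# Illustrative values only; the recipe's Makefile may use different ones.
CLUSTER_CONFIGS: Dict[str, str] = {
    "RealtimeToOfflineSegmentsTask.timeoutMs": "600000",
    "RealtimeToOfflineSegmentsTask.numConcurrentTasksPerInstance": "4",
}


def cluster_config_request(controller: str, configs: Dict[str, str]) -> urllib.request.Request:
    """Build a POST that stores the task properties as Pinot cluster configs."""
    return urllib.request.Request(
        f"{controller}/cluster/configs",
        data=json.dumps(configs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def schedule_task_request(controller: str, table: Optional[str] = None) -> urllib.request.Request:
    """Build a POST asking the controller to schedule RealtimeToOfflineSegmentsTask.

    If `table` is given, scheduling is limited to that table (name is hypothetical).
    """
    params = {"taskType": "RealtimeToOfflineSegmentsTask"}
    if table is not None:
        params["tableName"] = table
    return urllib.request.Request(
        f"{controller}/tasks/schedule?{urllib.parse.urlencode(params)}",
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send a built request to a running controller and return the response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Against a running cluster you would call `send(cluster_config_request(CONTROLLER, CLUSTER_CONFIGS))` first and then `send(schedule_task_request(CONTROLLER))`. Keeping request construction separate from sending makes the payloads easy to check without a cluster.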
View realtime and offline segments
Navigate to http://localhost:9000/#/query and query the hybrid table to view records served from its realtime and offline segments. Run `make realtime` to generate more data and `make manage_offline_flow` to migrate older data to the OFFLINE table. See the README on GitHub for this recipe for sample output.
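The time-boundary update in the managed offline flow matters because the broker splits a hybrid-table query at that boundary. Offline segments serve rows whose time value is at or before the boundary, and realtime segments serve rows after it. The following is a simplified model of that rule, written as an illustration under that assumption rather than as Pinot's implementation. The `Row` type and `hybrid_scan` function are hypothetical. The model shows that when the boundary lags behind, rows that were already moved to offline are still served from realtime.

```python
"""Simplified model of splitting a hybrid-table query at the time boundary.

Assumption: offline serves rows with time <= boundary and realtime serves
rows with time > boundary. Row and hybrid_scan are hypothetical names used
only for illustration; this is not Pinot's implementation.
"""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Row:
    ts: int      # value of the table's time column (e.g. epoch millis)
    source: str  # "OFFLINE" or "REALTIME": which table holds this copy


def hybrid_scan(offline: List[Row], realtime: List[Row], boundary: int) -> List[Row]:
    """Return the rows a hybrid query would see for the given time boundary."""
    from_offline = [r for r in offline if r.ts <= boundary]
    from_realtime = [r for r in realtime if r.ts > boundary]
    return from_offline + from_realtime


if __name__ == "__main__":
    realtime = [Row(t, "REALTIME") for t in range(1, 7)]
    offline = [Row(t, "OFFLINE") for t in range(1, 4)]  # rows 1-3 were moved
    # A stale boundary (1) vs. an updated boundary (3): each timestamp is
    # served exactly once, but only the updated boundary exposes all moved rows.
    for boundary in (1, 3):
        served = hybrid_scan(offline, realtime, boundary)
        print(boundary, [(r.ts, r.source) for r in served])
```

Because the two filters are complementary, each timestamp is answered by exactly one table. Moving the boundary forward is therefore what makes the migrated offline copies visible.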

