Pinot is most commonly used to provide real-time analytics based on streaming data, which can be achieved using a real-time table. However, after running such a system for a while, we may want to update data that has already been ingested into this table. Perhaps a value in a column has been renamed, or we want to remove some duplicate records.

Segments in real-time tables can’t be replaced, but those in offline tables can. The managed offline flow is how Pinot moves data from real-time tables to offline tables.

In this recipe we’ll learn how to use Pinot’s managed offline flow.

Pinot Version: 0.9.3
Code: startreedata/pinot-recipes/managed-offline-flow

Prerequisites

To follow the code examples in this guide, you must install Docker locally and download the recipes.

Clone this repository and navigate to this recipe:

git clone git@github.com:startreedata/pinot-recipes.git
cd pinot-recipes/recipes/managed-offline-flow

Makefile

make recipe

Running this recipe will build the foundation and start producing data into Kafka.

Run the next Make task:

Managed Offline Flow

make manage_offline_flow

The Make command above performs these tasks:

  • Sets the properties on the Pinot Controller needed to enable the managed offline flow task: RealtimeToOfflineSegmentsTask.timeoutMs and RealtimeToOfflineSegmentsTask.numConcurrentTasksPerInstance.
  • Schedules the task to run.
  • Prints logs related to the task.
  • Updates the hybrid table’s time boundary so that you can see records that have been moved to offline.
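
For readers curious about what the first two tasks in the list might look like outside of Make, here is a minimal sketch that calls the Pinot Controller REST API directly. It assumes the controller is reachable at http://localhost:9000, that the real-time table is named events_REALTIME (inferred from the events table queried in this recipe), and that 600000 ms and 4 are acceptable property values. Those values and the helper function names are illustrative assumptions and are not taken from the recipe's Makefile. The curl calls only run when RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch only: the recipe's Makefile may do this differently.
CONTROLLER="${CONTROLLER:-http://localhost:9000}"  # controller address (assumed default)
TABLE="${TABLE:-events}"                           # assumed table name

# Build the JSON body that sets the two cluster-level task properties.
# $1 = timeout in ms, $2 = concurrent tasks per instance (hypothetical values).
task_config_payload() {
  printf '{"RealtimeToOfflineSegmentsTask.timeoutMs": "%s", "RealtimeToOfflineSegmentsTask.numConcurrentTasksPerInstance": "%s"}\n' "$1" "$2"
}

# Build the URL that asks the controller to schedule the task for one table.
# $1 = table name without the _REALTIME suffix.
schedule_url() {
  printf '%s/tasks/schedule?taskType=RealtimeToOfflineSegmentsTask&tableName=%s_REALTIME\n' "$CONTROLLER" "$1"
}

# Only contact the controller when explicitly requested.
if [ "${RUN:-0}" = "1" ]; then
  curl -sS -X POST "$CONTROLLER/cluster/configs" \
    -H "Content-Type: application/json" \
    -d "$(task_config_payload 600000 4)"
  curl -sS -X POST "$(schedule_url "$TABLE")"
fi
```

With the recipe's containers running, saving this as a script and invoking it with RUN=1 would send both requests. Without RUN=1 it only defines the helpers, which makes it safe to source and inspect.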

View realtime and offline segments

Navigate to http://localhost:9000/#/query and run the following query:

select $segmentName, count(*) cnt
from events
group by $segmentName
order by cnt desc

To watch records migrate from REALTIME to OFFLINE, run make realtime to generate more data, then run make manage_offline_flow to move older data to OFFLINE, and re-run the query above after each step. See the README on GitHub for this recipe for sample output.
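
The same query can also be issued from the command line, which can be convenient when re-running it between Make tasks. The sketch below posts it to a Pinot broker's SQL endpoint. The broker address http://localhost:8099 and the helper function name are assumptions; check the recipe's Docker Compose file for the port it actually exposes. The curl call only runs when RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch only: the broker address is an assumption, not taken from the recipe.
BROKER="${BROKER:-http://localhost:8099}"

# Build the JSON body for the segment-count query.
segment_count_payload() {
  # Single quotes keep $segmentName literal instead of expanding it as a shell variable.
  printf '{"sql": "%s"}\n' 'select $segmentName, count(*) cnt from events group by $segmentName order by cnt desc'
}

# Only contact the broker when explicitly requested.
if [ "${RUN:-0}" = "1" ]; then
  curl -sS -X POST "$BROKER/query/sql" \
    -H "Content-Type: application/json" \
    -d "$(segment_count_payload)"
fi
```

The response is JSON, so piping it through a JSON formatter makes the per-segment counts easier to compare between runs.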

Clean up

make clean

Troubleshooting

To clean up old Docker installations that may be interfering with your testing of this recipe, run the following command:

docker system prune