Pinot Version | 1.0.0 |
---|---|
Code | startreedata/pinot-recipes/merge-small-segments-realtime |
You can also merge segments in offline tables. For more information, see the merge small segments in offline tables guide.
Prerequisites
To follow the code examples in this guide, you must install Docker locally and download recipes.
Navigate to recipe
- If you haven’t already, download recipes.
- In terminal, go to the recipe by running the following command:
Launch Pinot Cluster
You can spin up a Pinot Cluster by running the following command:
Controller configuration
The Pinot controller is launched with the following custom configuration:
We've configured the task scheduler to run every 5 minutes, but you'd set that to an hour or more in a production system.
Dataset
We've got the following data generator that generates data in bursts depending on the minute of the hour:
Pinot Schema and Table
Now let's create a Pinot Schema and Table. First, the schema:
The table config includes a `batchIngestionConfig`. This shouldn't be strictly necessary, because it's a config that's usually used for offline tables, but the merge/roll-up logic in the current version (0.12.1) relies on it being there. That reliance has already been fixed in the main branch, so this config won't be required once 0.13.0 is released.
The table config also includes a `MergeRollupTask`, which is extracted below:
We are intentionally using very small values for the `bucketTimePeriod` and `bufferTimePeriod` for the purposes of this example. You'll want to use larger values for production systems.
Remove the `-arm64` suffix if you're not using a Mac M1/M2.
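To make the effect of these two settings concrete, here is a minimal sketch of how a bucket size and a buffer could decide which time windows are old enough to merge. It is a simplified model, not Pinot's implementation: the function name `eligible_buckets` is hypothetical, and the 5-minute bucket and 2-minute buffer are assumptions read off the `5m_2m` part of the merged segment names shown later in this guide.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def eligible_buckets(first_event, now, bucket, buffer):
    """Return (start, end) windows aligned to `bucket` whose end is at
    least `buffer` older than `now`. Simplified model, not Pinot's code."""
    # Align the first window down to a multiple of the bucket size.
    start = first_event - ((first_event - EPOCH) % bucket)
    windows = []
    while start + bucket <= now - buffer:
        windows.append((start, start + bucket))
        start += bucket
    return windows


# Assumed values: first event and "now" are taken from the example data in
# this guide; the bucket and buffer sizes are inferred, not confirmed.
first_event = datetime(2023, 3, 31, 11, 15, 51, tzinfo=timezone.utc)
now = datetime(2023, 3, 31, 11, 24, 15, tzinfo=timezone.utc)
windows = eligible_buckets(
    first_event, now, bucket=timedelta(minutes=5), buffer=timedelta(minutes=2)
)
for start, end in windows:
    print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"))
# prints: 11:15 to 11:20
```

Under this model the 11:20 to 11:25 window is not yet eligible, because its end falls inside the 2-minute buffer. Larger values, such as a one-day bucket, would leave recent data untouched for much longer.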
Viewing segments
We can navigate to the Pinot UI and run the following query to see the segments that have been created and the number of records that they contain:

segmentName | count(*) | minDate | maxDate |
---|---|---|---|
events__0__6__20230331T1122Z | 15137 | 2023-03-31 11:21:58 | 2023-03-31 11:22:14 |
events__0__5__20230331T1120Z | 778623 | 2023-03-31 11:20:54 | 2023-03-31 11:21:58 |
events__0__4__20230331T1119Z | 110868 | 2023-03-31 11:19:51 | 2023-03-31 11:20:54 |
events__0__3__20230331T1118Z | 691993 | 2023-03-31 11:18:46 | 2023-03-31 11:19:51 |
events__0__2__20230331T1117Z | 182076 | 2023-03-31 11:17:46 | 2023-03-31 11:18:46 |
events__0__1__20230331T1116Z | 602990 | 2023-03-31 11:16:40 | 2023-03-31 11:17:46 |
events__0__0__20230331T1115Z | 118298 | 2023-03-31 11:15:51 | 2023-03-31 11:16:40 |
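A query of this shape can also be sent to the broker from a script rather than the UI. The sketch below is one possible version, not the recipe's own query: the table name `events` is taken from the segment names, `ts` is a hypothetical name for the time column, and the broker address `http://localhost:8099` assumes Pinot's default broker port is exposed locally.

```python
import json
import urllib.request

# Assumptions: `events` is the table, `ts` is a millisecond time column.
# `$segmentName` is a built-in virtual column in Pinot.
SEGMENTS_SQL = """
SELECT $segmentName,
       count(*),
       ToDateTime(min(ts), 'yyyy-MM-dd HH:mm:ss') AS minDate,
       ToDateTime(max(ts), 'yyyy-MM-dd HH:mm:ss') AS maxDate
FROM events
GROUP BY $segmentName
LIMIT 100
"""


def build_query_request(sql, broker_url="http://localhost:8099"):
    """Build a POST request for the broker's SQL query endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{broker_url}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, broker_url="http://localhost:8099"):
    """Send the query and return the parsed JSON response."""
    with urllib.request.urlopen(build_query_request(sql, broker_url)) as resp:
        return json.load(resp)
```

With the cluster running, `run_query(SEGMENTS_SQL)["resultTable"]["rows"]` should return one row per segment, assuming the column and table names above match your setup.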
Merge segments
The job to merge segments runs every 5 minutes, so if we wait a little while it will eventually start.
If you want to manually trigger the merge segments job, see the merge segments section of the merge small segments guide.
And we can check the Pinot Minion logs to see if the job has run:
Let’s now run the segments query again:
segmentName | count(*) | minDate | maxDate |
---|---|---|---|
events__0__8__20230331T1124Z | 109890 | 2023-03-31 11:23:59 | 2023-03-31 11:25:08 |
events__0__7__20230331T1123Z | 743572 | 2023-03-31 11:23:04 | 2023-03-31 11:23:59 |
events__0__6__20230331T1122Z | 72745 | 2023-03-31 11:21:58 | 2023-03-31 11:23:04 |
events__0__5__20230331T1120Z | 778623 | 2023-03-31 11:20:54 | 2023-03-31 11:21:58 |
merged_5m_2m_1680261855683_0_events_1680261600000_1680261654662_1 | 4567 | 2023-03-31 11:20:00 | 2023-03-31 11:20:54 |
merged_5m_2m_1680261855683_0_events_1680261351086_1680261599999_0 | 1701658 | 2023-03-31 11:15:51 | 2023-03-31 11:19:59 |
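As a sanity check on the two result tables so far, the arithmetic below compares record counts before and after the merge. It assumes the two `merged_...` segments replaced `events__0__0` through `events__0__4`, which is inferred from those five segments no longer appearing in the results; the variable names are illustrative.

```python
# Record counts for the five segments that disappear after the merge,
# copied from the first query result.
before_merge = {
    "events__0__0": 118298,
    "events__0__1": 602990,
    "events__0__2": 182076,
    "events__0__3": 691993,
    "events__0__4": 110868,
}

# Record counts for the two merged segments in the second query result,
# keyed by the time range they cover.
after_merge = {
    "11:15:51 to 11:19:59": 1701658,
    "11:20:00 to 11:20:54": 4567,
}

total_before = sum(before_merge.values())
total_after = sum(after_merge.values())
print(total_before, total_after)
# prints: 1706225 1706225
```

The totals match, which is consistent with the merge only regrouping rows by time bucket rather than dropping or aggregating them.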

Running the same query again after further merge runs gives:

segmentName | count(*) | minDate | maxDate |
---|---|---|---|
events__0__18__20230331T1134Z | 124044 | 2023-03-31 11:34:34 | 2023-03-31 11:35:09 |
events__0__17__20230331T1133Z | 343966 | 2023-03-31 11:33:33 | 2023-03-31 11:34:34 |
events__0__16__20230331T1132Z | 443483 | 2023-03-31 11:32:26 | 2023-03-31 11:33:33 |
events__0__15__20230331T1131Z | 439137 | 2023-03-31 11:31:26 | 2023-03-31 11:32:26 |
events__0__14__20230331T1130Z | 355319 | 2023-03-31 11:30:19 | 2023-03-31 11:31:26 |
merged_5m_2m_1680262455984_0_events_1680262200000_1680262219170_1 | 1591 | 2023-03-31 11:30:00 | 2023-03-31 11:30:19 |
merged_5m_2m_1680262455984_0_events_1680261900010_1680262199999_0 | 2368619 | 2023-03-31 11:25:00 | 2023-03-31 11:29:59 |
merged_5m_2m_1680262155835_0_events_1680261600000_1680261899994_0 | 1604525 | 2023-03-31 11:20:00 | 2023-03-31 11:24:59 |
merged_5m_2m_1680261855683_0_events_1680261351086_1680261599999_0 | 1701658 | 2023-03-31 11:15:51 | 2023-03-31 11:19:59 |
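The merged segment names appear to end with the minimum and maximum event timestamps in epoch milliseconds, followed by a sequence number. The sketch below decodes those fields under that assumption; `merged_time_range` is a hypothetical helper, not a Pinot API.

```python
from datetime import datetime, timezone


def merged_time_range(segment_name):
    """Decode the trailing <startMillis>_<endMillis>_<sequence> fields of a
    merged segment name into UTC timestamps (assumed naming layout)."""
    # Split from the right so underscores earlier in the name don't matter.
    _prefix, start_ms, end_ms, _seq = segment_name.rsplit("_", 3)

    def to_utc(millis):
        # Truncate to whole seconds to match the query output format.
        moment = datetime.fromtimestamp(int(millis) // 1000, tz=timezone.utc)
        return moment.strftime("%Y-%m-%d %H:%M:%S")

    return to_utc(start_ms), to_utc(end_ms)


name = "merged_5m_2m_1680261855683_0_events_1680261351086_1680261599999_0"
print(merged_time_range(name))
# prints: ('2023-03-31 11:15:51', '2023-03-31 11:19:59')
```

The decoded range matches the `minDate` and `maxDate` columns for that segment, which supports reading the name this way.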