Pinot Version | 1.0.0 |
---|---|
Code | startreedata/pinot-recipes/configuring-segment-threshold |
To learn what the segment threshold is and why it’s important, see Segment Threshold.
Configuration parameters
The segment threshold is configured using the following parameters:

Property | Description |
---|---|
realtime.segment.flush.threshold.rows | Row count flush threshold. |
realtime.segment.flush.threshold.time | Maximum time that a segment is kept open before it is flushed. |
realtime.segment.flush.threshold.segment.size | The desired size of a completed segment. |
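
As a rough illustration of how these three properties fit together, the sketch below builds them as a string-to-string map, on the assumption that they are placed in the stream configuration section of the table config. The helper name `flush_threshold_configs` and the values `24h` and `150M` are illustrative choices for this example, not taken from the recipe.

```python
import json


def flush_threshold_configs(rows: int, time: str, segment_size: str) -> dict:
    """Build the flush-threshold entries of a stream config map.

    All values are emitted as strings, on the assumption that the
    stream config map is string-to-string.
    """
    return {
        "realtime.segment.flush.threshold.rows": str(rows),
        "realtime.segment.flush.threshold.time": time,
        "realtime.segment.flush.threshold.segment.size": segment_size,
    }


# Illustrative values only: rows is 0, so the desired segment size applies.
size_based = flush_threshold_configs(rows=0, time="24h", segment_size="150M")
print(json.dumps(size_based, indent=2))
```

The printed JSON fragment would then be merged into the rest of the stream configuration by hand.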
Desired row threshold
We can define a manual row threshold by specifying a value for realtime.segment.flush.threshold.rows.
Pinot will complete/flush segments as soon as the consuming segment contains the specified number of rows.
This will generally result in each segment having the same number of rows.
However, if the time threshold defined by realtime.segment.flush.threshold.time is reached, a segment will be completed even if the row count flush threshold has not yet been reached.
If realtime.segment.flush.threshold.rows is set to a value greater than 0, realtime.segment.flush.threshold.segment.size is ignored.
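
The rule described in this section can be modelled in a few lines. This is only a sketch of the behaviour as stated above, not Pinot's implementation; the function name `should_flush` and the millisecond units are assumptions made for the example.

```python
def should_flush(rows_consumed: int, elapsed_ms: int,
                 threshold_rows: int, threshold_time_ms: int) -> bool:
    """Return True when a consuming segment would be completed.

    Models only the manual row threshold mode (threshold_rows > 0):
    the segment completes when either the row count or the time
    threshold is reached, whichever comes first.
    """
    if threshold_rows <= 0:
        raise ValueError("this sketch only models a row threshold greater than 0")
    return rows_consumed >= threshold_rows or elapsed_ms >= threshold_time_ms


one_hour_ms = 60 * 60 * 1000

# Row threshold reached well before the time threshold.
print(should_flush(50_000, 5_000, threshold_rows=50_000, threshold_time_ms=one_hour_ms))

# Slow stream: the time threshold completes the segment with fewer rows.
print(should_flush(1_200, one_hour_ms, threshold_rows=50_000, threshold_time_ms=one_hour_ms))
```

The second call shows why segments are only "generally" the same size: a quiet partition can hit the time threshold first.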
Desired segment size
Alternatively, we can set realtime.segment.flush.threshold.rows to 0, in which case Pinot will instead attempt to make sure that every segment has the desired size defined by realtime.segment.flush.threshold.segment.size.
When configuring the segment threshold this way, the minimum number of rows in a segment is 10,000.
The first segment for a new partition will have 100,000 rows. For subsequent segments Pinot will slowly adjust the number of rows to get closer to the desired segment size. This means that the first few segments might differ in size, but over time the segment size will approach the desired size.
The algorithm used in this approach is described in more detail in the Auto-tuning Pinot real-time consumption blog post.
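
To build some intuition for the gradual adjustment, here is a deliberately simplified simulation. It is not the algorithm from the blog post: it assumes a constant number of bytes per row and uses a hypothetical damping factor `step`, and the names `next_row_target` and `simulate` are invented for this sketch. Only the 100,000-row starting point and the 10,000-row minimum come from the text above.

```python
from typing import List

MIN_ROWS = 10_000       # minimum rows per segment, as stated above
INITIAL_ROWS = 100_000  # rows in the first segment of a new partition


def next_row_target(prev_rows: int, prev_size_bytes: int,
                    desired_size_bytes: int, step: float = 0.5) -> int:
    """Move the row target part of the way toward the ideal row count.

    `step` is a hypothetical damping factor for this sketch, not a Pinot setting.
    """
    ideal_rows = prev_rows * desired_size_bytes / prev_size_bytes
    adjusted = prev_rows + step * (ideal_rows - prev_rows)
    return max(MIN_ROWS, int(adjusted))


def simulate(bytes_per_row: int, desired_size_bytes: int, segments: int) -> List[int]:
    """Return the row target used for each successive segment."""
    rows = INITIAL_ROWS
    targets = [rows]
    for _ in range(segments - 1):
        observed_size = rows * bytes_per_row  # assumed constant bytes per row
        rows = next_row_target(rows, observed_size, desired_size_bytes)
        targets.append(rows)
    return targets


targets = simulate(bytes_per_row=500, desired_size_bytes=150 * 1024 * 1024, segments=8)
print(targets)
```

In this toy model the row target climbs from 100,000 toward roughly 314,000 rows (the desired size divided by the bytes per row), with each segment closing part of the remaining gap.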
A worked example
Let’s see how this works with a worked example.
Prerequisites
To follow the code examples in this guide, you must install Docker locally and download recipes.
Navigate to recipe
- If you haven’t already, download recipes.
- In terminal, go to the recipe by running the following command:
Launch Pinot Cluster
You can spin up a Pinot Cluster by running the following command:
Pinot Schema and Table
Let’s create a Pinot Schema and Table. The schema is defined below:
Ingesting Data
Next, we’re going to ingest some data into Kafka:
Inspecting segment sizes
After the ingestion script has run for a while we can inspect the size of the segments that have been completed. You can return the size of each completed segment by running the following command:
The segment.flush.threshold.size property indicates that this segment contains 101,912 rows.
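
If you prefer to post-process the segment metadata in a script, something along these lines may help. It is a sketch that assumes the metadata has already been fetched and parsed into a mapping from segment name to a map of string properties; that shape, the helper name `rows_per_segment`, and the segment name in the sample are assumptions for illustration. Only the property name and the 101,912 figure come from the text above.

```python
import json


def rows_per_segment(metadata: dict) -> dict:
    """Map each segment name to its segment.flush.threshold.size value.

    Segments that do not carry the property are skipped.
    """
    key = "segment.flush.threshold.size"
    return {
        name: int(props[key])
        for name, props in metadata.items()
        if key in props
    }


# Hypothetical payload: the segment name is made up for this example.
sample = json.loads(
    '{"events__0__0__20230101T0000Z": {"segment.flush.threshold.size": "101912"}}'
)
print(rows_per_segment(sample))
```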
We can check how many rows are stored in all segments by running the following script:
uuid column, but for now it looks fairly stable.