In this guide we’ll learn how to configure the segment threshold for flushing data in Apache Pinot real-time tables.
Pinot Version | 1.0.0 |
---|---|
Code | startreedata/pinot-recipes/configuring-segment-threshold |
Property | Description |
---|---|
realtime.segment.flush.threshold.rows | Row count flush threshold. |
realtime.segment.flush.threshold.time | Maximum time a segment is kept open before it is flushed. |
realtime.segment.flush.threshold.segment.size | The desired size of a completed segment. |
The row count threshold is set with `realtime.segment.flush.threshold.rows`.
Pinot will complete/flush segments as soon as the consuming segment contains the specified number of rows.
This will generally result in each segment having the same number of rows.
However, if the time threshold defined by `realtime.segment.flush.threshold.time` is reached, a segment will be completed even if the row count flush threshold has not yet been reached.
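The interplay between the row count and time thresholds can be sketched as follows. This is a simplified model of the rule described above, not Pinot's implementation; the values `100000` and `1h` are assumed examples, and only the property names come from this guide.

```python
import json

# Hypothetical values for illustration; only the property names come from the guide.
stream_configs = {
    "realtime.segment.flush.threshold.rows": "100000",
    "realtime.segment.flush.threshold.time": "1h",
}


def should_flush(rows_consumed, seconds_open, row_threshold, time_threshold_seconds):
    """Return why a consuming segment would be completed, or None if it stays open.

    Simplified model of the behaviour described in the text, not Pinot's code.
    """
    if rows_consumed >= row_threshold:
        return "rows"
    if seconds_open >= time_threshold_seconds:
        return "time"
    return None


print(json.dumps(stream_configs, indent=2))
print(should_flush(100_000, 120, 100_000, 3600))  # row threshold reached
print(should_flush(42_000, 3600, 100_000, 3600))  # time threshold reached first
print(should_flush(42_000, 120, 100_000, 3600))   # segment keeps consuming
```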
Alternatively, set `realtime.segment.flush.threshold.rows` to `0`, in which case Pinot will instead attempt to make sure that every segment has the desired size defined by `realtime.segment.flush.threshold.segment.size`.
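A size-based configuration might look like the sketch below. The helper name `size_based_stream_configs` is hypothetical, and the values `150M` and `24h` are assumed examples rather than recommendations; the properties are assumed to belong in the table's stream configuration.

```python
import json


def size_based_stream_configs(desired_segment_size, max_open_time):
    """Build the flush-threshold part of a stream config for size-based flushing."""
    return {
        # Setting the row threshold to 0 switches to the size-based threshold.
        "realtime.segment.flush.threshold.rows": "0",
        "realtime.segment.flush.threshold.time": max_open_time,
        "realtime.segment.flush.threshold.segment.size": desired_segment_size,
    }


config = size_based_stream_configs("150M", "24h")  # assumed example values
print(json.dumps(config, indent=2))
```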
When configuring the segment threshold this way, the minimum number of rows in a segment is 10,000.
The first segment for a new partition will have 100,000 rows. For subsequent segments Pinot will slowly adjust the number of rows to get closer to the desired segment size. This means that the first few segments might differ in size, but over time the segment size will approach the desired size.
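To build intuition for this gradual adjustment, here is a toy simulation. It is not Pinot's actual algorithm: the proportional update and the `damping` factor are assumptions made for illustration, and only the 100,000-row starting point and the 10,000-row minimum come from the text above.

```python
MIN_ROWS = 10_000        # minimum rows per segment, from the guide
INITIAL_ROWS = 100_000   # rows in the first segment of a new partition, from the guide


def next_row_threshold(current_rows, observed_bytes, desired_bytes, damping=0.5):
    """Move the row threshold part of the way toward the value that hits the desired size.

    Illustrative only: the update rule and `damping` are assumptions.
    """
    target_rows = current_rows * desired_bytes / observed_bytes
    adjusted = current_rows + damping * (target_rows - current_rows)
    return max(MIN_ROWS, round(adjusted))


def simulate(bytes_per_row, desired_bytes, segments=6):
    """Return (rows, size_in_bytes) for consecutive segments of one partition."""
    rows = INITIAL_ROWS
    history = []
    for _ in range(segments):
        size = rows * bytes_per_row
        history.append((rows, size))
        rows = next_row_threshold(rows, size, desired_bytes)
    return history


for rows, size in simulate(bytes_per_row=200, desired_bytes=10_000_000):
    print(f"rows={rows:>7,} size={size / 1_000_000:.2f} MB")
```

In this toy model the first segments overshoot the desired size, and each later segment lands closer to it, which mirrors the behaviour described above.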
The `segment.flush.threshold.size` property indicates that this segment contains 101,912 rows.
We can check how many rows are stored in all segments by running the following script:
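A minimal sketch of such a check is shown below. It assumes that the controller lists segments at `/segments/{table}` in a shape like `[{"REALTIME": [...]}]`, and that each segment's metadata at `/segments/{table}/{segment}/metadata` exposes the row count as `segment.total.docs`. The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def segment_names(listing):
    """Flatten a listing shaped like [{"REALTIME": ["seg_a", "seg_b"]}] (assumed shape)."""
    return [name for group in listing for names in group.values() for name in names]


def row_counts(controller, table, fetch=fetch_json):
    """Map each segment name to its row count, read from the segment metadata."""
    base = f"{controller}/segments/{quote(table)}"
    counts = {}
    for name in segment_names(fetch(base)):
        metadata = fetch(f"{base}/{quote(name)}/metadata")
        # "segment.total.docs" is assumed to hold the number of rows in the segment.
        counts[name] = int(metadata["segment.total.docs"])
    return counts
```

For example, `row_counts("http://localhost:9000", "events")` would query a controller on the local machine for a table named `events`; both the address and the table name are assumptions here.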
`uuid` column, but for now it looks fairly stable.