SegmentRefreshTask reads the latest table config and refreshes any segments that are not consistent with it. When the task detects an inconsistency between segments and the table config, it downloads the affected segments from the deep store, processes and regenerates them, and then pushes them back to replace the old segments atomically.
The complexTypeConfig setting is part of the ingestionConfig and is used to flatten and process complex-type data. Do not use SegmentRefreshTask when complexTypeConfig is enabled. Doing so may duplicate records and make unintended changes to tables.
The following properties are set in the taskConfig section of the table configuration:
| Property Name | Required | Description |
|---|---|---|
| bucketTimePeriod | Yes | Time bucket period for the output segments (e.g., 1d). |
| maxNumRecordsPerSegment | No (default 5M) | Maximum (target) number of records per segment. After applying the partitioning constraints, the task tries to resize all segments to this size. |
| skipSegmentIndexCheck | No (default false) | If set to true, the index check (see the next section) is skipped. This check requires pulling all segments' metadata from the servers, which can be costly for large tables. |
| tableMaxNumTasks | No (default 10) | Maximum number of tasks a table can run in parallel per schedule. Tune this value based on the number of Minion instances in the cluster. Must be positive. |
| maxNumRecordsPerTask | No (default 50M) | Maximum number of records processed in a single task. Each task runs on a single Minion instance, so this limit keeps the Minion from running out of resources. Must be positive. |
| maxDataSizePerTask | No (default 5 GB) | Maximum size of input data assigned to a single task. |
| desiredSegmentSize | No (default 500 MB) | User-specified target size for each segment. |
| batchSegmentUpload | No (default false) | If set to true, segments are uploaded in batch mode, which is faster than uploading them one at a time. |
| mergeType | No | Same definition as in the MergeRollupTask. |
| roundBucketTimePeriod | No | Same definition as in the MergeRollupTask. |
| *.aggregationType | No | Same definition as in the MergeRollupTask. |
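A minimal sketch of how some of these properties might look in a table config. It assumes the standard Apache Pinot layout, where each Minion task's properties are a string-to-string map under `task.taskTypeConfigsMap`, so numbers and booleans are written as strings. The table name `myTable_OFFLINE` and the metric column `clicks` are hypothetical, the values are illustrative only, and other required table config fields are omitted.

```json
{
  "tableName": "myTable_OFFLINE",
  "tableType": "OFFLINE",
  "task": {
    "taskTypeConfigsMap": {
      "SegmentRefreshTask": {
        "bucketTimePeriod": "1d",
        "maxNumRecordsPerSegment": "5000000",
        "tableMaxNumTasks": "10",
        "batchSegmentUpload": "true",
        "mergeType": "rollup",
        "clicks.aggregationType": "sum"
      }
    }
  }
}
```

Here `clicks.aggregationType` follows the MergeRollupTask convention of prefixing the property with a metric column name, which is what the `*` in the table stands for.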
"maxNumRecordsPerSegment": "2000000" . This setting configures how large the output segment should be, and the input segments are automatically decided for each task, based on how many valid records they have.
To enable the Minion task for an upsert table, add the SegmentRefreshTask configuration and also set "rocksdb.segmentmerge.enable" under the upsertConfig section of the table configuration. When records have identical seqId values, this flag makes upsert favor the 'refreshed_' segment over older segments, based on segment creation time.
A server restart is required for this setting to take effect. The flag currently defaults to false to allow incremental adoption of the feature; it will default to true in future releases.
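A minimal sketch that combines both settings for an upsert table. It assumes the standard Pinot `task.taskTypeConfigsMap` layout for the task properties and places `rocksdb.segmentmerge.enable` directly inside `upsertConfig`, as the text describes. The exact nesting of that key, its string value `"true"`, and the table name `myUpsertTable_REALTIME` are assumptions, so check them against your deployment's reference config. `"mode": "FULL"` is the standard Pinot full-upsert mode and is included only for context. Other required fields are omitted.

```json
{
  "tableName": "myUpsertTable_REALTIME",
  "tableType": "REALTIME",
  "upsertConfig": {
    "mode": "FULL",
    "rocksdb.segmentmerge.enable": "true"
  },
  "task": {
    "taskTypeConfigsMap": {
      "SegmentRefreshTask": {
        "maxNumRecordsPerSegment": "2000000"
      }
    }
  }
}
```

Remember to restart the servers after adding the flag, because the setting only takes effect on restart.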