`RealtimeToOfflineSegmentsTask`.
As with `RealtimeToOfflineSegmentsTask`, both real-time and offline tables should be created.
However, whereas `RealtimeToOfflineSegmentsTask` is configured on the REALTIME table, `SegmentImportTask` is configured on the OFFLINE table, following the convention that minion tasks that ingest data are configured on the destination table.
Property Name | Required | Description |
---|---|---|
tableMaxNumTasks | No | The maximum number of tasks a table can run in parallel at any time. Default: 10,000. |
maxNumRecordsPerTask | No | The maximum number of records a single task can process; used to spread the workload across parallel tasks. Default: 50M. |
initialWatermarkMs | No | The watermark (in milliseconds) at which the task starts. Default: -1, which starts from the smallest start time of the real-time segments. |
desiredSegmentSize | No | The desired segment size. Default: 500M (K for kilobyte, M for megabyte, G for gigabyte). |
schedule | No | A cron expression, in Quartz cron syntax, for when the job is routinely triggered. If not set, the task is not cron-scheduled but can still be triggered via the `/tasks/schedule` endpoint. |
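
As an illustration of how the properties above might fit together, the following Python sketch assembles the `task` section of an OFFLINE table config and prints it as JSON. The helper name `segment_import_task_config` is hypothetical, the numeric value formats (for example writing 50M as `"50000000"`) are assumptions, and any required properties not listed in this table are omitted, so treat it as a starting point rather than a complete config.

```python
import json
from typing import Optional


def segment_import_task_config(initial_watermark_ms: int = -1,
                               schedule: Optional[str] = None) -> dict:
    """Build an illustrative `task` section for an OFFLINE table config.

    Hypothetical helper: only the optional properties from the table above
    are set, using their documented defaults.
    """
    props = {
        "tableMaxNumTasks": "10000",
        # Assumption: the 50M default written out as a plain number.
        "maxNumRecordsPerTask": "50000000",
        "initialWatermarkMs": str(initial_watermark_ms),
        "desiredSegmentSize": "500M",
    }
    if schedule is not None:
        # Quartz cron syntax: second minute hour day-of-month month day-of-week
        props["schedule"] = schedule
    return {"task": {"taskTypeConfigsMap": {"SegmentImportTask": props}}}


# Example: trigger the task at the top of every hour.
print(json.dumps(segment_import_task_config(schedule="0 0 * * * ?"), indent=2))
```

Leaving `schedule` unset in this sketch produces a config without a cron entry, which matches the case where the task is triggered only through the `/tasks/schedule` endpoint.
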
To switch from `RealtimeToOfflineSegmentsTask`, get the current watermark of `RealtimeToOfflineSegmentsTask` and set `initialWatermarkMs` to that value before starting the `SegmentImportTask`.
1. Remove the `RealtimeToOfflineSegmentsTask` task config from the REALTIME table and wait for existing tasks to finish.
2. Call `/tasks/RealtimeToOfflineSegmentsTask/{tableNameWithType}/metadata` to get the current watermark.
3. Add the `SegmentImportTask` task configs to the OFFLINE table, and set `initialWatermarkMs` to the value obtained in the previous step.
4. Verify that `RealtimeToOfflineSegmentsTask` has stopped and `SegmentImportTask` has started.
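
The watermark hand-off in the steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the controller address `http://localhost:9000`, the table name `myTable_REALTIME`, and all helper names are hypothetical, and the response is assumed to be a ZNRecord-style JSON document that carries the watermark under `simpleFields.watermarkMs`. Check the actual response of your controller before relying on that shape.

```python
import json
import urllib.request

CONTROLLER = "http://localhost:9000"  # assumption: controller address


def metadata_url(table_name_with_type: str, controller: str = CONTROLLER) -> str:
    """URL of the task metadata endpoint named in the steps above."""
    return (f"{controller}/tasks/RealtimeToOfflineSegmentsTask/"
            f"{table_name_with_type}/metadata")


def extract_watermark_ms(metadata: dict) -> int:
    """Read the watermark from the parsed metadata response.

    Assumption: the watermark is stored at simpleFields.watermarkMs.
    """
    return int(metadata["simpleFields"]["watermarkMs"])


def fetch_watermark_ms(table_name_with_type: str,
                       controller: str = CONTROLLER) -> int:
    """Fetch the current watermark from the controller (needs a live cluster)."""
    with urllib.request.urlopen(metadata_url(table_name_with_type, controller)) as resp:
        return extract_watermark_ms(json.load(resp))


def with_initial_watermark(task_props: dict, watermark_ms: int) -> dict:
    """Return a copy of the SegmentImportTask properties with the watermark set."""
    return {**task_props, "initialWatermarkMs": str(watermark_ms)}


# Offline demonstration with a made-up response; no network call is made here.
sample = {"id": "myTable_REALTIME", "simpleFields": {"watermarkMs": "1700000000000"}}
print(with_initial_watermark({"desiredSegmentSize": "500M"},
                             extract_watermark_ms(sample)))
```

Against a running cluster, `fetch_watermark_ms("myTable_REALTIME")` would replace the sample dictionary, and the resulting properties would go into the `SegmentImportTask` config on the OFFLINE table.
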