How does deep store segment sync work?
When the alter table task is used in reloadOnly mode:
- The process fetches stale segments on the servers for the table.
- It generates a minion task that rebuilds each stale segment with the latest table configuration and schema, then uploads the rebuilt segment to the deep store under the same name.
- This triggers a segment refresh message. After that, all servers hosting the segment will download the segment and load it locally.
- After the reload through the alter table task, the segment metadata will include a table hash value. This value is used by subsequent task generations to skip these segments.
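The steps above can be sketched as a small in-memory simulation. This is only an illustration of the flow, not the product's implementation: the Segment class, the table_hash helper, the run_reload_only function, and the use of SHA-256 are all assumptions made for this example, and the dictionary stands in for the deep store.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def table_hash(table_config: dict, schema: dict) -> str:
    """Hypothetical hash over table config and schema (the real hash is an assumption here)."""
    payload = json.dumps({"config": table_config, "schema": schema}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Segment:
    name: str
    stale_on_servers: bool            # staleness as reported by the hosting servers
    table_hash: Optional[str] = None  # written to segment metadata after a task reload


def run_reload_only(segments, deep_store, current_hash):
    """One task generation: rebuild stale segments and re-upload them under the same name."""
    refreshed = []
    for seg in segments:
        if seg.table_hash == current_hash:
            continue  # already processed for this table version, so skip it
        if not seg.stale_on_servers:
            continue  # servers report the segment as up to date
        deep_store[seg.name] = current_hash  # stands in for rebuild + upload, same name
        seg.table_hash = current_hash        # metadata now carries the table hash
        seg.stale_on_servers = False         # servers download and load the refreshed copy
        refreshed.append(seg.name)
    return refreshed


current = table_hash({"tableName": "events"}, {"columns": ["ts", "user"]})
segments = [Segment("seg_0", stale_on_servers=True), Segment("seg_1", stale_on_servers=False)]
deep_store = {}

first = run_reload_only(segments, deep_store, current)   # refreshes only the stale segment
second = run_reload_only(segments, deep_store, current)  # nothing left to refresh
```

In this sketch the second run is a no-op because the refreshed segment already carries the current table hash, which mirrors how subsequent task generations skip processed segments.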
Configuration
Experimental feature, use with caution.
Add the following configuration parameters to the alter table task:
reloadOnly
Set this flag to true to use the alter table task for deep store sync.
General recommendations:
- Make sure that no server reload operation runs concurrently, because the alter table task would then skip those segments when reloading them on the deep store.
- Set skipSegmentPreprocess on the table so that segments are not reloaded on server restart.
- Run this task periodically to reduce the probability of a server-side reload through user triggers.
forceReload
With this parameter set, the task relies on the table hash value that the alter table task adds to reloaded segments and does not check segment metadata on the servers. Use this flag for the first iteration of deep store sync to bootstrap all historical segments and add the table hash, which ensures that all existing segments are in sync with the table configuration. After the first iteration, this parameter can be removed.
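As a rough illustration of how the two parameters might be combined, the sketch below builds a task section for the first (bootstrap) run and for later runs. Only reloadOnly and forceReload come from this page. The task type key is a placeholder, and the taskTypeConfigsMap nesting, the schedule key, the Quartz-style cron value, and the use of string values are assumptions based on common Pinot minion task configuration; check them against your deployment.

```python
import json

# Placeholder: the real task type key is not given on this page.
TASK_TYPE = "<alter-table-task-type>"


def task_config(bootstrap: bool, schedule: str) -> dict:
    """Build the task's config map; string values are an assumption."""
    config = {"reloadOnly": "true", "schedule": schedule}
    if bootstrap:
        # First iteration only: refresh all historical segments and add the table hash.
        config["forceReload"] = "true"
    return config


def table_task_section(bootstrap: bool, schedule: str = "0 0 * * * ?") -> dict:
    """Wrap the config in the (assumed) task section of a table config."""
    return {"task": {"taskTypeConfigsMap": {TASK_TYPE: task_config(bootstrap, schedule)}}}


first_run = table_task_section(bootstrap=True)      # bootstrap iteration
steady_state = table_task_section(bootstrap=False)  # forceReload removed afterwards

print(json.dumps(first_run, indent=2))
```

Keeping forceReload out of the steady-state configuration lets later runs stay incremental, since only segments that have not yet been processed for the current table version are picked up.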
Scenarios
The table shows the state of deep store sync and the effect of the configuration parameters for different scenarios:

Scenario | Action by the Alter Table Task on the Segment | ||||
---|---|---|---|---|---|
Case | Deep Store | Local Server | reloadOnly | reloadOnly with forceReload | |
Existing segment before task introduction | Table not updated since segment creation | (In Sync) | (Not Refreshed) | (Refreshed Once) | |
Table updated and segment reloaded on server | (Not In Sync) | ||||
Table updated but segment not reloaded on server | |||||
Segment generated post task introduction | Table not updated or segment is already processed by task | ||||
Table updated and segment reloaded on server | |||||
Table updated and task is yet to run |
Limitations
- In the background, the servers still load segments, but the loading process is faster.
- The initial run involves processing overhead to reload all segments. Subsequent runs are incremental.
- Server-side reloads are not automatically blocked. Schedule the task frequently with cron so that segments are reloaded by the alter table task, which reduces the probability of a server-side reload through user triggers.