GROUP BY timeBucket. Consider a granularity of 1 day. What happens if there is no data on a Sunday? The result will contain no row for that Sunday. This is likely to break later steps: after all, you expect your detection pipeline to know when there is no data. This case may seem unlikely, but as soon as you filter on specific dimensions, the probability of an empty time bucket skyrockets.
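To make the problem concrete, here is a minimal Python sketch that mimics a daily GROUP BY on a few made-up events. The data and the aggregate_by_day helper are hypothetical and only illustrate the behavior; they are not part of ThirdEye.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw events as (day, value). Nothing happens on Sunday 2023-01-08.
events = [
    (date(2023, 1, 6), 10),
    (date(2023, 1, 6), 5),
    (date(2023, 1, 7), 7),
    (date(2023, 1, 9), 12),
]

def aggregate_by_day(rows):
    """Mimic SELECT day, SUM(value) ... GROUP BY day."""
    totals = defaultdict(int)
    for day, value in rows:
        totals[day] += value
    # Only days that have at least one event end up in the result.
    return dict(sorted(totals.items()))

daily = aggregate_by_day(events)
for day, total in daily.items():
    print(day.isoformat(), total)
```

The output has rows for January 6, 7 and 9 only. The Sunday is not reported as zero or null; it is simply absent.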
To manage missing data, use a TimeIndexFiller node. A usual pipeline will look like this:
TimeIndexFiller takes an input and returns it with the time index filled. If DataFetcher uses the ThirdEye macros __timeGroup(...) and __timeFilter(...), metadata about the granularity and the time predicate is passed directly to the TimeIndexFiller.
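The idea of filling a time index can be sketched in a few lines of Python. This is an illustration only, not ThirdEye's implementation: the fill_time_index function, the half-open [start, end) time range, and the use of None as the fill value are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def fill_time_index(rows, start, end, step, fill_value=None):
    """Return one (timestamp, value) pair per bucket in [start, end).

    Buckets with no observed row get fill_value (an assumption of this sketch).
    """
    observed = dict(rows)
    filled = []
    current = start
    while current < end:
        filled.append((current, observed.get(current, fill_value)))
        current += step
    return filled

# Daily data with no row for Sunday 2023-01-08.
rows = [
    (datetime(2023, 1, 6), 15),
    (datetime(2023, 1, 7), 7),
    (datetime(2023, 1, 9), 12),
]

filled = fill_time_index(
    rows,
    start=datetime(2023, 1, 6),
    end=datetime(2023, 1, 10),
    step=timedelta(days=1),  # granularity P1D
)
for ts, value in filled:
    print(ts.date(), value)
```

After filling, every day in the range has a row, so later steps can tell "no data" apart from "no row".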
TimeIndexFiller requires the following parameters:
component.monitoringGranularity: the granularity in ISO 8601 format. E.g. P1D.
component.metric: the name of the metric column.
component.timestamp: the name of the time column.
component.minTimeInference: the strategy to infer the minimum time.
component.maxTimeInference: the strategy to infer the maximum time.
component.lookback: used when time inference uses a lookback time. In ISO 8601 format.

The time inference strategies are:
FROM_DATA: the minimum (resp. maximum) time is the minimum (resp. maximum) time observed in the input. Does not work well if data is missing at the beginning or at the end.
FROM_DETECTION_TIME: the minimum (resp. maximum) time is the minimum (resp. maximum) of the analysis timeframe.
FROM_DETECTION_TIME_WITH_LOOKBACK: same as the previous one, with an offset of component.lookback applied.
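The three strategies can be compared with a small Python sketch for the minimum time. The infer_min_time function is hypothetical and not ThirdEye code. In particular, the direction of the lookback offset (subtracted from the start of the analysis timeframe) is an assumption of this example.

```python
from datetime import datetime, timedelta

def infer_min_time(strategy, observed_times, detection_start, lookback=timedelta(0)):
    """Illustrative minimum-time inference; not ThirdEye's actual code."""
    if strategy == "FROM_DATA":
        return min(observed_times)
    if strategy == "FROM_DETECTION_TIME":
        return detection_start
    if strategy == "FROM_DETECTION_TIME_WITH_LOOKBACK":
        # Assumption: the lookback offset is subtracted from the start.
        return detection_start - lookback
    raise ValueError(f"unknown strategy: {strategy}")

# The analysis timeframe starts on January 5, but the first observed row is January 6.
detection_start = datetime(2023, 1, 5)
observed = [datetime(2023, 1, 6), datetime(2023, 1, 7), datetime(2023, 1, 9)]
lookback = timedelta(days=7)  # P7D

strategies = (
    "FROM_DATA",
    "FROM_DETECTION_TIME",
    "FROM_DETECTION_TIME_WITH_LOOKBACK",
)
results = {
    name: infer_min_time(name, observed, detection_start, lookback)
    for name in strategies
}
for name, value in results.items():
    print(name, value.date())
```

With FROM_DATA the inferred minimum is January 6, so the empty bucket on January 5 would never be filled. This is the weakness noted above when data is missing at the beginning.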