Timestamp Index
Overview and Purpose
A timestamp index is a specialized indexing solution that accelerates time-based queries with different time granularities. It’s designed to optimize analytics workflows that involve filtering and grouping by common time dimensions like day, month, or year.
The TIMESTAMP data type in StarTree Cloud stores values as millisecond-precision epoch timestamps (long values). While this high precision is valuable for exact timestamps, it’s often excessive for analytical queries where users typically want to analyze data at coarser time granularities.
Timestamp indexes are particularly valuable for:
- Range queries on time columns (e.g., queries for specific date ranges)
- Group-by operations with different time granularities (day, week, month, year, etc.)
- Time-series analytics across large datasets
- Any query pattern that involves time transformations like
dateTrunc()
ordateTimeConvert()
Without a timestamp index, time-based queries require extracting values at query time, applying transformation functions, and then performing filters or group-by operations, which can be computationally expensive for large datasets.
How the Index Works
Core Concepts
Traditional querying of timestamp data requires time value conversions and transformations to be computed at query time. For example, to group data by month, the system would need to convert each timestamp to its corresponding month for every record in the dataset.
The timestamp index in StarTree Cloud optimizes these operations by:
- Pre-computing Time Granularities: During segment generation, the system pre-computes additional derived columns for each configured time granularity.
- Derived Column Generation: For each time granularity (e.g., DAY, MONTH, YEAR), a separate derived column is created following the naming pattern
$${ts_column_name}$${ts_granularity}
. - Automatic Range Indexing: Range indexes are automatically built for all granularity columns, enabling efficient filtering on time ranges.
- Query Rewriting: At query time, functions like
dateTrunc('DAY', timestamp_col)
are automatically rewritten to use the pre-computed columns, dramatically reducing computation overhead.
Example Illustration
For a timestamp column named event_time
with configured granularities DAY, MONTH, and YEAR:
- Three additional derived columns are generated:
$event_time$DAY
- containing day-level timestamps$event_time$MONTH
- containing month-level timestamps$event_time$YEAR
- containing year-level timestamps
- When a query includes
dateTrunc('MONTH', event_time)
, it’s automatically rewritten to use the$event_time$MONTH
column instead of computing this transformation during query execution. - Similarly, a filter condition like
dateTrunc('YEAR', event_time) = '2022'
would use the$event_time$YEAR
column’s range index for efficient filtering.
Configuration
Enabling Timestamp Index
To enable a timestamp index on a column in your StarTree Cloud table, add the following configuration to your table definition:
Supported Granularities
The timestamp index supports the following granularities:
MILLISECOND
SECOND
MINUTE
HOUR
DAY
WEEK
MONTH
QUARTER
YEAR
Important Configuration Considerations
- Data Type Requirement: Timestamp indexes can only be created on columns with the
TIMESTAMP
data type. - Storage Implications: Each configured granularity creates an additional derived column, which increases storage requirements. Choose granularities based on your common query patterns.
- Index Generation Timing: Timestamp indexes are created during segment generation, not at query time.
- Performance Impact: While timestamp indexes accelerate query performance, configuring too many granularities may impact segment generation time and storage requirements.