Geospatial Index
The geospatial index improves spatial query performance by using Uber’s H3 hexagonal grid system to efficiently execute location-based queries such as radius searches and spatial joins.
Overview and Purpose
A geospatial index is a specialized data structure that optimizes queries involving geographic locations. It dramatically accelerates spatial operations such as finding locations within a certain distance, determining containment relationships, or identifying intersections between geographic areas.
In StarTree Cloud (powered by Apache Pinot), geospatial indexing is based on Uber’s H3 library, which uses hexagonal hierarchical gridding. This approach divides the Earth’s surface into hexagons at various resolutions, enabling fast and efficient spatial queries.
Geospatial indexes are particularly valuable for:
- Finding points within a specific distance (radius search)
- Calculating distances between locations efficiently
- Identifying containment relationships (e.g., points within polygons)
- Any query pattern involving the ST_Distance function in WHERE clauses
Geospatial functions in general are computationally expensive. Using a geospatial index can significantly improve query performance by reducing the number of exact calculations needed.
How the Index Works
Core Concepts
Traditional querying of geospatial data requires expensive calculations for each record to determine spatial relationships. For example, finding all points within 5km of a location would require calculating the distance for every single point in the dataset.
The H3 geospatial index in StarTree Cloud accelerates these operations by implementing:
- Hierarchical Hexagonal Gridding: The Earth’s surface is divided into hexagonal cells at different resolutions (precision levels).
- Location Mapping: Every geospatial point is mapped to the hexagon that contains it, represented as an H3 index.
- Distance Approximation: Instead of calculating exact distances between all points, the system can first filter by hexagon proximity.
The index works in a two-step process:
- Use the H3 index to quickly identify records in nearby hexagons (coarse filtering)
- Apply precise geospatial functions only to this smaller filtered set for exact results
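The two-step process can be sketched in a few lines of Python. This is an illustration only, not how StarTree Cloud or Apache Pinot implements the index: a simple square latitude/longitude grid stands in for H3 hexagons so the example needs no third-party library, and the names `CELL_DEG`, `cell_of`, `build_index`, and `radius_search` are hypothetical. It also assumes the search area is away from the poles and the antimeridian.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for the exact check
CELL_DEG = 0.05  # hypothetical cell size in degrees; stands in for an H3 resolution


def haversine_m(lat1, lon1, lat2, lon2):
    """Exact step: great-circle distance in meters between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cell_of(lat, lon):
    """Map a point to the id of the grid cell that contains it."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_index(points):
    """Group record ids by cell id, like storing one cell id per row."""
    index = defaultdict(list)
    for rec_id, (lat, lon) in points.items():
        index[cell_of(lat, lon)].append(rec_id)
    return index


def radius_search(points, index, lat, lon, radius_m):
    """Return (matching ids, number of exact distance checks performed)."""
    # Step 1 (coarse): gather records from every cell the circle could touch.
    lat_span = math.degrees(radius_m / EARTH_RADIUS_M)
    lon_span = lat_span / math.cos(math.radians(lat))
    lo_row, lo_col = cell_of(lat - lat_span, lon - lon_span)
    hi_row, hi_col = cell_of(lat + lat_span, lon + lon_span)
    candidates = []
    # One extra ring of cells keeps the coarse step conservative.
    for row in range(lo_row - 1, hi_row + 2):
        for col in range(lo_col - 1, hi_col + 2):
            candidates.extend(index.get((row, col), []))
    # Step 2 (exact): run the expensive check only on the candidates.
    matches = [
        rec_id for rec_id in candidates
        if haversine_m(lat, lon, *points[rec_id]) <= radius_m
    ]
    return matches, len(candidates)
```

The second return value counts how many records reached the exact distance check, which is the quantity the index is meant to keep small.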
Example Illustration
When searching for locations within 5km of a point:
- The central point is converted to its corresponding H3 hexagon
- The system identifies all hexagons within the required distance (forming rings around the central hexagon)
- Records inside hexagons completely within the search radius are automatically included
- Records in hexagons at the boundary are precisely filtered using the actual ST_Distance function
This approach significantly reduces the computational cost of geospatial queries by limiting exact distance calculations to only the relevant subset of data.
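The distinction between cells fully inside the radius and cells on the boundary can be sketched as follows. This is a simplified planar model under stated assumptions, not the engine's implementation: square cells on a flat plane replace hexagons, every populated cell is classified directly instead of walking rings outward from the center, and `CELL`, `classify_cell`, and `radius_search` are hypothetical names.

```python
import math
from collections import defaultdict

CELL = 1.0  # hypothetical cell edge length, in the same planar units as the points


def cell_of(x, y):
    """Map a planar point to the id of the square cell that contains it."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def classify_cell(cell, qx, qy, radius):
    """Return 'inside', 'boundary', or 'outside' for a cell versus a circle."""
    x0, y0 = cell[0] * CELL, cell[1] * CELL
    x1, y1 = x0 + CELL, y0 + CELL
    # Distance from the query point to the nearest point of the cell.
    near = math.hypot(max(x0 - qx, 0.0, qx - x1), max(y0 - qy, 0.0, qy - y1))
    if near > radius:
        return "outside"
    # Distance from the query point to the farthest corner of the cell.
    far = math.hypot(max(abs(qx - x0), abs(qx - x1)), max(abs(qy - y0), abs(qy - y1)))
    return "inside" if far <= radius else "boundary"


def radius_search(points, qx, qy, radius):
    """Return (matching ids, number of exact distance checks performed)."""
    by_cell = defaultdict(list)
    for rec_id, (x, y) in points.items():
        by_cell[cell_of(x, y)].append(rec_id)
    matches, exact_checks = [], 0
    for cell, rec_ids in by_cell.items():
        kind = classify_cell(cell, qx, qy, radius)
        if kind == "inside":
            matches.extend(rec_ids)  # included without any distance computation
        elif kind == "boundary":
            for rec_id in rec_ids:
                exact_checks += 1
                x, y = points[rec_id]
                if math.hypot(x - qx, y - qy) <= radius:
                    matches.append(rec_id)
        # Records in 'outside' cells are skipped entirely.
    return matches, exact_checks
```

Only records in boundary cells pay for an exact distance computation; everything else is decided at the cell level.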
Types of Geospatial Data
StarTree Cloud supports two types of geospatial data:
Geometry Type
- Represents spatial data on a flat, Cartesian plane
- Coordinates are treated as X/Y values on a 2D surface
- Distance measurements are in the same units as the coordinates (often degrees for geographic data)
- Suitable for smaller areas where Earth’s curvature is negligible
Geography Type (Spherical Geography)
- Represents spatial data on a spherical surface (like Earth)
- Accounts for the curvature of the Earth in calculations
- Distance measurements are in meters
- Provides more accurate results for large distances or areas near the poles
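The practical difference between the two types shows up in what a distance means. The sketch below is an illustration in plain Python, not the functions StarTree Cloud uses internally: `planar_distance` treats coordinates as X/Y values, and `spherical_distance_m` uses the haversine formula with an assumed mean Earth radius of 6,371 km.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def planar_distance(x1, y1, x2, y2):
    """Geometry-style distance: Euclidean, in the units of the coordinates."""
    return math.hypot(x2 - x1, y2 - y1)


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Geography-style distance: great-circle (haversine), in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two pairs of points, each one degree of longitude apart.
at_equator = ((0.0, 0.0), (1.0, 0.0))      # (lon, lat) pairs
at_60_north = ((0.0, 60.0), (1.0, 60.0))

for label, (p, q) in (("equator", at_equator), ("60N", at_60_north)):
    print(label, planar_distance(*p, *q), round(spherical_distance_m(*p, *q)))
```

Both pairs are exactly 1.0 apart in planar terms (degrees), while the great-circle distance at 60 degrees north is about half of the distance at the equator, which is why the geography type matters for large areas and high latitudes.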
Configuration
Enabling Geospatial Index
To enable a geospatial H3 index on a location column, follow these steps:
- Define your geospatial column as BYTES type in the schema:
- Configure the H3 index in your table configuration:
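As a rough sketch of what the two steps produce, the Python below assembles the relevant schema and table config fragments and prints them as JSON. The layout follows the `fieldConfigList` format documented for Apache Pinot (`encodingType`, `indexes`, `h3`, `resolutions`); the column name `location_st_point`, the resolution value, and the helper function names are hypothetical, and the exact keys should be checked against the documentation for your version.

```python
import json

COLUMN = "location_st_point"  # hypothetical column name


def geospatial_schema_field(column):
    """Step 1: declare the geospatial column as BYTES in the schema."""
    return {"name": column, "dataType": "BYTES"}


def h3_field_config(column, resolutions):
    """Step 2: field config entry enabling the H3 index on a raw-encoded column."""
    return {
        "name": column,
        "encodingType": "RAW",  # dictionary encoding disabled
        "indexes": {"h3": {"resolutions": list(resolutions)}},
    }


# Fragments only; a real schema and table config contain many more settings.
schema_fragment = {"dimensionFieldSpecs": [geospatial_schema_field(COLUMN)]}
table_fragment = {"fieldConfigList": [h3_field_config(COLUMN, [5])]}

print(json.dumps(schema_fragment, indent=2))
print(json.dumps(table_fragment, indent=2))
```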
Alternative Configuration (Legacy Method)
You can also use the older configuration format:
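A sketch of the older layout, built in Python and printed as JSON. It assumes the legacy format described in the Apache Pinot documentation, where the index is named in `indexTypes` (some older versions use a single `indexType` key instead) and the resolutions are passed as a comma-separated string under `properties`. The column name and helper name are hypothetical; verify the keys against the version you run.

```python
import json


def legacy_h3_field_config(column, resolutions):
    """Build a legacy-style field config entry for an H3 index (assumed layout)."""
    return {
        "name": column,
        "encodingType": "RAW",  # dictionary encoding disabled
        "indexTypes": ["H3"],
        # Legacy format: resolutions are one string of comma-separated integers.
        "properties": {"resolutions": ", ".join(str(r) for r in resolutions)},
    }


legacy_fragment = {
    "fieldConfigList": [legacy_h3_field_config("location_st_point", [5, 9])]
}
print(json.dumps(legacy_fragment, indent=2))
```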
Important Configuration Considerations
- Dictionary Encoding: You must disable dictionary encoding on geospatial indexed columns by setting the column’s encodingType to RAW.
- Resolutions: The resolutions parameter defines the precision levels of the H3 grid:
  - Higher values (9-15): Finer precision, smaller hexagons (average edge lengths from roughly 175 m at resolution 9 down to about 0.5 m at resolution 15)
  - Lower values (0-8): Coarser precision, larger hexagons (average edge lengths from roughly 1,100 km at resolution 0 down to about 460 m at resolution 8)
  - Multiple resolutions allow for multi-level filtering
- Transform Function: Use a transform function to convert longitude/latitude pairs into the appropriate geometry or geography type.
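The considerations above can be turned into a small pre-flight check. The sketch below is a hypothetical helper, not part of StarTree Cloud or Apache Pinot: it assumes the `indexes`/`h3`/`resolutions` field config layout, and it estimates hexagon size from the fact that each finer H3 resolution has cells of roughly one seventh the area, so the average edge length shrinks by about the square root of 7 per level. The resolution-0 edge length is an approximate published average.

```python
import math

RES0_EDGE_M = 1_107_712.6  # approximate average H3 edge length at resolution 0


def approx_edge_length_m(resolution):
    """Estimate the average hexagon edge length, in meters, for a resolution."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return RES0_EDGE_M / math.sqrt(7) ** resolution


def check_h3_field_config(field_config):
    """Return a list of problems found in a field config entry (empty if none)."""
    problems = []
    if field_config.get("encodingType") != "RAW":
        problems.append("encodingType must be RAW for an H3-indexed column")
    resolutions = field_config.get("indexes", {}).get("h3", {}).get("resolutions", [])
    if not resolutions:
        problems.append("no H3 resolutions configured")
    for res in resolutions:
        if not (isinstance(res, int) and 0 <= res <= 15):
            problems.append(f"invalid resolution: {res!r}")
    return problems


for res in (0, 5, 9, 15):
    print(res, f"{approx_edge_length_m(res):,.1f} m")
```

These estimates are averages only; actual cell sizes vary somewhat across the globe, so treat them as a guide when matching resolutions to typical search radii.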