Pinot Version | 1.0.0 |
---|---|
Code | startreedata/pinot-recipes/geospatial-indexing |
jq
to structure the data in the key:payload
structure that Kafka expects:
point
has a data type of BYTES
. Geospatial columns must use the BYTES
type because Pinot will serialize the Geospatial objects into bytes for storage purposes.
Now for the table config:
point
column. If you scroll down a bit, under ingestionConfig.transformConfigs
you can see transformation functions that converts the WKT string from our Kafka streams into a Geospatial object.
We’ll create the table by running the following:
WHERE
clause of a query.
ST_Distance
ST_Distance
function computes the distance in meters between two sets of coordinates.
When used with this function, one of the arguments must be an identifer (i.e. column name) and the other a literal value. The result of calling ST_Distance
must then be used in a range query i.e. it should be less than or greater than a specified distance.
Examples are shown below:
ST_Within
ST_Within
checks the containment of two geospatial objects. It returns true if and only if the first geometry is completely inside the second geometry.
When used with this function, the first argument must be an identifier (i.e. column name) and the other a literal value. An example is shown below:
ST_Contains
ST_Contains
also checks the containment of two geospatial objects. More specifically, it returns true if and only if no points of the second geometry lie in the exterior of the first geometry, and at least one point of the interior of the first geometry lies in the interior of the second geometry.
When used with this function, the first argument must be a literal and the other an identifier (i.e. column name). An example is shown below:
count(*) |
---|
747 |
ST_Contains
like this:
uuid | stastext(point) | ts | distance |
---|---|---|---|
1a9a7314-7550-4327-9fb5-911f636b869c | POINT (-77.729246 39.255027) | 2023-03-06 14:27:59.218 | 46146.138375698916 |
f6d3260b-93d7-478c-b7b1-2dfc214a09dc | POINT (-77.796573 39.060015) | 2023-03-06 14:40:17.591 | 37640.39151993725 |
51693e80-cdd3-48d6-af53-78156357a5dc | POINT (-77.751046 39.115125) | 2023-03-06 14:40:17.591 | 37000.762022801944 |
fafe2a9c-ca39-4dd2-bf55-37013a1e2719 | POINT (-77.569928 39.292096) | 2023-03-06 14:04:46.509 | 42825.54468861071 |
d5329e92-edf7-4733-816e-9f2e2c580c2d | POINT (-77.188892 39.187191) | 2023-03-06 14:04:46.509 | 33399.262939756285 |
9b7cba1f-a2e4-4690-bba9-48b71f50b915 | POINT (-77.053764 38.896884) | 2023-03-06 14:04:46.509 | 29635.04964656463 |
1b5d5a0e-1a6b-46fb-85b4-42e5c0e208e5 | POINT (-77.618947 38.881897) | 2023-03-06 13:38:18.331 | 20311.553153107612 |
1639f1eb-fb60-4f94-9c59-d7e595c8f416 | POINT (-76.960903 38.772332) | 2023-03-06 14:12:30.667 | 41445.58284663461 |
EXPLAIN PLAN FOR
, which will return the query plan.
To get the query plan for the ST_Distance
function, we’d write the following:
Operator | Operator_Id | Parent_Id |
---|---|---|
BROKER_REDUCE(limit:10) | 1 | 0 |
COMBINE_SELECT | 2 | 1 |
PLAN_START(numSegmentsForThisPlan:71) | -1 | -1 |
SELECT(selectList:uuid, stastext(point), ts, st_distance(point,‘80c053592b680000014043775173125738’)) | 3 | 2 |
TRANSFORM(st_distance(point,‘80c053592b680000014043775173125738’), stastext(point), ts, uuid) | 4 | 3 |
PROJECT(uuid, point, ts) | 5 | 4 |
DOC_ID_SET | 6 | 5 |
FILTER_H3_INDEX(indexLookUp:h3_index,operator:RANGE, predicate:st_distance( point,‘80c053592b680000014043775173125738’) < ‘50000’) | 7 | 6 |
FILTER_H3_INDEX
as one of the operators, otherwise the index isn’t being used.
We can do the same thing for the ST_Within
and ST_Contains
queries. We should see the INCLUSION_FILTER_H3_INDEX
operator for both these queries:
For ST_Within
, we’ll see the following plan:
Operator | Operator_Id | Parent_Id |
---|---|---|
BROKER_REDUCE(limit:1) | 1 | 0 |
COMBINE_AGGREGATE | 2 | 1 |
PLAN_START(numSegmentsForThisPlan:75) | -1 | -1 |
AGGREGATE(aggregations:count(*)) | 3 | 2 |
TRANSFORM_PASSTHROUGH() | 4 | 3 |
PROJECT() | 5 | 4 |
DOC_ID_SET | 6 | 5 |
INCLUSION_FILTER_H3_INDEX(inclusionIndex:h3_index,operator:EQ, predicate:stwithin(point,‘84000000010000000600000000c053ec04b00000004043bcd43b0aae 28c05335bdcbffffff40443dcf7fb88244c052e7a3bbffffff4043605c20b209c0c053675ddb ffffff4042c201e70a0102c05443204c00000040436edff9f6ec98c053e c04b00000004043bcd43b0aae28’) = ‘1’) | 7 | 6 |
INCLUSION_FILTER_H3_INDEX
operator contains the stwithin
predicate.
And for ST_Contains
, we’ll see this plan:
Operator | Operator_Id | Parent_Id |
---|---|---|
BROKER_REDUCE(limit:1) | 1 | 0 |
COMBINE_AGGREGATE | 2 | 1 |
PLAN_START(numSegmentsForThisPlan:75) | -1 | -1 |
AGGREGATE(aggregations:count(*)) | 3 | 2 |
TRANSFORM_PASSTHROUGH() | 4 | 3 |
PROJECT() | 5 | 4 |
DOC_ID_SET | 6 | 5 |
INCLUSION_FILTER_H3_INDEX(inclusionIndex:h3_index,operator:EQ, predicate:stcontains(‘84000000010000000600000000c053ec04b00000004043bcd43b0aae 28c05335bdcbffffff40443dcf7fb88244c052e7a3bbffffff4043605c20b209c0c053675ddb ffffff4042c201e70a0102c05443204c00000040436edff9f6ec98c053e c04b00000004043bcd43b0aae28’,point) = ‘1’) | 7 | 6 |
INCLUSION_FILTER_H3_INDEX
operator contains the stcontains
predicate.