Default Value for Geospatial Columns
In this recipe we’ll learn how to set a default value for a Geospatial point column. To learn more about using Geospatial in Apache Pinot, see the Geospatial objects developer guide or Geospatial indexing developer guide.
Pinot Version | 1.0.0 |
---|---|
Code | startreedata/pinot-recipes/geospatial-default |
Prerequisites
To follow the code examples in this guide, you must install Docker locally and download recipes.
Navigate to recipe
- If you haven’t already, download recipes.
- In a terminal, go to the recipe by running the following command:
Launch Pinot Cluster
Spin up a Pinot Cluster by running the following command:
This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, Kafka, and Zookeeper. You can find the docker-compose.yml file on GitHub.
Generating Geospatial data
This recipe contains a data generator that produces JSON documents that usually contain Geospatial points but occasionally contain null values instead.
You’ll need to first install the following dependencies:
Once that’s done, you can run the data generator and grab just the first generated document by running the following command:
Output is shown below:
You can see from this output that we have a null value in the first event and a geospatial point in the second one.
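The recipe's own generator script is not reproduced here. As a rough sketch of the idea only, a generator that emits a point for most events and a null for some of them might look like the following. The `id` field, the WKT-style `POINT (lon lat)` string, and the `null_rate` parameter are all assumptions for illustration, not the recipe's actual format.

```python
import json
import random


def generate_events(n, null_rate=0.1, seed=None):
    """Yield n events whose 'point' is a WKT-style string or None.

    Field names and the point format are hypothetical; the recipe's real
    generator may differ.
    """
    rng = random.Random(seed)
    for i in range(n):
        if rng.random() < null_rate:
            point = None  # serialized as JSON null
        else:
            lon = rng.uniform(-180, 180)
            lat = rng.uniform(-90, 90)
            point = f"POINT ({lon} {lat})"
        yield {"id": i, "point": point}


if __name__ == "__main__":
    for event in generate_events(5, null_rate=0.3, seed=1):
        print(json.dumps(event))
```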
Kafka ingestion
We’re going to ingest this data into an Apache Kafka topic using the kcat command line tool.
We’ll also use jq to format each document in the key:payload structure that Kafka expects:
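The recipe does this reshaping with jq. Purely to illustrate what a key:payload line looks like, here is an equivalent sketch in Python. The `id` key field and the `:` delimiter are assumptions, since the original command is not shown.

```python
import json


def to_keyed_line(event, key_field="id", delimiter=":"):
    """Return 'key<delimiter>json-payload' for a single event.

    key_field and delimiter are hypothetical choices for this sketch.
    """
    key = str(event[key_field])
    return f"{key}{delimiter}{json.dumps(event)}"


if __name__ == "__main__":
    print(to_keyed_line({"id": 7, "point": None}))
```

Because the JSON payload itself contains colons, anything that parses such a line has to split on the first delimiter only.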
We can check that Kafka has some data by running the following command:
We’ll see something like the following:
Pinot Schema and Table
Now let’s create a Pinot Schema and Table.
First, the schema:
Note that the column for point has a data type of BYTES. Geospatial columns must use the BYTES type because Pinot will serialize the Geospatial objects into bytes for storage purposes.
We are also passing in a defaultNullValue, which must be a hex-encoded representation of a point. In this case the point is a location in the Arctic.
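As a minimal sketch of the relevant part of such a schema, the snippet below builds the field spec for the point column and prints it as JSON. The schema name `events` is hypothetical, and any other columns the recipe defines are omitted. Only the column name, the BYTES type, and the hex value come from this recipe.

```python
import json

# Hex value taken from this recipe; it encodes the default point.
DEFAULT_POINT_HEX = "003fe5f4a42008f90c4054e004d205fbe4"

schema = {
    "schemaName": "events",  # hypothetical name, not from the recipe
    "dimensionFieldSpecs": [
        {
            "name": "point",
            "dataType": "BYTES",
            "defaultNullValue": DEFAULT_POINT_HEX,
        }
    ],
}

# Sanity check: raises ValueError if the default is not valid hex.
bytes.fromhex(DEFAULT_POINT_HEX)

print(json.dumps(schema, indent=2))
```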
You can get the hex-encoded representation of a Geospatial object by running a query that returns the object. For example:
003fe5f4a42008f90c4054e004d205fbe4 |
---|
003fe5f4a42008f90c4054e004d205fbe4 |
Query Results
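To see what this hex string contains, the sketch below unpacks it in Python. It assumes the 17 bytes are laid out as a one-byte header followed by two big-endian doubles (x, then y). That layout is inferred from this particular value rather than taken from a documented format, so treat it as an assumption.

```python
import struct

POINT_HEX = "003fe5f4a42008f90c4054e004d205fbe4"


def decode_point(hex_value):
    """Split a 17-byte hex string into (header, x, y).

    Assumed layout: 1 header byte + 2 big-endian doubles.
    """
    raw = bytes.fromhex(hex_value)
    if len(raw) != 17:
        raise ValueError(f"expected 17 bytes, got {len(raw)}")
    return struct.unpack(">Bdd", raw)


def encode_point(header, x, y):
    """Inverse of decode_point under the same assumed layout."""
    return struct.pack(">Bdd", header, x, y).hex()


if __name__ == "__main__":
    header, x, y = decode_point(POINT_HEX)
    print(header, x, y)
```

Under that assumption the value decodes to roughly x = 0.686 and y = 83.5, which, read as longitude and latitude, is consistent with a location in the Arctic.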
Now for the table config:
When using a default value for a BYTES column, we’ll need to create the schema and table separately, rather than using the AddTable command. If we try to use the AddTable command, we’ll end up with double decoding of the defaultNullValue, resulting in Pinot trying to store an invalid value.
Instead, we’ll create the schema with the AddSchema command:
And then we’ll create the table via the REST API:
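The recipe's exact command is not shown here. As a hedged sketch, the Pinot Controller accepts a table config as a JSON body via `POST /tables`, so a request could be built as follows. The controller address `http://localhost:9000` and the contents of `table_config` are assumptions, and the helper name is hypothetical.

```python
import json
import urllib.request


def build_create_table_request(table_config, controller="http://localhost:9000"):
    """Build (but do not send) a POST /tables request for the controller.

    The controller URL is an assumed local default.
    """
    body = json.dumps(table_config).encode("utf-8")
    return urllib.request.Request(
        url=f"{controller}/tables",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually create the table, pass the returned request to `urllib.request.urlopen` once the cluster is running and the schema has been added.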
Querying for defaults
We can then run the following query to check how many times the default value has been used:
distance | count(*) |
---|---|
0 | 3147529 |
250.4421880885498 | 1 |
250.35032602023736 | 1 |
250.3445873086931 | 1 |
250.32023987252305 | 1 |
250.28500644222484 | 1 |
250.26623501229875 | 1 |
250.26271972684384 | 1 |
250.25918510024783 | 1 |
250.2511511361528 | 1 |
Query Results