Groovy Transformation Functions
In this recipe we’ll learn how to use Groovy transformation functions to transform data while ingesting it into Apache Pinot.
| Pinot Version | 1.0.0 |
|---|---|
| Code | startreedata/pinot-recipes/groovy-transformation-functions |
Prerequisites
To follow the code examples in this guide, you must install Docker locally and download recipes.
Navigate to recipe
- If you haven’t already, download recipes.
- Navigate to the recipe by running the following command:
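A sketch of those two steps, assuming the recipes live in the startreedata/pinot-recipes GitHub repository and this recipe sits in the recipes/groovy-transformation-functions directory:

```bash
# Clone the recipes repository (skip if you've already downloaded it)
git clone https://github.com/startreedata/pinot-recipes.git

# Move into this recipe's directory
cd pinot-recipes/recipes/groovy-transformation-functions
```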
Launch Pinot Cluster
You can spin up a Pinot Cluster and Kafka Broker by running the following command:
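With the recipe’s docker-compose.yml in the current directory, that boils down to:

```bash
docker-compose up
```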
This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, Kafka Broker, and Zookeeper. You can find the docker-compose.yml file on GitHub.
Controller configuration
We need to provide configuration parameters to the Pinot Controller to enable Groovy in transformation functions. This is done in the following section of the Docker Compose file:
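The relevant service definition looks something like this (a sketch; the image tag, ports, and dependencies in the actual file may differ):

```yaml
pinot-controller:
  image: apachepinot/pinot:1.0.0
  # The -config flag points the controller at the configuration file below
  command: "StartController -zkAddress zookeeper:2181 -config /config/controller-conf.conf"
  container_name: pinot-controller
  volumes:
    - ./config:/config
  depends_on:
    - zookeeper
```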
The configuration is specified in `/config/controller-conf.conf`, the contents of which are shown below:
/config/controller-conf.conf
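The exact file ships with the recipe; the line that matters is the property that re-enables Groovy in ingestion transformation functions (it defaults to true, i.e. Groovy disabled):

```properties
# Allow Groovy scripts in ingestion transformation functions
controller.disable.ingestion.groovy=false
```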
Dataset
We’re going to import a couple of JSON documents into Kafka and then from there into Pinot.
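The exact documents ship with the recipe, but based on the field paths used by the transformation functions below, they look roughly like this (illustrative only; note that `payload` is a JSON string that the Groovy functions will parse):

```json
{"timestamp": "2019-10-09 21:25:25", "payload": "{\"after\": {\"id\": 3}, \"firstName\": \"James\", \"lastName\": \"Smith\"}"}
{"timestamp": "2019-10-10 21:33:25", "payload": "{\"before\": {\"id\": 2}, \"firstName\": \"John\", \"lastName\": \"Gates\"}"}
```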
Pinot Schema and Table
Now let’s create a Pinot Schema and Table.
Only the timestamp field from our data source maps to a schema column name - we’ll be using transformation functions to populate the id and name columns.
config/schema.json
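A sketch of the schema, assuming `id` is an INT, `name` a STRING, and `timestamp` a TIMESTAMP date-time column (the exact specs ship with the recipe):

```json
{
  "schemaName": "events",
  "dimensionFieldSpecs": [
    {"name": "id", "dataType": "INT"},
    {"name": "name", "dataType": "STRING"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:TIMESTAMP",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```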
The table config indicates that data will be ingested from the Kafka `events` topic:
config/table.json
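A sketch of that table config, assuming a REALTIME table consuming the `events` topic from the Kafka broker started earlier; the broker address, decoder settings, and exact Groovy scripts are assumptions based on the description that follows:

```json
{
  "tableName": "events",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "events",
    "timeColumnName": "timestamp",
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "events",
      "stream.kafka.broker.list": "kafka:9093",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
    }
  },
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "id",
        "transformFunction": "Groovy({def obj = new groovy.json.JsonSlurper().parseText(payload); obj.containsKey('after') ? obj.after.id : obj.before.id}, payload)"
      },
      {
        "columnName": "name",
        "transformFunction": "Groovy({def obj = new groovy.json.JsonSlurper().parseText(payload); obj.firstName + ' ' + obj.lastName}, payload)"
      }
    ]
  },
  "metadata": {}
}
```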
Let’s dive into the transformation functions defined under `ingestionConfig.transformConfigs`:
- The `id` one extracts `payload.after.id` if the `after` property exists, otherwise it uses `payload.before.id`.
- The `name` one concatenates `payload.firstName` and `payload.lastName`.
They both use Groovy’s JSON parser to create an object from the payload, then apply a little Groovy logic to return the desired output.
If you only need to do simple data transformation, you can use the built-in transformation functions.
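For example, something like the built-in `jsonPathString` function can pull a single field out of the payload without any Groovy (a hypothetical alternative to the `name` transform above, not what this recipe uses):

```json
{
  "columnName": "name",
  "transformFunction": "jsonPathString(payload, '$.firstName')"
}
```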
We can add the table and schema by running the following command:
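A sketch of that command, assuming the config files are mounted into the controller container at /config and the container is named pinot-controller:

```bash
docker exec -it pinot-controller bin/pinot-admin.sh AddTable \
  -schemaFile /config/schema.json \
  -tableConfigFile /config/table.json \
  -controllerHost localhost \
  -controllerPort 9000 \
  -exec
```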
Ingest Data into Kafka
We can run the following command to import a couple of documents into Kafka:
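One way to do this, assuming the documents live in a data/events.json file (one JSON document per line), kcat is installed locally, and Kafka is listening on localhost:9092:

```bash
# Produce each line of the file as a message on the events topic
kcat -P -b localhost:9092 -t events < data/events.json
```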
Let’s check those documents have been imported by running the following command:
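For example, consuming the events topic from the beginning with kcat and exiting once the end of the topic is reached:

```bash
kcat -C -b localhost:9092 -t events -e
```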
Output
Looks good so far.
Querying
Once that’s completed, navigate to localhost:9000/#/query and click on the `events` table or copy/paste the following query:
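A query along these lines will do (`timestamp` is quoted because it’s a reserved keyword in Pinot SQL):

```sql
select id, name, "timestamp"
from events
limit 10
```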
You will see the following output:
| id | name | timestamp |
|---|---|---|
| 3 | James Smith | 2019-10-09 21:25:25.0 |
| 2 | John Gates | 2019-10-10 21:33:25.0 |
Query Results