Ingesting Avro Messages
In this recipe we’ll learn how to ingest Avro messages from Apache Kafka into Apache Pinot. The Avro schema will be stored in the Confluent Schema Registry, so we’ll learn how to integrate with that as well.
Watch the following video about ingesting Avro encoded messages into Apache Pinot, or follow the tutorial below.
| Pinot Version | 1.0.0 |
|---|---|
| Code | startreedata/pinot-recipes/ingest-avro |
Prerequisites
To follow the code examples in this guide, you will need to install Docker locally and download the recipes.
Navigate to recipe
- If you haven’t already, download the recipes.
- In a terminal, navigate to the recipe directory by running the following command:
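A sketch of the steps, assuming the recipes live in the public startreedata/pinot-recipes repository and follow its standard layout (the exact directory path may differ):

```shell
# Clone the recipes repository (skip if you've already downloaded it)
git clone https://github.com/startreedata/pinot-recipes.git
# Change into this recipe's directory (path assumed from the repository layout)
cd pinot-recipes/recipes/ingest-avro
```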
Launch Pinot Cluster
You can spin up a Pinot Cluster by running the following command:
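The recipe ships a Docker Compose file, so starting the cluster typically looks like this (run from the recipe directory):

```shell
# Start the Pinot Controller, Server, Broker, Kafka, Zookeeper,
# and the Confluent Schema Registry in the background
docker-compose up -d
```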
This command will run a single instance of the Pinot Controller, Pinot Server, Pinot Broker, Kafka, Zookeeper, and the Confluent Schema Registry. You can find the docker-compose.yml file on GitHub.
Data Generator
This recipe contains a data generator that creates events with data about people.
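A dependency-free sketch of such a generator is shown below. The field names are inferred from the query results later in this recipe; the real script uses the Faker library for realistic values, so the small hard-coded pools here are stand-ins:

```python
import json
import random
import time
import uuid

# Stand-in value pools; the recipe's actual generator uses Faker instead
INTERESTS = ["Photography", "Cooking", "Cycling", "Music", "Yoga", "Reading"]
COUNTRIES = ["Zambia", "Qatar", "Belize", "Sri Lanka", "Azerbaijan"]

def generate_event():
    """Build one person event shaped like the recipe's messages."""
    return {
        "ts": int(time.time() * 1000),
        "person": {
            "id": str(uuid.uuid4()),
            "age": random.randint(18, 80),
            "interests": random.sample(INTERESTS, k=random.randint(1, 4)),
            "address": {"country": random.choice(COUNTRIES)},
        },
    }

# Emit a few events as JSON lines (the real generator loops forever)
for _ in range(3):
    print(json.dumps(generate_event()), flush=True)
```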
It uses the Faker library, so you’ll first need to install that:
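Assuming a standard Python environment, Faker installs from PyPI:

```shell
pip install faker
```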
You can generate data by running the following command:
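The generator is a plain Python script, so the invocation is along these lines:

```shell
python datagen.py
```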
Output is shown below:
Avro Schema
The Avro schema for our messages is described below:
avro/person-topic-value.avsc
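The schema file itself isn't reproduced here; based on the columns that appear in the query results later in this recipe, it would look roughly like the following sketch (the record names, namespace, and the `ts` field are assumptions):

```json
{
  "type": "record",
  "name": "PersonEvent",
  "fields": [
    {"name": "ts", "type": "long"},
    {"name": "person", "type": {
      "type": "record",
      "name": "Person",
      "fields": [
        {"name": "id", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "interests", "type": {"type": "array", "items": "string"}},
        {"name": "address", "type": {
          "type": "record",
          "name": "Address",
          "fields": [{"name": "country", "type": "string"}]
        }}
      ]
    }}
  ]
}
```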
Kafka ingestion
We’re going to ingest the stream of people into Kafka using the Kafka Python client. We’ll need to install the following libraries:
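A common choice here is the confluent-kafka package, whose `avro` extra pulls in Schema Registry and Avro serialization support (the recipe may pin specific versions):

```shell
pip install "confluent-kafka[avro]"
```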
We’re going to stream the messages produced by datagen.py into the following script:
kafkaproducer.py
This script first creates a Kafka producer configured with the Avro schema and the schema registry. It then reads an endless stream of messages from stdin and writes them to Kafka as Avro.
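The recipe's actual kafkaproducer.py isn't reproduced here, but a minimal sketch using confluent-kafka's `SerializingProducer` and `AvroSerializer` might look like this. The broker and registry addresses, topic name, and schema path are assumptions based on this recipe's setup:

```python
import json
import sys

def parse_event(line):
    """Decode one JSON line emitted by datagen.py into a dict."""
    return json.loads(line)

def main():
    # Third-party imports live inside main() so parse_event() stays
    # usable without confluent-kafka installed
    from confluent_kafka import SerializingProducer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroSerializer

    # Addresses and file path assume this recipe's Docker Compose setup
    schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
    with open("avro/person-topic-value.avsc") as f:
        schema_str = f.read()

    producer = SerializingProducer({
        "bootstrap.servers": "localhost:9092",
        "value.serializer": AvroSerializer(schema_registry, schema_str),
    })

    for line in sys.stdin:
        producer.produce(topic="person-topic", value=parse_event(line))
        producer.poll(0)  # serve delivery callbacks
    producer.flush()
```

The real script would invoke `main()` under an `if __name__ == "__main__":` guard so it can sit at the end of a shell pipeline.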
We can combine the data generator script with this one by running the following code:
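Assuming both scripts are in the current directory, a shell pipe connects them:

```shell
python datagen.py | python kafkaproducer.py
```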
Once we’ve done that, let’s check that messages are being ingested into the person-topic topic using kcat:
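A consume command along these lines should print the decoded messages; the broker and Schema Registry addresses assume the Docker Compose defaults:

```shell
kcat -C -b localhost:9092 -t person-topic \
  -s value=avro -r http://localhost:8081 -e
```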
The output is shown below:
Next, we’re going to ingest the stream into Pinot.
Pinot schema and table
Our Pinot schema is shown below:
schema.json
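The schema file isn't reproduced here; based on the columns in the query results, it would look roughly like this sketch (the schema name and the `ts` time column are assumptions):

```json
{
  "schemaName": "people",
  "dimensionFieldSpecs": [
    {"name": "person.id", "dataType": "STRING"},
    {"name": "person.interests", "dataType": "STRING", "singleValueField": false},
    {"name": "person.address.country", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "person.age", "dataType": "INT"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "ts",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```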
And the table config is below:
table.json
The streamConfigs section of the table config defines a decoder for processing Avro messages via the schema registry: the URL of the schema registry is specified along with the Avro schema name.
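The full table config isn't reproduced here, but the decoder wiring typically looks like the sketch below. The decoder class and the `schema.registry.rest.url` property are real Pinot settings; the table name, broker address, and other values are assumptions based on this recipe's setup:

```json
{
  "tableName": "people",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "ts",
    "schemaName": "people",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "person-topic",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
      "stream.kafka.decoder.prop.schema.registry.rest.url": "http://schema-registry:8081"
    }
  },
  "metadata": {}
}
```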
We can create the table and schema by running the following command:
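With the Docker-based setup, this is usually done via pinot-admin.sh's AddTable command; the network name, mounted config paths, and controller host below are assumptions based on this recipe's setup:

```shell
docker run \
  --network ingest-avro \
  -v $PWD/config:/config \
  apachepinot/pinot:1.0.0 AddTable \
  -schemaFile /config/schema.json \
  -tableConfigFile /config/table.json \
  -controllerHost pinot-controller \
  -exec
```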
Querying Pinot
We can then run the following query via the Pinot UI:
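A query that produces the columns shown below would be along these lines (the table name is an assumption; column names containing dots are quoted):

```sql
select "person.id", "person.interests", "person.age", "person.address.country"
from people
limit 10
```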
The output will look something like this:
| person.id | person.interests | person.age | person.address.country |
|---|---|---|---|
| 311b5e5e-060a-45f3-b050-e7b9e268dceb | ["Photography","Cooking","Fashion","Running"] | 45 | Zambia |
| d29e52eb-41d8-4902-92fb-8ed6f51115b1 | ["Cycling","Dancing","Music"] | 60 | Qatar |
| b1a0cf13-8daf-4120-865f-7ac3ae51c3c4 | ["Yoga","Painting","Swimming"] | 48 | Sierra Leone |
| f774187e-ee0a-4a2f-abbb-e73ba86a1dfb | ["Music","Gardening"] | 67 | Belize |
| c47cf80b-8104-480f-a9de-046f10f4c3f0 | ["Art","Meditation"] | 31 | Mayotte |
| c62586ff-24d9-43cb-9bb5-50b7e9a4a297 | ["Painting","Yoga"] | 57 | Azerbaijan |
| 175a0032-af11-4e58-8ab5-9bfbed70eabf | ["Baking","Fashion","Cooking","Baking","Gardening"] | 73 | Wallis and Futuna |
| 82940700-00e1-4d3a-8333-d30165fec35a | ["Fashion","Fashion","Fishing"] | 35 | Saint Martin |
| f27d06e9-686e-42f3-a6d8-742c046ea3d7 | ["Sports","Reading","Reading","Cycling","Traveling"] | 27 | Russian Federation |
| 0f1189ce-77ae-46cb-9f78-bf841b9f7e86 | ["Sports","Reading","Baking","Reading"] | 74 | Sri Lanka |
Query Results