Pull Request Merged Events Stream
In this recipe, we will do the following:
- Set up a Localstack Kinesis cluster (optional)
- Install AWS CLI
- Set up a Pinot cluster:
  a. Start Zookeeper
  b. Start controller
  c. Start broker
  d. Start server
- Create a Kinesis stream with name pullRequestMergedEvents
- Create a real-time table and schema for pullRequestMergedEvents
- Start a task that reads from the GitHub events API and publishes events about merged pull requests to the stream
- Query the real-time data
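The publishing task listed above has to decide which GitHub events describe a merged pull request. The following is a minimal sketch of such a filter, not the code Pinot ships. It assumes each event is a dictionary with a `type` field and a `payload` carrying `action` and `pull_request.merged`, as GitHub's PullRequestEvent has historically been shaped. The function names and sample events are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def is_merged_pull_request(event: Dict[str, Any]) -> bool:
    """Return True if a GitHub event describes a pull request that was merged."""
    if event.get("type") != "PullRequestEvent":
        return False
    payload = event.get("payload") or {}
    pull_request = payload.get("pull_request") or {}
    # Assumption: a merge is reported as a "closed" action with merged=True.
    return payload.get("action") == "closed" and pull_request.get("merged") is True


def merged_pull_requests(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the events that represent merged pull requests."""
    return [event for event in events if is_merged_pull_request(event)]


# Hypothetical, heavily trimmed events for illustration only.
sample_events = [
    {"type": "PushEvent", "payload": {}},
    {"type": "PullRequestEvent",
     "payload": {"action": "closed", "pull_request": {"merged": False}}},
    {"type": "PullRequestEvent",
     "payload": {"action": "closed", "pull_request": {"merged": True}}},
    {"type": "PullRequestEvent",
     "payload": {"action": "opened", "pull_request": {"merged": False}}},
]

print(len(merged_pull_requests(sample_events)))  # prints 1
```

Only the third sample event passes the filter: it is the only one that is both closed and merged.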
Launch Localstack Kinesis
This step is needed only if you don’t want to use the official Kinesis service in AWS. It creates a mock Kinesis cluster via Localstack.
Install AWS CLI
We need to interact with Kinesis throughout this recipe, so we’ll use the official AWS CLI to run the commands. The AWS CLI also works seamlessly with Localstack. You can follow the official AWS documentation to get started with the CLI. Make sure the credentials are properly configured for your AWS account. For a Localstack cluster, the default credentials are as follows:
Set Up Pinot Cluster and Kinesis Tables
There are multiple ways to set up the cluster. Here we’ll consider only Docker and Launcher scripts. For Kubernetes and other setups, see our official documentation.
Pull docker image
Get the latest Docker image.
With Quickstart Utility
You can use the following single-command utility to run all the previous steps. Make sure to stop any previously running Pinot services.
Without Quickstart
Set up the Pinot cluster
Follow the instructions in Advanced Pinot Setup to set up the Pinot cluster with the following components:
- Zookeeper
- Controller
- Broker
- Server
Create a Kinesis stream
Create a Kinesis stream called pullRequestMergedEvents for the demo.
Add Pinot table and schema
The schema is present at examples/stream/githubEvents/pullRequestMergedEvents_schema.json and is also pasted below.
The table config is present at examples/stream/githubEvents/docker/pullRequestMergedEvents_kinesis_realtime_table_config.json and is also pasted below.
If you’re using the official Kinesis service on your AWS account, you can remove the endpoint property from the table config.
Publish events
Start streaming GitHub events into the Kinesis stream.
Prerequisites
Generate a personal access token on GitHub.
Query
Head over to the Query Console to check the data.
Visualizing Data
You can use Superset or Tableau to visualize this data. To integrate with Superset, see the Superset Integrations page. You can also use our JDBC driver to connect Tableau to Pinot. Here are some insights captured via Tableau:
Most active organizations in the last hour

Total commits happening every minute

Resharding Kinesis Stream
Pinot’s Kinesis plugin is designed to handle resharding of a Kinesis stream gracefully. Pinot ensures that data in parent shards is consumed before data in their child shards. Pinot creates segments per partition, and partitions are currently mapped 1:1 to shards. The partition number is taken from the numeric suffix of the shard ID: for example, shardId-000000000000 is partition 0, shardId-000000000001 is partition 1, shardId-000000000002 is partition 2, and so on.
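The shard-to-partition mapping described here can be illustrated with a short sketch. It is not Pinot's actual implementation; it only assumes that shard IDs look like the examples in this section, with a numeric suffix after the last hyphen. The function name is hypothetical.

```python
def shard_id_to_partition(shard_id: str) -> int:
    """Derive a partition number from the numeric suffix of a Kinesis shard ID."""
    # Split on the last hyphen: "shardId-000000000002" -> ("shardId", "-", "000000000002")
    _prefix, separator, index = shard_id.rpartition("-")
    if not separator or not index.isdigit():
        raise ValueError(f"unexpected shard ID format: {shard_id!r}")
    # int() drops the leading zeros.
    return int(index)


print(shard_id_to_partition("shardId-000000000002"))  # prints 2
```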
Each Pinot data segment contains the partition ID as part of its name. For example, the segment name pullRequestMergedEvents__5__0__20220315T2036Z is composed of:
- tableName
- partitionId
- segmentNumber in the current partition
- current timestamp in yyyyMMddTHHmm format
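To make the naming scheme concrete, here is a small sketch that splits a segment name into the four parts listed above. It assumes the parts are joined by double underscores, as in the example name, and that the trailing `Z` in the timestamp denotes UTC. The `SegmentName` type and `parse_segment_name` function are hypothetical helpers, not part of Pinot.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class SegmentName(NamedTuple):
    table_name: str
    partition_id: int
    sequence_number: int
    created_at: datetime


def parse_segment_name(name: str) -> SegmentName:
    """Split a name like 'table__5__0__20220315T2036Z' into its parts."""
    # rsplit from the right so the last three fields are always taken
    # from the end of the name; raises ValueError if a field is missing.
    table, partition, sequence, timestamp = name.rsplit("__", 3)
    # Assumption: the trailing "Z" means the timestamp is in UTC.
    created = datetime.strptime(timestamp, "%Y%m%dT%H%MZ").replace(tzinfo=timezone.utc)
    return SegmentName(table, int(partition), int(sequence), created)


segment = parse_segment_name("pullRequestMergedEvents__5__0__20220315T2036Z")
print(segment.table_name, segment.partition_id, segment.sequence_number)
# prints: pullRequestMergedEvents 5 0
```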

New shards are detected by RealtimeSegmentValidationManager, a periodic task that runs in the Controller. You can also trigger this task manually to check for new segments instead of waiting for the configured interval.
Split a Shard
Let’s first list out all the shards.
Next, split shardId-000000000000 in the middle.
shardId-000000000003 and shardId-000000000004 are the two new shards. Also, shardId-000000000000 is now closed, since it contains an EndingSequenceNumber.
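Two details of the split step can be sketched in code: choosing a hash key "in the middle" of a shard, and recognizing a closed shard by its EndingSequenceNumber. The sketch below assumes shard descriptions shaped like the output of `aws kinesis list-shards`, with `HashKeyRange` and `SequenceNumberRange` objects. The function names and sample values are hypothetical. The midpoint would be passed to `aws kinesis split-shard` as `--new-starting-hash-key`.

```python
from typing import Any, Dict


def midpoint_hash_key(shard: Dict[str, Any]) -> str:
    """Return the hash key halfway through the shard's hash key range."""
    key_range = shard["HashKeyRange"]
    start = int(key_range["StartingHashKey"])
    end = int(key_range["EndingHashKey"])
    # Hash keys are 128-bit integers carried as decimal strings.
    return str((start + end) // 2)


def is_closed(shard: Dict[str, Any]) -> bool:
    """A shard is closed once its sequence number range has an end."""
    return "EndingSequenceNumber" in shard["SequenceNumberRange"]


# Hypothetical shard descriptions with small numbers for readability.
open_shard = {
    "ShardId": "shardId-000000000003",
    "HashKeyRange": {"StartingHashKey": "0", "EndingHashKey": "99"},
    "SequenceNumberRange": {"StartingSequenceNumber": "100"},
}
closed_shard = {
    "ShardId": "shardId-000000000000",
    "HashKeyRange": {"StartingHashKey": "0", "EndingHashKey": "199"},
    "SequenceNumberRange": {"StartingSequenceNumber": "1", "EndingSequenceNumber": "50"},
}

print(midpoint_hash_key(open_shard))                # prints 49
print(is_closed(open_shard), is_closed(closed_shard))  # prints False True
```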

The segments for the two new partitions are pullRequestMergedEvents__3__0__20220315T2028Z and pullRequestMergedEvents__4__0__20220315T2028Z.
Merge two shards
Now let’s merge shardId-000000000001 and shardId-000000000002 to create a new shard, shardId-000000000005.
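A merged shard has two parents, so the rule that parent shards are consumed before child shards means both parents must be drained first. The sketch below illustrates that check. It is not Pinot's implementation, and the function names are hypothetical. It assumes shard descriptions carry `ParentShardId` and, for merged shards, `AdjacentParentShardId`, as in the output of `aws kinesis list-shards`.

```python
from typing import Any, Dict, List, Set


def parent_ids(shard: Dict[str, Any]) -> List[str]:
    """Collect the parent shard IDs recorded on a shard description."""
    keys = ("ParentShardId", "AdjacentParentShardId")
    return [shard[key] for key in keys if key in shard]


def ready_to_consume(shard: Dict[str, Any], fully_consumed: Set[str]) -> bool:
    """A shard is ready once every one of its parents has been fully consumed."""
    return all(parent in fully_consumed for parent in parent_ids(shard))


# Shard produced by merging shards 1 and 2, as in this section.
merged_shard = {
    "ShardId": "shardId-000000000005",
    "ParentShardId": "shardId-000000000001",
    "AdjacentParentShardId": "shardId-000000000002",
}

print(ready_to_consume(merged_shard, {"shardId-000000000001"}))  # prints False
print(ready_to_consume(
    merged_shard, {"shardId-000000000001", "shardId-000000000002"}
))  # prints True
```

A shard with no recorded parents is always ready under this check, which matches the shards a stream starts with.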
New segments for shardId-000000000005, including pullRequestMergedEvents__5__1__20220315T2036Z, are now visible.