Pinot Version | 0.9.3 |
---|---|
Code | startreedata/pinot-recipes/google-cloud-storage |
Prerequisites
To follow the code examples in this guide, do the following:
- Install Docker and the Google Cloud CLI locally.
- Create a GCP project and a user or service account that has permission to list and create buckets, and then navigate to https://console.cloud.google.com/storage/browser and create a bucket, for example pinot-deepstore.yourdomain.com
- Download recipes
Navigate to recipe
- If you haven’t already, download recipes.
- In a terminal, go to the recipe by running the following command:
Launch Pinot Cluster
You can spin up a Pinot Cluster by running the following command:
Controller configuration
We need to provide configuration parameters to the Pinot Controller to configure Google Cloud Storage as the Deep Store. This is done in the following section of the Docker Compose file:
The configuration is specified in /config/controller-conf.conf, the contents of which are shown below:
- controller.data.dir contains the name of our bucket.
- pinot.controller.storage.factory.gs.projectId contains the name of our GCP project.
- pinot.controller.storage.factory.gs.gcpKey contains the path to our GCP JSON key file.
- Replace <bucket-name> with the name of your bucket.
- Replace <project-id> with the name of your GCP project.
- Make sure your GCP JSON key file is available at config/service-account.json.
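Pulling the three properties and the placeholder replacements together, the sketch below writes a minimal controller configuration to a scratch file. It is not the recipe's actual file: the bucket and project values are hypothetical, the output path `controller-conf.sketch.conf` is made up so that nothing in the recipe is overwritten, the key path assumes the recipe's `config` directory is mounted at `/config`, and the `GcsPinotFS` and segment fetcher lines follow the Apache Pinot GCS documentation rather than anything shown in this guide.

```shell
# Sketch only: render a GCS deep-store controller config to a scratch file.
# BUCKET_NAME and PROJECT_ID are hypothetical; substitute your own values.
BUCKET_NAME="${BUCKET_NAME:-pinot-deepstore.yourdomain.com}"
PROJECT_ID="${PROJECT_ID:-my-gcp-project}"
# Made-up output path so the recipe's own config file is left untouched.
CONF_FILE="${CONF_FILE:-./controller-conf.sketch.conf}"

cat > "$CONF_FILE" <<EOF
# Deep store location: the bucket that segments are written to.
controller.data.dir=gs://${BUCKET_NAME}
# Local scratch directory for the controller (assumed path).
controller.local.temp.dir=/tmp/pinot-tmp-data/
# Register the GCS filesystem plugin for the gs:// scheme.
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=${PROJECT_ID}
# Path to the JSON key file as seen inside the container (assumed mount).
pinot.controller.storage.factory.gs.gcpKey=/config/service-account.json
# Allow segments to be fetched over gs:// as well as file and http.
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
EOF

echo "Wrote $CONF_FILE"
```

Setting `BUCKET_NAME` and `PROJECT_ID` in the environment before running the snippet replaces the hypothetical defaults.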
Pinot Schema and Tables
Now let’s create a Pinot Schema and real-time table.
Schema
Our schema is going to capture some simple events, and looks like this:
Real-Time Table
And the real-time table is defined below:
The realtime.segment.flush.threshold.rows config is intentionally set to an extremely small value so that the segment will be committed after 10,000 records have been ingested. In a production system this value should be set much higher, as described in the configuring segment threshold guide.
Ingesting Data
Let’s ingest data into the events Kafka topic, by running the following:
Exploring Deep Store
Now we’re going to check what segments we have and where they’re stored. You can get a list of all segments by running the following:
Let’s pick one of these segments, events__0__3__20220505T1343Z, and get its metadata, by running the following:
The metadata shows that this segment is stored at gs://pinot-events/events/events__0__3__20220505T1343Z.
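As a rough sketch of the two lookups described above, the helpers below build the controller REST URLs for listing a table's segments and for fetching one segment's metadata. The controller address `http://localhost:9000` is an assumption (9000 is the controller's default port), and the function names are hypothetical, not part of the recipe. Nothing is sent over the network until `list_segments` or `segment_metadata` is called.

```shell
# Sketch only: query the Pinot controller REST API for segment information.
# CONTROLLER is an assumed address; adjust it to match your cluster.
CONTROLLER="${CONTROLLER:-http://localhost:9000}"

# Build the URL that lists every segment of a table.
segments_url() {
  printf '%s/segments/%s\n' "$CONTROLLER" "$1"
}

# Build the URL that returns the metadata of one segment.
segment_metadata_url() {
  printf '%s/segments/%s/%s/metadata\n' "$CONTROLLER" "$1" "$2"
}

# Fetch the segment list for a table (needs curl and a running cluster).
list_segments() {
  curl -s -H "accept: application/json" "$(segments_url "$1")"
}

# Fetch the metadata for a single segment of a table.
segment_metadata() {
  curl -s -H "accept: application/json" "$(segment_metadata_url "$1" "$2")"
}
```

With the cluster running, `list_segments events` prints the segment names, and `segment_metadata events events__0__3__20220505T1343Z` prints that segment's metadata, which should include its download location in the deep store.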
Let’s go back to the terminal and return a list of all the segments in the bucket:
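One way to do this from the Google Cloud CLI is sketched below, assuming `gsutil` is installed and authenticated against the project that owns the bucket. The bucket and table names default to the ones in the segment path above, and the helper names are hypothetical. The listing only runs when `list_bucket_segments` is called.

```shell
# Sketch only: list the segment files stored in the deep store bucket.
# BUCKET and TABLE default to the names from the segment path in this guide.
BUCKET="${BUCKET:-pinot-events}"
TABLE="${TABLE:-events}"

# Build the gs:// prefix under which a table's segments are stored.
segment_prefix() {
  printf 'gs://%s/%s/\n' "$1" "$2"
}

# List the objects under that prefix (needs an authenticated gsutil).
list_bucket_segments() {
  gsutil ls -l "$(segment_prefix "$1" "$2")"
}
```

Calling `list_bucket_segments "$BUCKET" "$TABLE"` should print one line per segment object, with its size and creation time.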