Prerequisites
You will need a running Pinot cluster locally to follow the code examples in this guide. Make sure you have the following configured on the machine where you are going to deploy Apache Pinot:

- A valid AWS account.
- A local installation of the AWS CLI, with your credentials configured.
- An AWS access key for programmatic access to S3.
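If the CLI is not yet configured, you can set up your credentials interactively. A sketch of the prompts (the values shown are placeholders, not real credentials):

```shell
# Run the interactive AWS CLI configuration; it stores the
# credentials under ~/.aws/credentials and ~/.aws/config
aws configure
# AWS Access Key ID [None]: AKIA................
# AWS Secret Access Key [None]: ....................
# Default region name [None]: us-east-1
# Default output format [None]: json
```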
Create an S3 bucket
Let's create an S3 bucket called `pinot-demo` to hold the source CSV files. You can use the AWS CLI to do that.
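For example, with the `aws s3 mb` (make bucket) command; the region flag is an assumption and should match your setup:

```shell
# Create the bucket that will hold the source CSV files
# (replace the region with the one your cluster runs in)
aws s3 mb s3://pinot-demo --region us-east-1
```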
Copy CSV files
Create a CSV file called `transcript.csv` with the following content.
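A minimal sample of the transcript data set; the column names and rows here are assumptions chosen to match the schema used later in this guide:

```shell
# Write a small sample transcript data set locally
cat > transcript.csv <<'EOF'
studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000
EOF
```

Then upload it to the bucket, for example with `aws s3 cp transcript.csv s3://pinot-demo/rawdata/transcript.csv` (the `rawdata/` prefix is an assumption; use whatever folder your job spec will point at).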
Configure Pinot
Now that we have the CSV file in the S3 bucket, let's configure Pinot to ingest it and create a segment from it. First, we need a schema and a table definition for the transcript data set. Create `transcript_schema.json` as follows.
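A sketch of such a schema, assuming the column names and types from the sample CSV above (adjust them to match your actual data):

```json
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    { "name": "studentID", "dataType": "INT" },
    { "name": "firstName", "dataType": "STRING" },
    { "name": "lastName", "dataType": "STRING" },
    { "name": "gender", "dataType": "STRING" },
    { "name": "subject", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "score", "dataType": "FLOAT" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestampInEpoch",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```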
Next, create the table definition `transcript_table.json` as follows.
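A minimal offline table definition along these lines would work; the replication and segment settings are assumptions suitable for a local demo cluster:

```json
{
  "tableName": "transcript",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "transcript",
    "timeColumnName": "timestampInEpoch",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "metadata": {}
}
```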
Create the ingestion job spec file
Create a file called `job-spec.yml` and add the following content to it.
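A sketch of such a job spec for standalone batch ingestion from S3; the bucket prefixes, region, and controller URI are assumptions to adapt to your environment:

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://pinot-demo/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: 's3://pinot-demo/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
tableSpec:
  tableName: 'transcript'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```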
The `className` directive points to the filesystem implementation class. The `inputDirURI` directive specifies the S3 location from which Pinot should ingest the data; recall that we copied the `transcript.csv` file into that folder. The `includeFileNamePattern` directive selects all CSV files in that folder.

Once the ingestion is complete, Pinot writes the segments to the location specified by `outputDirURI`.
Initiate the ingestion job
Now we have everything in place. Let's go ahead and kick off the ingestion job. Once it completes, you should see the `transcript` table populated with data in the Query Console.
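The job can be launched with the Pinot admin tool's `LaunchDataIngestionJob` command; the relative paths below assume you are running from the Pinot distribution directory with `job-spec.yml` in the current folder:

```shell
# Kick off the batch ingestion job defined in job-spec.yml
bin/pinot-admin.sh LaunchDataIngestionJob \
  -jobSpecFile job-spec.yml
```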
