Prerequisites
You will need a running Pinot cluster locally to follow the code examples in this guide. Also make sure you have the following configured on the machine where you are going to deploy Apache Pinot:
- A valid AWS account.
- A local installation of the AWS CLI with your credentials configured.
- An AWS access key for programmatic access to S3.
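One quick way to confirm the CLI and credentials are in place is to ask AWS who you are; the guard around the call is only there so the snippet stays runnable on machines without the CLI installed:

```shell
# Check that the AWS CLI is installed and credentials resolve.
# sts get-caller-identity returns the account/ARN for the active credentials.
if command -v aws >/dev/null 2>&1; then
  aws sts get-caller-identity || echo "credentials not configured; run 'aws configure'"
else
  echo "AWS CLI not found; install it before continuing"
fi
```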
Create an S3 bucket
Let’s create an S3 bucket called pinot-demo to keep the source CSV files.
You can use the AWS CLI to do that.
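With the AWS CLI it is a one-liner. The pinot-demo name comes from this guide; since S3 bucket names are globally unique you may need to pick a variant, and the guard below just keeps the snippet runnable where the CLI is absent:

```shell
# Create the pinot-demo bucket for the source CSV files.
if command -v aws >/dev/null 2>&1; then
  aws s3 mb s3://pinot-demo || echo "bucket not created (name taken or credentials missing)"
fi
```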
Copy CSV files
Create a CSV file called transcript.csv with the following content.
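The original listing is not reproduced here, so the rows below are illustrative sample data; the column layout (studentID, firstName, lastName, gender, subject, score, timestampInEpoch) follows the transcript example in the Pinot documentation. After writing the file, copy it into a rawdata/ folder in the bucket (the folder name is an assumption used consistently in this guide):

```shell
# Write an illustrative transcript.csv (sample rows, not the original data).
cat > transcript.csv <<'EOF'
studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000
EOF

# Upload it to the bucket (guarded so the snippet also runs without AWS set up).
if command -v aws >/dev/null 2>&1; then
  aws s3 cp transcript.csv s3://pinot-demo/rawdata/transcript.csv \
    || echo "upload failed (check credentials and bucket name)"
fi
```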
Configure Pinot
Now that we have the CSV file in the S3 bucket, let’s configure Pinot to ingest it and create a segment out of it. First, let’s create a schema and a table definition for the transcript data set. Create transcript_schema.json as follows.
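The original schema listing is missing here, so the following is a reconstruction modeled on the transcript example in the Pinot documentation; adjust the field names and types to match your actual CSV columns:

```json
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {"name": "studentID", "dataType": "INT"},
    {"name": "firstName", "dataType": "STRING"},
    {"name": "lastName", "dataType": "STRING"},
    {"name": "gender", "dataType": "STRING"},
    {"name": "subject", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "score", "dataType": "FLOAT"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestampInEpoch",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```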
Next, create transcript_table.json as follows.
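The original listing is missing here as well, so this is a minimal sketch of an OFFLINE table config for the transcript data set; the replication factor, load mode, and time column name are assumptions to adapt:

```json
{
  "tableName": "transcript",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "transcript",
    "timeColumnName": "timestampInEpoch",
    "timeType": "MILLISECONDS",
    "replication": 1
  },
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "tenants": {},
  "metadata": {}
}
```

Once both files exist, you can register them with the cluster, for example via `pinot-admin.sh AddTable -schemaFile transcript_schema.json -tableConfigFile transcript_table.json -exec`.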
Create the ingestion job spec file
Create a file called job-spec.yml and add the following content to it.
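Since the original spec is not shown, here is a representative standalone job spec assembled from the Pinot batch-ingestion documentation; the bucket paths, AWS region, and controller URI are assumptions to replace with your own values:

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://pinot-demo/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: 's3://pinot-demo/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-west-2'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```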
Note the pinotFSSpecs section, which registers the s3 scheme with className pointing to the implementation class for the S3 filesystem.
inputDirURI specifies the S3 location from which Pinot should ingest data; this is the folder we copied the transcript.csv file into. The includeFileNamePattern directive matches all CSV files in that folder.
Once ingestion completes, Pinot writes the generated segments to the location specified by outputDirURI.
Initiate the ingestion job
Now we have everything in place. Let’s go ahead and kick off the ingestion job. Once it completes, you should see the transcript table populated with data in the Query Console.
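A typical invocation looks like the following (the path to pinot-admin.sh depends on your Pinot installation; the guard merely keeps the snippet runnable where Pinot is not on the PATH):

```shell
# Kick off the standalone batch ingestion job described by job-spec.yml.
if command -v pinot-admin.sh >/dev/null 2>&1; then
  pinot-admin.sh LaunchDataIngestionJob -jobSpecFile ./job-spec.yml \
    || echo "ingestion job failed; check the job spec and cluster status"
fi
```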


