In this recipe, we will learn how Apache Pinot can be configured to ingest CSV files from an AWS S3 bucket.
First, create an S3 bucket named pinot-demo to keep the source CSV files. You can use the AWS CLI to do that.
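As a rough sketch of the bucket-creation step, the Python snippet below builds the AWS CLI command (aws s3 mb) and prints it. The helper names make_bucket_command and run are hypothetical, and the command is only executed when dry_run is set to False, which assumes the AWS CLI is installed and credentials are configured.

```python
import shlex
import subprocess

BUCKET = "pinot-demo"  # bucket name used throughout this recipe


def make_bucket_command(bucket):
    """Build the AWS CLI invocation that creates an S3 bucket."""
    return ["aws", "s3", "mb", f"s3://{bucket}"]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        # Assumes the AWS CLI is on PATH and credentials are configured.
        subprocess.run(cmd, check=True)


run(make_bucket_command(BUCKET))
```

Calling run(make_bucket_command(BUCKET), dry_run=False) would actually create the bucket.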
Next, create a file named transcript.csv with the following content.
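For illustration, the Python sketch below prints a small sample CSV that could serve as transcript.csv. The column names and rows are assumptions made for this example, not data taken from the recipe; substitute your own records as needed.

```python
import csv
import io

# Hypothetical columns and rows, for illustration only.
HEADER = ["studentID", "firstName", "lastName", "gender",
          "subject", "score", "timestampInEpoch"]
ROWS = [
    [200, "Lucy", "Smith", "Female", "Maths", 3.8, 1570863600000],
    [200, "Lucy", "Smith", "Female", "English", 3.5, 1571036400000],
    [201, "Bob", "King", "Male", "Maths", 3.2, 1571900400000],
]


def render_csv(header, rows):
    """Return CSV text with one header line followed by the data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


SAMPLE_CSV = render_csv(HEADER, ROWS)
print(SAMPLE_CSV, end="")
```

Redirecting the output to transcript.csv gives a file that can then be copied into the bucket.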
Then create a schema file named transcript_schema.json as follows.
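A minimal sketch of what such a schema might look like, generated from Python so it can be checked before saving. It assumes the CSV has the columns studentID, firstName, lastName, gender, subject, score, and timestampInEpoch (epoch milliseconds); these column names and data types are illustrative assumptions, so adjust them to match your file.

```python
import json

# Assumed columns; adjust names and types to match your own CSV header.
SCHEMA = {
    "schemaName": "transcript",
    "dimensionFieldSpecs": [
        {"name": "studentID", "dataType": "INT"},
        {"name": "firstName", "dataType": "STRING"},
        {"name": "lastName", "dataType": "STRING"},
        {"name": "gender", "dataType": "STRING"},
        {"name": "subject", "dataType": "STRING"},
    ],
    "metricFieldSpecs": [
        {"name": "score", "dataType": "FLOAT"},
    ],
    "dateTimeFieldSpecs": [
        {
            "name": "timestampInEpoch",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        }
    ],
}

# Print the schema as JSON; redirect stdout to transcript_schema.json to save it.
SCHEMA_JSON = json.dumps(SCHEMA, indent=2)
print(SCHEMA_JSON)
```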
Similarly, create a table configuration file named transcript_table.json as follows.
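The following Python sketch prints one plausible offline table configuration for the transcript table. The time column name (timestampInEpoch), the replication factor, and the default tenant names are assumptions chosen for a single-node demo cluster rather than values confirmed by this recipe.

```python
import json

# Assumed settings for a small demo cluster; tune them for a real deployment.
TABLE_CONFIG = {
    "tableName": "transcript",
    "tableType": "OFFLINE",
    "segmentsConfig": {
        "timeColumnName": "timestampInEpoch",  # assumed time column
        "timeType": "MILLISECONDS",
        "replication": "1",
        "schemaName": "transcript",
    },
    "tableIndexConfig": {
        "invertedIndexColumns": [],
        "loadMode": "MMAP",
    },
    "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant",
    },
    "metadata": {},
}

# Print the config as JSON; redirect stdout to transcript_table.json to save it.
TABLE_CONFIG_JSON = json.dumps(TABLE_CONFIG, indent=2)
print(TABLE_CONFIG_JSON)
```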
Create an ingestion job specification file named job-spec.yml and add the following content to it.
The pinotFSSpecs section registers the s3 scheme, with className pointing to the implementation class.
inputDirURI specifies the S3 location from which Pinot should ingest the data. Recall that we copied the transcript.csv file into that folder. The includeFileNamePattern directive selects only the CSV files in that folder.
Once the ingestion is complete, Pinot writes the generated segments to the location specified by outputDirURI.
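To tie these settings together, here is a hedged sketch that renders a job specification from a Python template. The S3 prefixes (rawdata/ and segments/), the region, and the controller address are hypothetical placeholders; the class names are the standalone runner, S3 file system, and CSV reader classes as they are commonly named in Pinot distributions, so verify them against your Pinot version.

```python
from string import Template

# YAML skeleton of a standalone batch ingestion job reading CSV files from S3.
JOB_SPEC_TEMPLATE = Template("""\
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '$input_dir'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '$output_dir'
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: '$region'
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
pinotClusterSpecs:
  - controllerURI: '$controller'
""")

# Hypothetical values: replace them with your own prefixes, region, and controller.
JOB_SPEC = JOB_SPEC_TEMPLATE.substitute(
    input_dir="s3://pinot-demo/rawdata/",
    output_dir="s3://pinot-demo/segments/",
    region="us-east-1",
    controller="http://localhost:9000",
)

# Print the rendered spec; redirect stdout to job-spec.yml to save it.
print(JOB_SPEC, end="")
```

With the rendered file saved as job-spec.yml, the job is typically launched with bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile job-spec.yml.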
After the job finishes, you should see the transcript table populated with data in the Query Console.