| Pinot Version | 0.9.3 |
|---|---|
| Code | startreedata/pinot-recipes/ingest-parquet-files-from-s3-using-spark |
Start Pinot using the quick-start-batch launcher script.
Create a directory called pinot-spark-demo to hold the demo files:
Create events_schema.json with a text editor and add the following content. It represents the schema definition used to capture events.
Next, create events_table.json with the following content. It defines the events table.
Define the events schema and the table from the <PINOT_HOME>/bin directory.
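A sketch of one way to do this, using pinot-admin.sh's AddTable command; the JSON file paths are placeholders, so point them at the two files you just created:

```bash
# Register the schema and the table config in one step (file paths are placeholders).
./pinot-admin.sh AddTable \
  -schemaFile /path/to/events_schema.json \
  -tableConfigFile /path/to/events_table.json \
  -exec
```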
The ingestion job spec sets the jobType as SegmentCreationAndMetadataPush, which is more performant and lightweight for the controller. When using this jobType, make sure the controllers and servers have access to the outputDirURI so that they can download segments from it directly.
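For reference, here is a stripped-down sketch of how those settings sit in a Spark ingestion job spec. The S3 paths are placeholders, and the recipe's own spec file remains the source of truth:

```yaml
# Minimal sketch of the relevant job spec fields (placeholder S3 paths).
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner'
jobType: SegmentCreationAndMetadataPush
inputDirURI: 's3://my-bucket/events/'      # Parquet input location (placeholder)
outputDirURI: 's3://my-bucket/segments/'   # must be reachable by controllers and servers
```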
Other supported jobTypes are:
Update PINOT_DISTRIBUTION_DIR and SPARK_HOME to match your local environment.
The above command includes the JARs of all the required plugins in Spark’s driver classpath. In practice, you only need to do this if you get a ClassNotFoundException.
The command will take a few seconds to complete, depending on your machine's performance.
If the ingestion job doesn't print any errors to the console, you have successfully populated the events table with data from the S3 bucket.
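As an extra sanity check (not part of the recipe itself), you can run a quick count against the broker's SQL endpoint, assuming the default local broker port 8099:

```bash
# Count the ingested rows in the events table via the broker's SQL API.
curl -s -X POST http://localhost:8099/query/sql \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT count(*) FROM events"}'
```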
The data explorer shows the populated events table as follows.