Create a connection to ingest files from Google Cloud Storage.
Property | Required | Description |
---|---|---|
input.fs.prop.projectId | Yes | The GCP project ID. Find your project ID in the Google Cloud console. |
input.fs.prop.jsonKey | Yes | The string-encoded Google service account key. Include the appropriate format for your selected JSON key. See the Google documentation for how to create a service account key. |
inputDirURI | Yes | The path to input file(s). |
input.fs.className | Yes | The class name (org.apache.pinot.plugin.filesystem.GcsPinotFS) for the Pinot file system. |
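
As a rough illustration of how the four connection properties fit together, here is a minimal Python sketch that assembles them into a dictionary and checks that none of the required keys is empty. The helper names (`build_gcs_connection`, `missing_keys`), the project ID, the bucket path, and the trimmed-down key contents are all hypothetical. The sketch also assumes that "string-encoded" means the service account key JSON serialized to a single string, which this page does not confirm.

```python
import json

# Property names taken from the table above; all four are marked as required.
REQUIRED_KEYS = (
    "input.fs.prop.projectId",
    "input.fs.prop.jsonKey",
    "inputDirURI",
    "input.fs.className",
)


def build_gcs_connection(project_id, service_account_key, input_dir_uri):
    """Assemble the connection properties as a plain dictionary."""
    return {
        "input.fs.prop.projectId": project_id,
        # Assumption: "string-encoded" means the key JSON dumped to one string.
        "input.fs.prop.jsonKey": json.dumps(service_account_key),
        "inputDirURI": input_dir_uri,
        "input.fs.className": "org.apache.pinot.plugin.filesystem.GcsPinotFS",
    }


def missing_keys(props):
    """Return the required property names that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not props.get(key)]


# Hypothetical values for illustration only; a real key has more fields.
props = build_gcs_connection(
    "my-project",
    {"type": "service_account", "project_id": "my-project"},
    "gs://my-bucket/events/",
)
print(json.dumps(props, indent=2))
print("missing:", missing_keys(props))
```

Checking for empty values before submitting the connection can surface a missing property early, but it does not validate the credentials themselves.
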
Property | Required | Description |
---|---|---|
inputFormat | Yes | The input file format. Supported values include csv, json, avro, and parquet. |
includeFileNamePattern | Optional | The glob pattern that identifies which files to include for ingestion. Only files in the inputDirURI path that match this pattern are ingested. |
excludeFilePatternMatch | Optional | The glob pattern that identifies which files to exclude from ingestion. Files in the inputDirURI path that match this pattern are skipped. |
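
The include and exclude patterns can be thought of as two filters applied in sequence. The sketch below approximates that behavior with Python's `fnmatch` module on bare file names. It is only an approximation: the ingestion service may use a different glob dialect (for example, one with different rules for path separators), and `select_files` and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(names, include=None, exclude=None):
    """Keep names that match `include` (if set) and do not match `exclude` (if set)."""
    selected = []
    for name in names:
        if include is not None and not fnmatchcase(name, include):
            continue  # include pattern is set and this file does not match it
        if exclude is not None and fnmatchcase(name, exclude):
            continue  # exclude pattern is set and this file matches it
        selected.append(name)
    return selected


# Hypothetical directory listing.
files = ["2024-01.csv", "2024-02.csv", "2024-02.tmp.csv", "readme.txt"]

print(select_files(files, include="*.csv", exclude="*.tmp.csv"))
# ['2024-01.csv', '2024-02.csv']
```

With neither pattern set, every file in the listing is kept, which matches both properties being optional.
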

CSV
CSVRecordReaderConfig is used for handling CSV files and supports customizable options.

Avro
AvroRecordReaderConfig is supported.

Parquet
org.apache.pinot.plugin.inputformat.parquet.ParquetAvroRecordReader
Use Parquet Native Record Reader: org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader
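
For Parquet input, the choice comes down to one of the two reader class names listed above. A small sketch of that selection, assuming a hypothetical `use_native` flag that corresponds to the "Use Parquet Native Record Reader" option:

```python
# Class names copied from the Parquet section above.
PARQUET_AVRO_READER = (
    "org.apache.pinot.plugin.inputformat.parquet.ParquetAvroRecordReader"
)
PARQUET_NATIVE_READER = (
    "org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader"
)


def parquet_reader_class(use_native):
    """Return the reader class name for the chosen Parquet option (hypothetical helper)."""
    return PARQUET_NATIVE_READER if use_native else PARQUET_AVRO_READER


print(parquet_reader_class(use_native=True))
```

Where this class name goes in the connection or job configuration is not specified on this page, so the sketch stops at returning the string.
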