Create a connection to ingest files stored in Amazon S3 buckets.

For access key authentication, set the following properties:

Property | Required | Description |
---|---|---|
inputDirURI | Yes | URI of the input directory/files for ingestion. It tells Pinot where the data resides. |
input.fs.prop.region | Yes | Region of the file system (e.g., us-east-1). |
input.fs.prop.accessKey | Yes | Access key for authentication to the file system. |
input.fs.prop.secretKey | Yes | Secret key for authentication (paired with access key). |
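
As a minimal sketch, the access key properties above can be collected into a key-value map and checked for completeness before they are submitted. The bucket name, the placeholder credentials, and the `missing_properties` helper are assumptions made for illustration; they are not part of Pinot or StarTree.

```python
# Required keys are taken from the table above; all values are placeholders.
REQUIRED_ACCESS_KEY_PROPS = (
    "inputDirURI",
    "input.fs.prop.region",
    "input.fs.prop.accessKey",
    "input.fs.prop.secretKey",
)


def missing_properties(props, required):
    """Return the required keys that are absent or empty in props (hypothetical helper)."""
    return [key for key in required if not props.get(key)]


access_key_props = {
    "inputDirURI": "s3://example-bucket/path/to/data/",  # hypothetical bucket
    "input.fs.prop.region": "us-east-1",
    "input.fs.prop.accessKey": "<ACCESS_KEY>",
    "input.fs.prop.secretKey": "<SECRET_KEY>",
}

# An empty list means every required property is present.
print(missing_properties(access_key_props, REQUIRED_ACCESS_KEY_PROPS))
```

Checking the map up front surfaces a missing region or credential before the connection is created, rather than as an ingestion failure later.
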

For IAM role-based authentication, set the following properties:

Property | Required | Description |
---|---|---|
inputDirURI | Yes | URI of the input directory/files for ingestion. It tells Pinot where the data resides. |
input.fs.prop.region | Yes | Region of the file system (e.g., us-east-1). |
input.fs.prop.externalId | Yes | External ID of the AWS Account of your StarTree Cloud. |
input.fs.prop.roleArn | Yes | The Amazon Resource Name (ARN) of the AWS IAM role to assume for accessing Amazon S3. Assuming a role lets Pinot securely access resources in a different AWS account. Example: if Pinot runs in Account A but the S3 bucket is in Account B, Pinot can assume a role in Account B that grants access to the bucket. The role must have permissions such as "s3:List*" and "s3:GetObject". |
input.fs.className | Yes | The file system implementation class used to access the input directory. Examples: org.apache.pinot.plugin.filesystem.S3PinotFS (Amazon S3); org.apache.pinot.spi.filesystem.LocalPinotFS (local file systems); org.apache.pinot.plugin.filesystem.HadoopPinotFS (HDFS). |
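
The role-based properties, together with the permissions the role needs, might look like the sketch below. The account ID, role name, bucket name, and external ID placeholder are hypothetical, and `s3_read_policy` is an illustrative helper rather than an AWS or Pinot API. The policy grants only the two permissions named in the table; a real deployment may need a narrower resource scope or additional actions.

```python
import json

# Placeholder values; the account ID, role name, and bucket are hypothetical.
role_props = {
    "inputDirURI": "s3://example-bucket/path/to/data/",
    "input.fs.prop.region": "us-east-1",
    "input.fs.prop.externalId": "<EXTERNAL_ID>",
    "input.fs.prop.roleArn": "arn:aws:iam::123456789012:role/example-pinot-ingest",
    "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
}


def s3_read_policy(bucket):
    """Build an IAM permissions policy granting list and read access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:List*", "s3:GetObject"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level actions (listing)
                    f"arn:aws:s3:::{bucket}/*",  # object-level actions (reads)
                ],
            }
        ],
    }


# Print the policy document as JSON, ready to attach to the role in Account B.
print(json.dumps(s3_read_policy("example-bucket"), indent=2))
```

The policy lists both the bucket ARN and the object ARN pattern because listing applies to the bucket while `s3:GetObject` applies to the objects inside it.
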

The following properties describe the input files:

Property | Required | Description |
---|---|---|
inputFormat | Yes | The format of the input files. Supported values include csv, json, avro, parquet, etc. |
includeFileNamePattern | Yes | The glob pattern to filter which files to include for ingestion. Used when the input directory contains mixed files and only specific files should be ingested. |
excludeFileNamePattern | No | The glob pattern to filter which files to exclude from ingestion. Used when the input directory contains mixed files and only specific files should not be ingested. |
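
To illustrate how the include and exclude patterns interact, the sketch below filters a list of object paths: a file is kept only if it matches the include pattern and does not match the exclude pattern. It is an approximation built on Python's `fnmatch`, whose wildcard rules differ from the Java glob matching Pinot uses (for example, `*` here also matches `/`). The `glob:` prefix follows the way patterns are commonly written in Pinot job specs and is treated as an assumption; `select_files` and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase


def strip_glob_prefix(pattern):
    """Drop a leading 'glob:' so the pattern can be passed to fnmatch."""
    return pattern[len("glob:"):] if pattern.startswith("glob:") else pattern


def select_files(paths, include_pattern, exclude_pattern=None):
    """Keep paths that match the include pattern and do not match the exclude pattern."""
    include = strip_glob_prefix(include_pattern)
    exclude = strip_glob_prefix(exclude_pattern) if exclude_pattern else None
    selected = []
    for path in paths:
        if not fnmatchcase(path, include):
            continue  # not covered by the include pattern
        if exclude is not None and fnmatchcase(path, exclude):
            continue  # explicitly excluded
        selected.append(path)
    return selected


paths = [
    "events/2024/01/part-0.parquet",
    "events/2024/01/_SUCCESS",
    "events/tmp/part-1.parquet",
]

print(select_files(paths, "glob:**/*.parquet", "glob:**/tmp/*"))
# → ['events/2024/01/part-0.parquet']
```

The marker file is dropped because it does not match the include pattern, and the file under `tmp/` is dropped because the exclude pattern takes precedence over the include pattern in this sketch.
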

CSV: CSVRecordReaderConfig is used for handling CSV files and provides customizable options.

AVRO: AvroRecordReaderConfig is supported.

Parquet: Parquet files are read with org.apache.pinot.plugin.inputformat.parquet.ParquetAvroRecordReader. To use the Parquet Native Record Reader, specify org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader.