Connect to Amazon S3
Step 1: In the Data Portal, click Tables and then click Create Table.
Step 2: Select S3 as the Data Source.
Step 3: Create a New Connection.
To reuse an existing connection, select it from the list and skip ahead to Step 5. Otherwise, click New Connection.
Enter a Source Name for the new connection.
Select the Authentication Type from the drop-down list.
Step 4: Configure Connection Parameters.
Connecting to S3 Using Basic Authentication
Use the following JSON configuration when S3 is set up with basic authentication using an access key and secret key.
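Below is a minimal sketch of a basic-authentication configuration; the bucket path, region, and credential values are placeholders to replace with your own.

```json
{
  "inputDirURI": "s3://my-bucket/path/to/data/",
  "input.fs.prop.region": "us-east-1",
  "input.fs.prop.accessKey": "<YOUR_ACCESS_KEY>",
  "input.fs.prop.secretKey": "<YOUR_SECRET_KEY>"
}
```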
Property Descriptions
| Property | Required | Description |
|---|---|---|
| `inputDirURI` | Yes | URI of the input directory or files for ingestion. Tells Pinot where the data resides. |
| `input.fs.prop.region` | Yes | Region of the file system (e.g., `us-east-1`). |
| `input.fs.prop.accessKey` | Yes | Access key for authentication to the file system. |
| `input.fs.prop.secretKey` | Yes | Secret key for authentication (paired with the access key). |
Connecting to S3 Using IAM-Based Authentication
Use the following JSON configuration when S3 is set up with IAM-based authentication by assuming an IAM Role for secure access.
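Below is a minimal sketch of an IAM-based configuration; the bucket path, region, external ID, and role ARN are placeholder values to replace with your own. The `input.fs.className` value shown is the S3 implementation listed in the table that follows.

```json
{
  "inputDirURI": "s3://my-bucket/path/to/data/",
  "input.fs.prop.region": "us-east-1",
  "input.fs.prop.externalId": "<EXTERNAL_ID>",
  "input.fs.prop.roleArn": "arn:aws:iam::123456789012:role/pinot-s3-access",
  "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS"
}
```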
Property Descriptions
| Property | Required | Description |
|---|---|---|
| `inputDirURI` | Yes | URI of the input directory or files for ingestion. Tells Pinot where the data resides. |
| `input.fs.prop.region` | Yes | Region of the file system (e.g., `us-east-1`). |
| `input.fs.prop.externalId` | Yes | External ID of the AWS account of your StarTree Cloud. |
| `input.fs.prop.roleArn` | Yes | The Amazon Resource Name (ARN) of the AWS IAM role to assume when accessing S3. This allows Pinot to securely access resources in a different AWS account. For example, if Pinot runs in Account A but the S3 bucket is in Account B, Pinot can assume a role in Account B that grants access to the bucket. The role must have permissions such as `s3:List*` and `s3:GetObject`. |
| `input.fs.className` | Yes | The file system implementation class used to access the input directory. Examples: `org.apache.pinot.plugin.filesystem.S3PinotFS` (Amazon S3), `org.apache.pinot.plugin.filesystem.LocalPinotFS` (local file systems), `org.apache.pinot.plugin.filesystem.HadoopPinotFS` (HDFS). |
Step 5: Test the Connection and Configure Data Ingestion.
After you configure the connection properties, test the connection to confirm it works.
When the test succeeds, use the following JSON to configure additional data ingestion settings:
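A minimal sketch of the ingestion settings, assuming a CSV source and illustrative glob patterns:

```json
{
  "inputFormat": "csv",
  "includeFileNamePattern": "glob:**/*.csv",
  "excludeFileNamePattern": "glob:**/*.tmp"
}
```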
Property Descriptions
| Property | Required | Description |
|---|---|---|
| `inputFormat` | Yes | Format of the input files. Supported values include `csv`, `json`, `avro`, and `parquet`. |
| `includeFileNamePattern` | Yes | Glob pattern that selects which files to ingest. Use this when the input directory contains mixed files and only matching files should be ingested. |
| `excludeFileNamePattern` | No | Glob pattern that selects which files to skip. Use this when the input directory contains mixed files and matching files should be excluded from ingestion. |
Configure Record Reader
Configure the record reader to customize how the file format is read during ingestion.
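As an illustrative sketch, the following borrows the record reader spec from open-source Pinot batch ingestion for a CSV source; the exact fields exposed in the Data Portal may differ, so follow the in-product hints when completing this step.

```json
{
  "dataFormat": "csv",
  "className": "org.apache.pinot.plugin.inputformat.csv.CSVRecordReader",
  "configClassName": "org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig",
  "configs": {
    "delimiter": ",",
    "multiValueDelimiter": ";"
  }
}
```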
Step 6: Preview the Data
Click Show Sample Data to preview the source data before finalizing the configuration.
Next Step
Proceed with Data Modeling.