Step 1: In the Data Portal, click Tables and then click Create Table.

Step 2: Select GCS as the Data Source.

Step 3: Create a New Connection

Click New Connection. To use an existing connection instead, select it from the list and proceed to Step 5.

Enter a Source Name for the new connection.

Step 4: Configure Connection Parameters

Use the following JSON to configure the connection parameters, including your service account credentials:

{
    "inputDirURI": "gs://my-bucket-name/directory",
    "input.fs.prop.projectId": "",
    "input.fs.prop.jsonKey": "",
    "input.fs.className": "org.apache.pinot.plugin.filesystem.GcsPinotFS"
}

Property Descriptions

The following table outlines the required properties for configuring a Google Cloud Storage (GCS) connection.

| Property | Required | Description |
| --- | --- | --- |
| input.fs.prop.projectId | Yes | The GCP project ID. Find your project ID in the Google Cloud console. |
| input.fs.prop.jsonKey | Yes | The string-encoded Google service account key. Include the appropriate format for your selected JSON key. See Google documentation on how to create a service account key. |
| inputDirURI | Yes | The path to input file(s). |
| input.fs.className | Yes | The class name (org.apache.pinot.plugin.filesystem.GcsPinotFS) for the Pinot file system. |
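
The connection parameters above can also be assembled programmatically, which avoids hand-escaping the service account key. The following is a minimal Python sketch, not part of the product. It assumes that "string-encoded" means the key file's JSON content serialized into a single string, and that inputDirURI always starts with gs:// as in the example. The helper name build_gcs_connection and the placeholder key are hypothetical.

```python
import json

# The four required properties from the table above.
REQUIRED_KEYS = (
    "inputDirURI",
    "input.fs.prop.projectId",
    "input.fs.prop.jsonKey",
    "input.fs.className",
)


def build_gcs_connection(input_dir_uri, project_id, service_account_key):
    """Return the connection parameters as a dict (hypothetical helper).

    service_account_key is the parsed key file (a dict). It is serialized
    to one string for input.fs.prop.jsonKey, which is an assumption about
    what "string-encoded" means.
    """
    # Assumption: GCS paths use the gs:// scheme, as in the example config.
    if not input_dir_uri.startswith("gs://"):
        raise ValueError("inputDirURI should start with gs://")
    config = {
        "inputDirURI": input_dir_uri,
        "input.fs.prop.projectId": project_id,
        "input.fs.prop.jsonKey": json.dumps(service_account_key),
        "input.fs.className": "org.apache.pinot.plugin.filesystem.GcsPinotFS",
    }
    # Reject empty values for any required property.
    missing = [key for key in REQUIRED_KEYS if not config[key]]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    return config


# Placeholder key for illustration only; a real key file has more fields.
fake_key = {"type": "service_account", "project_id": "my-project"}
config = build_gcs_connection("gs://my-bucket-name/directory", "my-project", fake_key)
print(json.dumps(config, indent=4))
```

In practice, the key dict would come from json.load on the downloaded key file, and the printed JSON would be pasted into the connection form.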

Step 5: Test the Connection and Configure Data Settings

After you have configured the connection properties, test the connection to ensure it is working.

When the connection is successful, use the following JSON to configure additional data settings:

{
    "inputFormat": "",
    "includeFileNamePattern": "gs://*.csv"
}

Property Descriptions

The following table outlines the required and optional properties for configuring data ingestion from Google Cloud Storage (GCS).

| Property | Required | Description |
| --- | --- | --- |
| inputFormat | Yes | The input file format. Supported values include csv, json, avro, parquet, etc. |
| includeFileNamePattern | Optional | The glob pattern that identifies which files to include for ingestion. Only files in the inputDirURI path that match this pattern are ingested. |
| excludeFilePatternMatch | Optional | The glob pattern that identifies which files to exclude from ingestion. Files in the inputDirURI path that match this pattern are skipped. |
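
To illustrate how the include and exclude patterns combine, here is a small Python sketch that filters a list of example paths. It is only an approximation: it uses Python's fnmatch module, where * also matches path separators, and the product's own glob matcher may treat * and / differently. The function name select_files, the exclude pattern, and the file paths are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(paths, include_pattern=None, exclude_pattern=None):
    """Keep paths that match the include pattern and not the exclude pattern."""
    selected = []
    for path in paths:
        # With an include pattern set, only matching files are considered.
        if include_pattern and not fnmatchcase(path, include_pattern):
            continue
        # Files matching the exclude pattern are skipped.
        if exclude_pattern and fnmatchcase(path, exclude_pattern):
            continue
        selected.append(path)
    return selected


# Hypothetical contents of the inputDirURI path.
paths = [
    "gs://my-bucket-name/directory/events.csv",
    "gs://my-bucket-name/directory/events_backup.csv",
    "gs://my-bucket-name/directory/notes.txt",
]

kept = select_files(
    paths,
    include_pattern="gs://*.csv",
    exclude_pattern="*_backup.csv",
)
print(kept)
```

Here notes.txt fails the include pattern and events_backup.csv is dropped by the exclude pattern, so only events.csv remains.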

Configure Record Reader

Configure the record reader to customize how the file format is read during ingestion.

Step 6: Sample Data

Click Show Sample Data to see a preview of the source data.