Step 1: In the Data Portal, click Tables and then click Create Table.

Step 2: Select ADLS as the Data Source.

Step 3: Create a New Connection.

Click New Connection. If you want to use an existing connection, select the connection from the list and proceed to Step 5.

Enter a Source Name for the new connection.

Select the Authentication Type from the drop-down list.

Step 4: Configure Connection Parameters

Connecting to ADLS Using the Access Key

Use the following JSON configuration when ADLS is set up with basic authentication using an access key.

{
  "input.fs.className": "org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS",
  "input.fs.prop.accountName": "<account-name>",
  "authenticationType": "ACCESS_KEY",
  "input.fs.prop.accessKey": "<access-key>",
  "input.fs.prop.fileSystemName": "<file-system-name>",
  "inputDirURI": "abfs://<file-system-name>@<account-name>.dfs.core.windows.net/<data-directory-name>/"
}

Property Descriptions

| Property | Required | Description |
| --- | --- | --- |
| input.fs.className | Yes | The class name of the ADLS Gen2 Pinot file system. |
| input.fs.prop.accountName | Yes | The Azure Storage account name used for ADLS Gen2 integration. |
| authenticationType | Optional | Set as ACCESS_KEY to use an access key for authentication. |
| input.fs.prop.accessKey | Yes | The access key used for authentication when authenticationType is ACCESS_KEY. |
| input.fs.prop.fileSystemName | Yes | The name of the file system (container) in the Azure Storage account. |
| inputDirURI | Yes | The URI of the input directory in ADLS Gen2 where Pinot reads data. |
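
The inputDirURI repeats the account name and the file system name, so it can be derived from them instead of being typed by hand. The Python sketch below assembles the access key configuration that way. The helper name build_access_key_config and the sample values are hypothetical; the result is only a dictionary that can be serialized to JSON and pasted into the connection form.

```python
import json


def build_access_key_config(account_name, file_system_name, access_key, data_dir):
    """Return the access-key connection properties as a dict (illustrative helper)."""
    # Normalize the directory so the URI path has single leading and trailing slashes.
    directory = data_dir.strip("/")
    path = f"/{directory}/" if directory else "/"
    input_dir_uri = f"abfs://{file_system_name}@{account_name}.dfs.core.windows.net{path}"
    return {
        "input.fs.className": "org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS",
        "input.fs.prop.accountName": account_name,
        "authenticationType": "ACCESS_KEY",
        "input.fs.prop.accessKey": access_key,
        "input.fs.prop.fileSystemName": file_system_name,
        "inputDirURI": input_dir_uri,
    }


# Sample values are placeholders, not real credentials.
example = build_access_key_config("myaccount", "mycontainer", "<access-key>", "/events/2024/")
print(json.dumps(example, indent=2))
```

Deriving the URI from the two names keeps the three values from drifting apart when the account or container changes.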

Connecting to ADLS Using Azure Active Directory

Use the following JSON configuration when ADLS is set up with Azure Active Directory authentication.

{
  "input.fs.className": "org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS",
  "input.fs.prop.accountName": "Account-Name",
  "authenticationType": "AZURE_AD",
  "input.fs.prop.clientId": "Client-ID",
  "input.fs.prop.clientSecret": "SECRETKEYEXAMPLE12345",
  "input.fs.prop.tenantId": "tenant-id",
  "input.fs.prop.fileSystemName": "file-system-name",
  "inputDirURI": "abfs://my-container-name@myazure-storage-account.dfs.core.windows.net/data-directory/"
}

Property Descriptions

| Property | Required | Description |
| --- | --- | --- |
| input.fs.className | Yes | The class name of the ADLS Gen2 Pinot file system. Must be set to org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS for ADLS Gen2 integration. |
| input.fs.prop.accountName | Yes | The name of the Azure Storage account to be used for ADLS Gen2 integration. |
| authenticationType | Optional | The type of authentication used to access the ADLS storage. Set as AZURE_AD to use Azure Active Directory (AAD) Service Principal for authentication. |
| input.fs.prop.clientId | Yes | The Client ID of the Azure Service Principal used for authentication. |
| input.fs.prop.clientSecret | Yes | The Client Secret key for the Azure Service Principal. |
| input.fs.prop.tenantId | Yes | The Tenant ID associated with the Azure Active Directory (AAD). |
| input.fs.prop.fileSystemName | Yes | The name of the file system (container) in the Azure Storage account. |
| inputDirURI | Yes | The URI for the input directory in ADLS Gen2 where Pinot reads data. It is generally in the format: abfs://my-container-name@myazure-storage-account.dfs.core.windows.net/data-directory/. |
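
Because the container and the storage account each appear twice in the configuration (once as a property, once inside inputDirURI), a mismatch between them is an easy mistake to make. The following Python sketch parses the URI with the standard library and reports any disagreement. The function name check_uri_matches_properties is hypothetical, and the check assumes the abfs://<file-system-name>@<account-name>.dfs.core.windows.net/ layout shown in this guide.

```python
from urllib.parse import urlsplit

ADLS_HOST_SUFFIX = ".dfs.core.windows.net"


def check_uri_matches_properties(config):
    """Return a list of mismatches between inputDirURI and the name properties."""
    problems = []
    parts = urlsplit(config["inputDirURI"])
    if parts.scheme != "abfs":
        problems.append(f"unexpected scheme: {parts.scheme!r}")

    # urlsplit exposes the part before '@' as `username` and the rest as `hostname`.
    host = parts.hostname or ""
    if host.endswith(ADLS_HOST_SUFFIX):
        account_in_uri = host[: -len(ADLS_HOST_SUFFIX)]
    else:
        problems.append(f"unexpected host: {host!r}")
        account_in_uri = host

    # `hostname` is lowercased by urlsplit, so compare case-insensitively.
    if account_in_uri != config["input.fs.prop.accountName"].lower():
        problems.append("account name in inputDirURI differs from input.fs.prop.accountName")
    if parts.username != config["input.fs.prop.fileSystemName"]:
        problems.append("file system in inputDirURI differs from input.fs.prop.fileSystemName")
    return problems
```

An empty list means the URI and the two name properties agree; each string in a non-empty list names one inconsistency to fix before testing the connection.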

Connecting to ADLS Using Azure Active Directory with Proxy

Use the following JSON configuration when ADLS is set up with Azure Active Directory authentication via a proxy server.

{
  "input.fs.className": "org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS",
  "input.fs.prop.accountName": "Account-Name",
  "authenticationType": "AZURE_AD_WITH_PROXY",
  "input.fs.prop.clientId": "Client-ID",
  "input.fs.prop.clientSecret": "SECRETKEYEXAMPLE12345",
  "input.fs.prop.tenantId": "tenant-id",
  "input.fs.prop.proxyHost": "proxy-host",
  "input.fs.prop.proxyPort": "proxy-port",
  "input.fs.prop.proxyUsername": "proxy-username",
  "input.fs.prop.proxyPassword": "proxy-password",
  "input.fs.prop.fileSystemName": "file-system-name",
  "inputDirURI": "abfs://my-container-name@myazure-storage-account.dfs.core.windows.net/data-directory/"
}

Property Descriptions

| Property | Required | Description |
| --- | --- | --- |
| input.fs.className | Yes | The class name of the ADLS Gen2 Pinot file system. Must be set to org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS for ADLS Gen2 integration. |
| input.fs.prop.accountName | Yes | The name of the Azure Storage account to be used for ADLS Gen2 integration. |
| authenticationType | Optional | The type of authentication used to access the ADLS storage. Set as AZURE_AD_WITH_PROXY to use Azure Active Directory (AAD) Service Principal for authentication via a proxy server. |
| input.fs.prop.clientId | Yes | The Client ID of the Azure Service Principal used for authentication. |
| input.fs.prop.clientSecret | Yes | The Client Secret key for the Azure Service Principal. |
| input.fs.prop.tenantId | Yes | The Tenant ID associated with the Azure Active Directory (AAD). |
| input.fs.prop.proxyHost | Yes | The hostname of the proxy server used to connect to ADLS. |
| input.fs.prop.proxyPort | Yes | The port number of the proxy server. |
| input.fs.prop.proxyUsername | Yes | The username for proxy authentication. |
| input.fs.prop.proxyPassword | Yes | The password for proxy authentication. |
| input.fs.prop.fileSystemName | Yes | The name of the file system (container) in the Azure Storage account. |
| inputDirURI | Yes | The URI for the input directory in ADLS Gen2 where Pinot reads data. It is generally in the format: abfs://my-container-name@myazure-storage-account.dfs.core.windows.net/data-directory/. |
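
The three authentication types share a common set of properties and differ only in the credentials they add. As a rough pre-flight check, the Python sketch below lists which properties marked "Yes" in the tables above are missing or empty for a chosen type. The names REQUIRED and missing_properties are hypothetical. Because the tables mark authenticationType as optional without stating a default, the sketch asks the caller to name the type explicitly rather than guessing one.

```python
COMMON = [
    "input.fs.className",
    "input.fs.prop.accountName",
    "input.fs.prop.fileSystemName",
    "inputDirURI",
]
SERVICE_PRINCIPAL = [
    "input.fs.prop.clientId",
    "input.fs.prop.clientSecret",
    "input.fs.prop.tenantId",
]
PROXY = [
    "input.fs.prop.proxyHost",
    "input.fs.prop.proxyPort",
    "input.fs.prop.proxyUsername",
    "input.fs.prop.proxyPassword",
]

# Required properties per authentication type, taken from the tables in this guide.
REQUIRED = {
    "ACCESS_KEY": COMMON + ["input.fs.prop.accessKey"],
    "AZURE_AD": COMMON + SERVICE_PRINCIPAL,
    "AZURE_AD_WITH_PROXY": COMMON + SERVICE_PRINCIPAL + PROXY,
}


def missing_properties(config, auth_type):
    """Return the required properties that are absent or empty for auth_type."""
    if auth_type not in REQUIRED:
        raise ValueError(f"unknown authentication type: {auth_type!r}")
    return [key for key in REQUIRED[auth_type] if not config.get(key)]
```

For example, calling missing_properties(config, "AZURE_AD_WITH_PROXY") on a configuration written for AZURE_AD returns the four proxy properties, which is the set that has to be added when moving a connection behind a proxy.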

Step 5: Test the Connection and Configure Data Ingestion

After you have configured the connection properties, test the connection to ensure it is working.

After the connection test succeeds, use the following JSON to configure additional data settings:

{
  "inputFormat": "",
  "includeFileNamePattern": ""
}

Property Descriptions

| Property | Required | Description |
| --- | --- | --- |
| inputFormat | Yes | The format of the input files. Supported values include csv, json, avro, parquet, etc. |
| includeFileNamePattern | Yes | The glob pattern to identify which files to include for ingestion. This is useful when the input directory contains a mix of files, and only specific files should be ingested. |
| excludeFileNamePattern | Optional | The glob pattern to identify which files to exclude from ingestion. This is useful when the input directory contains a mix of files, and only specific files should not be ingested. |
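
To get a feel for how an include pattern and an exclude pattern combine, the Python sketch below filters a list of file names with the standard library's fnmatch module. This is only an approximation: it assumes a file is ingested when it matches the include pattern and does not match the exclude pattern, and the glob dialect used by the ingestion engine may differ from fnmatch (for example, in how `*` treats `/`). The function name select_files and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(names, include_pattern, exclude_pattern=None):
    """Return the names that match include_pattern and not exclude_pattern."""
    selected = []
    for name in names:
        if not fnmatchcase(name, include_pattern):
            continue  # not covered by the include pattern
        if exclude_pattern and fnmatchcase(name, exclude_pattern):
            continue  # explicitly excluded
        selected.append(name)
    return selected


names = [
    "events-01.csv",
    "events-02.csv",
    "events-02.csv.tmp",
    "events-backup.csv",
    "notes.txt",
]
print(select_files(names, "*.csv", "*backup*"))
# prints ['events-01.csv', 'events-02.csv']
```

Trying the patterns against a listing of the input directory before saving the configuration helps catch a pattern that is too broad or too narrow.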

Configure Record Reader

Configure the record reader to customize how the file format is read during ingestion.

Step 6: Preview the Data

Click Show Sample Data to preview the source data before finalizing the configuration.