Create a connection to ingest files from Azure Data Lake Storage.
Property | Required | Description |
---|---|---|
input.fs.className | Yes | The class name of the ADLS Gen2 Pinot file system. Must be set to org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS for ADLS Gen2 integration. |
input.fs.prop.accountName | Yes | The Azure Storage account name used for ADLS Gen2 integration. |
authenticationType | Optional | The type of authentication used to access the ADLS storage. Set to ACCESS_KEY to authenticate with a storage account access key. |
input.fs.prop.accessKey | Yes | The access key used for authentication when authenticationType is ACCESS_KEY. |
input.fs.prop.fileSystemName | Yes | The name of the file system (container) in the Azure Storage account. |
inputDirURI | Yes | The URI of the input directory in ADLS Gen2 where Pinot reads data. |
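
As a rough illustration of the access-key properties above, the following Python sketch collects them in a dictionary and checks that every required one is present. The helper `missing_properties`, the list `REQUIRED_ACCESS_KEY_PROPS`, and all values are hypothetical placeholders; how the properties are actually submitted when creating the connection is not shown here.

```python
# Required property names for ACCESS_KEY authentication, taken from the table.
REQUIRED_ACCESS_KEY_PROPS = [
    "input.fs.className",
    "input.fs.prop.accountName",
    "input.fs.prop.accessKey",
    "input.fs.prop.fileSystemName",
    "inputDirURI",
]


def missing_properties(props, required):
    """Return the required property names that are absent or empty."""
    return [name for name in required if not props.get(name)]


# Placeholder values only; substitute your own account, key, and container.
access_key_props = {
    "input.fs.className": "org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS",
    "input.fs.prop.accountName": "myazure-storage-account",
    "authenticationType": "ACCESS_KEY",
    "input.fs.prop.accessKey": "<access-key>",
    "input.fs.prop.fileSystemName": "my-container-name",
    "inputDirURI": "abfs://my-container-name@myazure-storage-account"
                   ".dfs.core.windows.net/data-directory/",
}

print(missing_properties(access_key_props, REQUIRED_ACCESS_KEY_PROPS))
```

A check like this is only a convenience for catching an omitted key before the connection is created; it does not verify that the credentials themselves are valid.
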
Property | Required | Description |
---|---|---|
input.fs.className | Yes | The class name of the ADLS Gen2 Pinot file system. Must be set to org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS for ADLS Gen2 integration. |
input.fs.prop.accountName | Yes | The name of the Azure Storage account to be used for ADLS Gen2 integration. |
authenticationType | Optional | The type of authentication used to access the ADLS storage. Set to AZURE_AD to authenticate with an Azure Active Directory (AAD) Service Principal. |
input.fs.prop.clientId | Yes | The Client ID of the Azure Service Principal used for authentication. |
input.fs.prop.clientSecret | Yes | The Client Secret key for the Azure Service Principal. |
input.fs.prop.tenantId | Yes | The Tenant ID associated with the Azure Active Directory (AAD). |
input.fs.prop.fileSystemName | Yes | The name of the file system (container) in the Azure Storage account. |
inputDirURI | Yes | The URI for the input directory in ADLS Gen2 where Pinot reads data. It is generally in the format: abfs://my-container-name@myazure-storage-account.dfs.core.windows.net/data-directory/. |
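
The inputDirURI format in the last row combines the container name, the storage account name, and the directory path. A minimal sketch of assembling it, assuming a hypothetical helper named `build_input_dir_uri` (not part of Pinot):

```python
def build_input_dir_uri(container, account, directory):
    """Assemble an abfs:// URI in the format shown in the inputDirURI row."""
    base = f"abfs://{container}@{account}.dfs.core.windows.net"
    path = directory.strip("/")
    # Keep a single trailing slash, matching the documented example.
    return f"{base}/{path}/" if path else f"{base}/"


print(build_input_dir_uri("my-container-name",
                          "myazure-storage-account",
                          "data-directory"))
```

The container passed here would normally be the same value used for input.fs.prop.fileSystemName, and the account the same value used for input.fs.prop.accountName.
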
Property | Required | Description |
---|---|---|
input.fs.className | Yes | The class name of the ADLS Gen2 Pinot file system. Must be set to org.apache.pinot.plugin.filesystem.ADLSGen2PinotFS for ADLS Gen2 integration. |
input.fs.prop.accountName | Yes | The name of the Azure Storage account to be used for ADLS Gen2 integration. |
authenticationType | Optional | The type of authentication used to access the ADLS storage. Set to AZURE_AD_WITH_PROXY to authenticate with an Azure Active Directory (AAD) Service Principal through a proxy server. |
input.fs.prop.clientId | Yes | The Client ID of the Azure Service Principal used for authentication. |
input.fs.prop.clientSecret | Yes | The Client Secret key for the Azure Service Principal. |
input.fs.prop.tenantId | Yes | The Tenant ID associated with the Azure Active Directory (AAD). |
input.fs.prop.proxyHost | Yes | The hostname of the proxy server used to connect to ADLS. |
input.fs.prop.proxyPort | Yes | The port number of the proxy server. |
input.fs.prop.proxyUsername | Yes | The username for proxy authentication. |
input.fs.prop.proxyPassword | Yes | The password for proxy authentication. |
input.fs.prop.fileSystemName | Yes | The name of the file system (container) in the Azure Storage account. |
inputDirURI | Yes | The URI for the input directory in ADLS Gen2 where Pinot reads data. It is generally in the format: abfs://my-container-name@myazure-storage-account.dfs.core.windows.net/data-directory/. |
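
The proxy variant uses the same Service Principal properties as AZURE_AD plus four proxy settings. The sketch below layers those settings onto an existing property dictionary. The helper `with_proxy` is hypothetical, all values are placeholders, and storing the port as a string is an assumption rather than a documented requirement.

```python
def with_proxy(azure_ad_props, host, port, username, password):
    """Return a copy of the AZURE_AD properties with proxy settings added."""
    port = int(port)
    if not 1 <= port <= 65535:  # valid TCP port range
        raise ValueError(f"proxy port out of range: {port}")
    props = dict(azure_ad_props)  # leave the caller's dictionary untouched
    props.update({
        "authenticationType": "AZURE_AD_WITH_PROXY",
        "input.fs.prop.proxyHost": host,
        "input.fs.prop.proxyPort": str(port),  # assumption: stored as text
        "input.fs.prop.proxyUsername": username,
        "input.fs.prop.proxyPassword": password,
    })
    return props


# Placeholder Service Principal properties.
azure_ad_props = {
    "authenticationType": "AZURE_AD",
    "input.fs.prop.clientId": "<client-id>",
    "input.fs.prop.clientSecret": "<client-secret>",
    "input.fs.prop.tenantId": "<tenant-id>",
}

proxied = with_proxy(azure_ad_props, "proxy.example.com", 8080,
                     "<proxy-user>", "<proxy-password>")
print(proxied["authenticationType"])
```
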
Property | Required | Description |
---|---|---|
inputFormat | Yes | The format of the input files. Supported values include csv, json, avro, parquet, etc. |
includeFileNamePattern | Yes | The glob pattern to identify which files to include for ingestion. This is useful when the input directory contains a mix of files, and only specific files should be ingested. |
excludeFileNamePattern | Optional | The glob pattern to identify which files to exclude from ingestion. This is useful when the input directory contains a mix of files, and only specific files should not be ingested. |
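
To show how the include and exclude patterns interact, here is a small approximation using Python's `fnmatch` module. It illustrates only the include-then-exclude logic; Pinot applies its own glob matcher, whose syntax may differ in details, and the helper `select_files` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(names, include_pattern, exclude_pattern=None):
    """Keep names that match the include pattern and not the exclude pattern."""
    selected = []
    for name in names:
        if not fnmatchcase(name, include_pattern):
            continue  # not covered by includeFileNamePattern
        if exclude_pattern and fnmatchcase(name, exclude_pattern):
            continue  # explicitly ruled out by excludeFileNamePattern
        selected.append(name)
    return selected


files = ["events-2024-01.csv", "events-2024-02.csv",
         "events-tmp.csv", "notes.txt"]
print(select_files(files, "*.csv", "*-tmp.csv"))
```

In this sketch the exclude pattern is applied only to files that already passed the include pattern, so a file has to satisfy both conditions to be ingested.
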

CSV
CSVRecordReaderConfig is used for handling CSV files with customizable options.

AVRO
AvroRecordReaderConfig is supported.

Parquet
Parquet Avro Record Reader: org.apache.pinot.plugin.inputformat.parquet.ParquetAvroRecordReader
Use Parquet Native Record Reader: org.apache.pinot.plugin.inputformat.parquet.ParquetNativeRecordReader