Property Name | Required | Description |
---|---|---|
delta.ingestion.table.fs | Yes | The file system that hosts the Delta Lake table. Must be set to `S3` or `GCS`. |
delta.ingestion.table.uri | Yes | The source location of the Delta Lake table. For Amazon S3, the URI takes the form `s3a://<bucket-name>/path/to/delta-table`. For GCS, it takes the form `gs://<bucket-name>/path/to/delta-table`. |
delta.ingestion.useDeltaKernel | No | When set to `true`, the Delta Lake connector uses the Delta Kernel API. Defaults to `false`. Set it to `true` when the Delta table is hosted on GCS. |
tableMaxNumTasks | No | Controls how ingestion jobs are partitioned across Minion workers. |
schedule | Yes | The execution schedule for the ingestion task, written in the Quartz cron format. |
segmentIngestionType | Yes | Set to `REFRESH`, because the complete segment for a given file is replaced on every ingestion run. This property belongs to `ingestionConfig.batchIngestionConfig`. |
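The rules in the table above can be checked before a configuration is submitted. The following is a minimal Python sketch under stated assumptions: the helper `build_delta_task_props` is hypothetical, the bucket path and the Quartz expression are placeholder values, and property values are kept as strings on the assumption of a string-to-string property map. The sketch does not show where the task properties are placed in the table config; only `segmentIngestionType` is nested under `ingestionConfig.batchIngestionConfig`, as the table states.

```python
from typing import Dict, Optional

# Expected URI prefix for each supported file system (from the table above).
_URI_PREFIX = {"S3": "s3a://", "GCS": "gs://"}


def build_delta_task_props(
    fs: str,
    uri: str,
    schedule: str,
    use_delta_kernel: bool = False,
    table_max_num_tasks: Optional[int] = None,
) -> Dict[str, str]:
    """Hypothetical helper: assemble and sanity-check the common properties."""
    if fs not in _URI_PREFIX:
        raise ValueError('delta.ingestion.table.fs must be "S3" or "GCS"')
    if not uri.startswith(_URI_PREFIX[fs]):
        raise ValueError(f"{fs} table URIs must start with {_URI_PREFIX[fs]}")
    if fs == "GCS" and not use_delta_kernel:
        raise ValueError("delta.ingestion.useDeltaKernel should be true on GCS")
    # A Quartz cron expression has 6 fields, or 7 with the optional year.
    if len(schedule.split()) not in (6, 7):
        raise ValueError("schedule must be a Quartz cron expression")

    props = {
        "delta.ingestion.table.fs": fs,
        "delta.ingestion.table.uri": uri,
        "delta.ingestion.useDeltaKernel": str(use_delta_kernel).lower(),
        "schedule": schedule,
    }
    if table_max_num_tasks is not None:
        props["tableMaxNumTasks"] = str(table_max_num_tasks)
    return props


# segmentIngestionType is set in the table's ingestion config, not with the task properties.
INGESTION_CONFIG = {
    "ingestionConfig": {"batchIngestionConfig": {"segmentIngestionType": "REFRESH"}}
}

props = build_delta_task_props(
    fs="GCS",
    uri="gs://my-bucket/path/to/delta-table",  # placeholder bucket and path
    schedule="0 */10 * * * ?",  # illustrative: every 10 minutes
    use_delta_kernel=True,
)
```

Failing fast on a mismatched scheme (for example an `s3a://` URI with `fs` set to `GCS`) surfaces the mistake locally instead of on a Minion worker.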
Property Name | Required | Description |
---|---|---|
delta.ingestion.table.s3.region | Yes | The AWS region where the Delta Lake table resides. |
delta.ingestion.table.s3.accessKey | No | The AWS access key, used with access-key-based authorization. Requires `delta.ingestion.table.s3.secretKey` as well. |
delta.ingestion.table.s3.secretKey | No | The AWS secret key that pairs with `delta.ingestion.table.s3.accessKey`. |
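For access-key-based authorization, the two keys only work as a pair. The Python sketch below is illustrative only: `s3_access_key_props` is a hypothetical helper, the region is a placeholder, and reading the keys from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables is one assumed way to keep secrets out of source files.

```python
import os
from typing import Dict


def s3_access_key_props(region: str, access_key: str, secret_key: str) -> Dict[str, str]:
    """Hypothetical helper: S3 properties for access-key-based authorization."""
    # The access key is only usable together with its secret key.
    if not access_key or not secret_key:
        raise ValueError("access-key auth needs both an access key and a secret key")
    return {
        "delta.ingestion.table.s3.region": region,
        "delta.ingestion.table.s3.accessKey": access_key,
        "delta.ingestion.table.s3.secretKey": secret_key,
    }


# Read the keys from the environment instead of hard-coding them;
# the fallback values are placeholders for illustration only.
props = s3_access_key_props(
    region="us-west-2",
    access_key=os.environ.get("AWS_ACCESS_KEY_ID", "EXAMPLE_ACCESS_KEY"),
    secret_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "EXAMPLE_SECRET_KEY"),
)
```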
Property Name | Required | Description |
---|---|---|
delta.ingestion.table.s3.region | Yes | The AWS region where the Delta Lake table resides. |
delta.ingestion.table.s3.role.arn | Yes | The ARN of the IAM role used to access the Delta Lake data, for example `arn:aws:iam::REPLACE_WITH_AWS_ACCOUNT_ID_WHERE_DATA_RESIDES:role/DataManagerDeltaLakeS3IAMRole`. |
fs.s3a.aws.credentials.provider | Yes | Must be `org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider` on StarTree Pinot versions 1.2.0* and `ai.startree.connectors.auth.AssumedRoleCredentialsProvider` on StarTree Pinot versions 1.3.0*. |
fs.s3a.assumed.role.sts.endpoint | Yes | The STS endpoint for your account, for example `https://sts.us-west-2.amazonaws.com`. |
fs.s3a.assumed.role.sts.endpoint.region | Yes | The AWS region for your account, for example `us-west-2`. |
fs.s3a.assumed.role.arn | Yes | The ARN of the role that is assumed to gain access, for example `arn:aws:iam::REPLACE_WITH_AWS_ACCOUNT_ID_WHERE_DATA_RESIDES:role/DataManagerDeltaLakeS3IAMRole`. |
fs.s3a.assumed.role.credentials.provider | Yes | Must be `com.amazonaws.auth.InstanceProfileCredentialsProvider`. Used to obtain temporary security credentials for assuming a role ARN in an account outside the StarTree cluster. |
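Several of the IAM role properties above repeat or derive from one another. The following Python sketch shows one way to fill them in consistently. It is a sketch under assumptions: `s3_assumed_role_props` is a hypothetical helper, `123456789012` is a placeholder account ID, and it assumes the STS endpoint uses the same region as the bucket and follows the `https://sts.<region>.amazonaws.com` pattern of the example in the table. The provider class is chosen from the version prefixes listed in the table.

```python
from typing import Dict

# Credentials provider per StarTree Pinot version line, as listed in the table above.
_PROVIDERS = {
    "1.2.0": "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider",
    "1.3.0": "ai.startree.connectors.auth.AssumedRoleCredentialsProvider",
}


def s3_assumed_role_props(region: str, role_arn: str, startree_version: str) -> Dict[str, str]:
    """Hypothetical helper: S3 properties for IAM-role-based access."""
    provider = next(
        (cls for prefix, cls in _PROVIDERS.items() if startree_version.startswith(prefix)),
        None,
    )
    if provider is None:
        raise ValueError(f"no provider listed for StarTree Pinot {startree_version}")
    return {
        "delta.ingestion.table.s3.region": region,
        "delta.ingestion.table.s3.role.arn": role_arn,
        "fs.s3a.aws.credentials.provider": provider,
        # Assumption: regional STS endpoint in the same region as the bucket.
        "fs.s3a.assumed.role.sts.endpoint": f"https://sts.{region}.amazonaws.com",
        "fs.s3a.assumed.role.sts.endpoint.region": region,
        # The same role ARN is supplied under both keys.
        "fs.s3a.assumed.role.arn": role_arn,
        "fs.s3a.assumed.role.credentials.provider":
            "com.amazonaws.auth.InstanceProfileCredentialsProvider",
    }


# Placeholder account ID; substitute the account where the data resides.
arn = "arn:aws:iam::123456789012:role/DataManagerDeltaLakeS3IAMRole"
props = s3_assumed_role_props("us-west-2", arn, "1.3.0")
```

Deriving every repeated value from a single `region` and `role_arn` avoids the most common mistake with this table: a role ARN or region that differs between the `delta.ingestion.*` and `fs.s3a.*` keys.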
Property Name | Required | Description |
---|---|---|
delta.ingestion.table.gcs.projectId | Yes | The GCP project ID associated with the bucket that stores the Delta Lake data. |
delta.ingestion.table.gcs.keyContentBase64 | No | The Google Cloud service account key, Base64-encoded, used to authenticate when reading Delta Lake data from GCS. Use either this property or `delta.ingestion.table.gcs.keyFile`. See the Google documentation on how to create a service account key. |
delta.ingestion.table.gcs.keyFile | No | The path to the Google Cloud service account key file used to authenticate when reading Delta Lake data from a GCS bucket. The file must be available locally on every Minion. Use either this property or `delta.ingestion.table.gcs.keyContentBase64`. |
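Because `delta.ingestion.table.gcs.keyFile` requires the key file on every Minion, inlining the key through `delta.ingestion.table.gcs.keyContentBase64` can be simpler. The Python sketch below Base64-encodes a service account key file with the standard library; the helpers `encode_service_account_key` and `gcs_props_from_key_file` are hypothetical, and the project ID and key path in the usage note are placeholders.

```python
import base64
from pathlib import Path
from typing import Dict


def encode_service_account_key(raw: bytes) -> str:
    """Base64-encode the raw bytes of a service account key (JSON) file."""
    return base64.b64encode(raw).decode("ascii")


def gcs_props_from_key_file(project_id: str, key_path: str) -> Dict[str, str]:
    """Hypothetical helper: GCS properties with the key inlined as Base64."""
    encoded = encode_service_account_key(Path(key_path).read_bytes())
    return {
        "delta.ingestion.table.gcs.projectId": project_id,
        # Inlined key, so no key file has to be distributed to the Minions.
        "delta.ingestion.table.gcs.keyContentBase64": encoded,
    }
```

For example, `gcs_props_from_key_file("my-gcp-project", "service-account.json")` returns both properties. The encoded value is still a credential and should be handled as a secret.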
What happens if the same row is updated multiple times in my Delta table (e.g., order status updates)?
Can I configure Pinot to refresh every minute like my Delta table?
Does StarTree support Unity Catalog (UC) for column-level access control?
Does Pinot always poll the S3 bucket to list files for ingestion?
Does StarTree support Delta Column Mapping?