DynamoDB provides Change Data Capture (CDC) capabilities through DynamoDB Streams, which capture data modifications in DynamoDB tables. The generated CDC data is written to streaming systems like Kafka and made available in real time for downstream applications. Native support for the DynamoDB data format in Pinot allows users to consume CDC data in real time from DynamoDB tables without complex transformations. As long as the data is available through any of Pinot’s supported streaming connectors, it can be ingested into a Pinot table.
DynamoDB Message Decoder Configurations
To configure a Pinot table to consume a DynamoDB-formatted streaming source, use the decoder that Pinot provides for this format: ai.startree.pinot.plugin.inputformat.dynamodb.DynamoDbMessageDecoder.
The properties of this decoder are listed below:
| Configuration Key | Description |
|---|---|
| decoder.class.name | Specifies the primary decoder for DynamoDB messages. Set this to ai.startree.pinot.plugin.inputformat.dynamodb.DynamoDbMessageDecoder to enable DynamoDB CDC ingestion. |
| dynamodb.timeColumnName | The name of the column in which the ApproximateCreationDateTime from the DynamoDB JSON record is stored. This timestamp can be used as the table’s default time column. |
| dynamodb.deleteColumnName | The name of the column that is set to true when a REMOVE record is received from DynamoDB, and false otherwise. This helps track deletion events in your Pinot table. |
| dynamodb.envelope.decoder.class.name | Specifies the underlying decoder used to parse the message format. Since DynamoDB messages are in JSON format, this should typically be set to org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder. |
| dynamodb.envelope.decoder.prop. | Prefix for any properties passed to the envelope decoder class. |
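
For orientation, the two values referenced in the table (the record type and ApproximateCreationDateTime) sit in a DynamoDB change record roughly as sketched below. This is a hand-written, illustrative REMOVE record in the general shape of a DynamoDB Streams record, not output captured from a real table: the key attribute orderId, the amount attribute, the event ID, the region, and the timestamp value are all hypothetical. The exact envelope and the unit of the timestamp may differ depending on how the records are delivered to your stream.

```json
{
  "eventID": "example-event-id",
  "eventName": "REMOVE",
  "eventSource": "aws:dynamodb",
  "awsRegion": "us-east-1",
  "dynamodb": {
    "ApproximateCreationDateTime": 1700000000,
    "Keys": {
      "orderId": { "S": "order-123" }
    },
    "OldImage": {
      "orderId": { "S": "order-123" },
      "amount": { "N": "42.5" }
    },
    "StreamViewType": "NEW_AND_OLD_IMAGES"
  }
}
```

Going by the descriptions in the table, a record like this would be expected to set the column named by dynamodb.deleteColumnName to true and to fill the column named by dynamodb.timeColumnName from ApproximateCreationDateTime.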
Configuration Example
When ingesting a DynamoDB-formatted payload from a stream, the decoder used for the stream must be ai.startree.pinot.plugin.inputformat.dynamodb.DynamoDbMessageDecoder.
The following is an example stream config where the Pinot table is consuming from a JSON-encoded Kafka topic containing DynamoDB CDC payloads:
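
A minimal sketch of such a stream config is shown here under stated assumptions. The topic name, broker address, and the column names event_time and is_deleted are placeholders. The stream.kafka. and stream.kafka.decoder.prop. prefixes follow Pinot's usual convention for Kafka stream settings and decoder properties, and it is an assumption that the keys from the table above are supplied under those prefixes; check the exact key names against your Pinot or StarTree version.

```json
{
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.topic.name": "dynamodb-cdc-topic",
    "stream.kafka.broker.list": "localhost:9092",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.decoder.class.name": "ai.startree.pinot.plugin.inputformat.dynamodb.DynamoDbMessageDecoder",
    "stream.kafka.decoder.prop.dynamodb.timeColumnName": "event_time",
    "stream.kafka.decoder.prop.dynamodb.deleteColumnName": "is_deleted",
    "stream.kafka.decoder.prop.dynamodb.envelope.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder"
  }
}
```

The columns named here would also need to exist in the Pinot table schema with suitable types, for example a timestamp or long column for event_time and a boolean column for is_deleted.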
In this example, the consumer factory for the stream is org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory and the decoder associated with the stream is ai.startree.pinot.plugin.inputformat.dynamodb.DynamoDbMessageDecoder.
The configuration uses several key components:
- The primary decoder, ai.startree.pinot.plugin.inputformat.dynamodb.DynamoDbMessageDecoder, handles the DynamoDB-specific message format.
- The column named by dynamodb.timeColumnName is populated with the ApproximateCreationDateTime from the DynamoDB JSON record.
- The column named by dynamodb.deleteColumnName is set to true when REMOVE records are received from DynamoDB.
- dynamodb.envelope.decoder.class.name is set to org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder since the underlying DynamoDB messages are in JSON format.

