Overview

This guide demonstrates how to optimize cross-Availability Zone (AZ) traffic using AZ-aware Kafka consumers in your StarTree Pinot cluster.

This guide covers the following key areas:

  1. Understanding the need for cross-AZ traffic optimization
  2. High-level approach
  3. Implementation details

The Problem

In a StarTree Pinot cluster, Pinot servers utilize low-level Kafka consumers to retrieve data from Kafka brokers. When a Pinot consumer operates in a different Availability Zone than the broker hosting the required partition, each fetch request generates cross-AZ network traffic.

Cross-AZ traffic for Kafka consumers creates several challenges:

  • Increased costs: Cross-AZ data transfer incurs additional charges
  • Higher latency: Network requests across zones introduce additional delay
  • Reduced reliability: Cross-zone communication increases potential failure points

Implementing AZ-aware consumption in StarTree Pinot provides:

  • Improved application performance through reduced latency
  • Significant cost savings on data transfer fees
  • Enhanced system reliability and fault tolerance

Solution Architecture

The optimization strategy centers on implementing AZ-aware Kafka consumers using the Kafka RackAwareReplicaSelector. This approach ensures that Pinot servers preferentially consume from Kafka brokers within the same Availability Zone.
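
The selection behavior described above can be sketched as a small model. This is a simplified illustration, not Kafka's source code: the Replica class and select_read_replica function are hypothetical names, and the sketch assumes that each Kafka broker advertises its AZ as its rack (broker.rack) and has replica.selector.class set to the rack-aware selector.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: Optional[str]   # the broker's rack, i.e. its AZ in this guide
    is_leader: bool
    log_end_offset: int


def select_read_replica(replicas: List[Replica], client_rack: Optional[str]) -> Replica:
    """Pick the replica a consumer should fetch from (simplified model)."""
    leader = next(r for r in replicas if r.is_leader)
    if not client_rack:
        # No client.rack set: the consumer reads from the leader.
        return leader
    same_rack = [r for r in replicas if r.rack == client_rack]
    if not same_rack:
        # No replica lives in the consumer's AZ: fall back to the leader.
        return leader
    if leader in same_rack:
        return leader
    # Otherwise prefer the most caught-up replica in the consumer's AZ.
    return max(same_rack, key=lambda r: r.log_end_offset)


replicas = [
    Replica(broker_id=1, rack="aps1-az1", is_leader=True, log_end_offset=100),
    Replica(broker_id=2, rack="aps1-az2", is_leader=False, log_end_offset=98),
]
chosen = select_read_replica(replicas, client_rack="aps1-az2")
print(chosen.broker_id)
```

In this model a consumer in aps1-az2 fetches from broker 2 even though the partition leader sits in aps1-az1, which is the traffic pattern the rest of this guide sets up.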

The key steps are:

  • Step 1: Implement AZ-Aware Instance Assignment

    Configure the instance assignment strategy to consider Availability Zone placement when distributing workloads across the cluster.

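  • Step 2: Configure AZ-Aware Table Settings

    Set the Kafka consumer's client.rack property and enable pool-based instance assignment in the table configuration.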

Implementation details

Make instance assignment AZ-aware

The first step is to set up pool-based instance assignment: tag the servers (e.g. CLOUD_AZ_POOL_REALTIME) and give all servers in the same AZ the same pool value. For example, set servers in aps1-az1 to pool value 0, servers in aps1-az2 to pool value 1, and so on.

{
  "listFields": {
    "TAG_LIST": [
      "CLOUD_AZ_POOL_REALTIME"
    ]
  },
  "mapFields": {
    "pool": {
      "CLOUD_AZ_POOL_REALTIME": 0
    }
  }
}
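
The AZ-to-pool numbering above can be generated rather than maintained by hand. The following is a minimal sketch under stated assumptions: pool_ids_by_az and instance_fields are hypothetical helpers (not part of Pinot), and numbering the zones in sorted order is only one possible convention.

```python
import json
from typing import Dict, List

TAG = "CLOUD_AZ_POOL_REALTIME"


def pool_ids_by_az(zones: List[str]) -> Dict[str, int]:
    """Assign pool 0, 1, 2, ... to the distinct AZ names in sorted order."""
    return {az: pool for pool, az in enumerate(sorted(set(zones)))}


def instance_fields(az: str, pools: Dict[str, int]) -> dict:
    """Build the tag and pool fields for one server in the given AZ."""
    return {
        "listFields": {"TAG_LIST": [TAG]},
        "mapFields": {"pool": {TAG: pools[az]}},
    }


# AZ of each server in the cluster (illustrative values from this guide).
pools = pool_ids_by_az(["aps1-az2", "aps1-az1", "aps1-az1"])
print(json.dumps(instance_fields("aps1-az2", pools), indent=2))
```

Deriving the pool value from the AZ name keeps every server in one zone on the same pool, which is the property the instance assignment relies on.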

Make table configuration AZ-aware

When creating the realtime table, set the Kafka consumer property client.rack:

"client.rack": "${CLOUD_AZ}"

The environment variable CLOUD_AZ is automatically set on the servers and contains the corresponding cloud zone information.
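
To illustrate the effect of the placeholder, the sketch below substitutes ${CLOUD_AZ} from a mapping of environment variables. It only models the outcome: how Pinot itself resolves the placeholder is not covered in this guide, and resolve_placeholders is a hypothetical helper. For same-AZ fetching to work, the resolved value has to match the rack name that the Kafka brokers in that zone advertise.

```python
import os
from string import Template
from typing import Mapping, Optional


def resolve_placeholders(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${NAME} placeholders using env (defaults to the process environment).

    Raises KeyError if a referenced variable is not set.
    """
    return Template(value).substitute(os.environ if env is None else env)


# Stand-in for the environment of a server running in aps1-az1.
env = {"CLOUD_AZ": "aps1-az1"}
print(resolve_placeholders("${CLOUD_AZ}", env))
```

Failing loudly on a missing variable is a deliberate choice in this sketch: an empty client.rack would silently fall back to fetching from the partition leader.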

For pool-based instance assignment, configure the CONSUMING entry of instanceAssignmentConfigMap with the tag CLOUD_AZ_POOL_REALTIME and poolBased set to true.

Example config:

"instanceAssignmentConfigMap": {
  "CONSUMING": {
    "tagPoolConfig": {
      "tag": "CLOUD_AZ_POOL_REALTIME",
      "poolBased": true
    },
    "replicaGroupPartitionConfig": {
      "replicaGroupBased": true,
      "numInstances": 0,
      "numReplicaGroups": 2,
      "numInstancesPerReplicaGroup": 0,
      "numPartitions": 0,
      "numInstancesPerPartition": 1
    },
    "partitionSelector": "INSTANCE_REPLICA_GROUP_PARTITION_SELECTOR"
  }
}
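
The effect of combining poolBased with replicaGroupBased can be pictured with a simplified model in which each replica group is drawn from a single pool, so every replica group stays inside one AZ. This is a sketch under assumptions, not Pinot's assignment code: replica_groups_from_pools and the server names are hypothetical, pools are visited round-robin, and the sketch reads numInstancesPerReplicaGroup: 0 as "use every server in the pool".

```python
from typing import Dict, List


def replica_groups_from_pools(pool_to_servers: Dict[int, List[str]],
                              num_replica_groups: int) -> Dict[int, List[str]]:
    """Give each replica group the servers of one pool, round-robin over pools."""
    pools = sorted(pool_to_servers)
    return {
        rg: list(pool_to_servers[pools[rg % len(pools)]])
        for rg in range(num_replica_groups)
    }


pool_to_servers = {
    0: ["server-az1-a", "server-az1-b"],   # hypothetical servers in aps1-az1
    1: ["server-az2-a", "server-az2-b"],   # hypothetical servers in aps1-az2
}
groups = replica_groups_from_pools(pool_to_servers, num_replica_groups=2)
for rg, servers in groups.items():
    print(rg, servers)
```

With two pools and numReplicaGroups set to 2, each replica group in this model maps to exactly one AZ, so every consuming server sets client.rack to the zone it actually runs in.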

Summary

This guide addresses cross-Availability Zone network traffic between Pinot servers and Kafka brokers in StarTree Pinot clusters. AZ-aware consumption is enabled by configuring pool-based instance assignment and setting the Kafka consumer's client.rack property to the server's Availability Zone. Results demonstrate substantial optimization, with same-AZ traffic increasing from 50% to 96-98% across all tested zones, resulting in significant cost savings and improved system performance.