JSON unnesting in Apache Pinot—powered by StarTree—makes it easy to work with deeply nested and semi-structured data. This feature allows users to extract nested fields from JSON objects and arrays into flat, queryable formats during ingestion.

Whether you’re dealing with clickstream data, telemetry logs, or API responses, unnesting simplifies data modeling and enhances query performance.

Why JSON Unnesting?

  • Simplifies Queries: Flatten nested data for easier filtering, aggregations, and joins.
  • Improves Performance: Avoids expensive JSON path evaluations at query time.
  • Streamlines Data Modeling: Convert complex structures into relational form at ingestion.

Enabling JSON Unnesting in StarTree

JSON unnesting can be configured via:

  • The StarTree Data Portal UI, or
  • The ingestion config JSON using transformConfigs and unnestConfig.

Examples

Let’s go through a few examples to unnest a JSON Object,

Example 1: Unnesting a JSON Object

Input JSON:

{
  "user": {
    "id": 123,
    "name": "Alice"
  },
  "event": "click"
}

Unnesting Config:

"ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "user_id",
        "transformFunction": "JSONPATHLONG(user, '$.id')"
      },
      {
        "columnName": "user_name",
        "transformFunction": "JSONPATHSTRING(user, '$.name')"
      }
    ]
}

Resulting Schema:

user_iduser_nameevent
123Aliceclick

Example 2: Unnesting an Array into Multiple Rows

Input JSON:

{
  "orderId": "A001",
  "items": [
    { "sku": "X1", "qty": 2 },
    { "sku": "X2", "qty": 1 }
  ]
}

Unnesting Config:

{
 "ingestionConfig": {
   "complexTypeConfig": {
     "delimiter": "_",
      "fieldsToUnnest": ["items"],
      "collectionNotUnnestedToJson": "NON_PRIMITIVE"
   }
 }
}

Resulting Schema:

orderIditems_skuitems_qty
A001X12
A001X21