In this recipe we’ll learn how to flatten nested objects when ingesting JSON documents into Apache Pinot.Documentation Index
Fetch the complete documentation index at: https://docs.startree.ai/llms.txt
Use this file to discover all available pages before exploring further.
| Pinot Version | 0.10.0 |
|---|---|
| Code | startreedata/pinot-recipes/json-flatten |
Prerequisites
To follow the code examples in this guide, you must install Docker locally and download recipes.Navigate to recipe
- If you haven’t already, download recipes.
- In terminal, go to the recipe by running the following command:
Launch Pinot Cluster
You can spin up a Pinot Cluster by running the following command:Dataset
We’re going to import the following JSON file:address and company that we’re going to flatten so that we can import all the fields into Pinot.
Pinot Schema
Working out all the column names and their types is tedious with complex JSON documents, but luckily Pinot provides a tool that infers the schema for us. We’ll call it by running the following command:/data/schema/users.json, looks like this:
address_street, address_geo_lat, company_name.
The
. character is used as the default delimiter, but we have overriden that by passing in the -delimiter parameter.
We’ll need to make sure that we pass in this delimiter to our table config later on in this guide.schemaName to be users_flatten in one and users_no_flatten in the other.
Pinot Table
Now let’s create Pinot tables that uses the above schemas. We’re going to create two different tables so that we can see what happens if we don’t specify the config that flattens JSON documents. First, the table config that doesn’t do any flattening:delimiter if our schema doesn’t use the . character to separate nested field names.
You can create the tables by running the following commands:
Ingestion Job
Now we’re going to import the JSON file into Pinot. We’ll do this with the following ingestion spec:users schema.
You can run the following command to run the import:
Querying
Once that’s completed, navigate to localhost:9000/#/query and we’ll run some queries see how the JSON flattening works. First, let’s queryusers_no_flatten:
| id | name | username | company_name | company_bs | address_street | address_geo_lat | address_geo_lng |
|---|---|---|---|---|---|---|---|
| 1 | Leanne Graham | Bret | null | null | null | null | null |
| 2 | Ervin Howell | Antonette | null | null | null | null | null |
| 3 | Clementine Bauch | Samantha | null | null | null | null | null |
| 4 | Patricia Lebsack | Karianne | null | null | null | null | null |
| 5 | Chelsey Dietrich | Kamren | null | null | null | null | null |
users_flatten table:
| id | name | username | company_name | company_bs | address_street | address_geo_lat | address_geo_lng |
|---|---|---|---|---|---|---|---|
| 1 | Leanne Graham | Bret | Romaguera-Crona | harness real-time e-markets | Kulas Light | -37.3159 | 81.1496 |
| 2 | Ervin Howell | Antonette | Deckow-Crist | synergize scalable supply-chains | Victor Plains | -43.9509 | -34.4618 |
| 3 | Clementine Bauch | Samantha | Romaguera-Jacobson | e-enable strategic applications | Douglas Extension | -68.6102 | -47.0653 |
| 4 | Patricia Lebsack | Karianne | Robel-Corkery | transition cutting-edge web services | Hoeger Mall | 29.4572 | -164.2990 |
| 5 | Chelsey Dietrich | Kamren | Keebler LLC | revolutionize end-to-end systems | Skiles Walks | -31.8129 | 62.5342 |

