When you onboard a Parquet-based data source, whether a raw S3 data lake or an Iceberg-managed table, Pinot automatically maps Parquet column types to Pinot data types. This page documents those mappings, known limitations, and important behavioral notes.
Primitive Types
| Parquet Type | Pinot Type | Notes |
|---|---|---|
| BOOLEAN | BOOLEAN | Optional and required variants both supported. |
| INT32 | INT | |
| INT64 | LONG | |
| FLOAT | FLOAT | |
| DOUBLE | DOUBLE | |
| BINARY | STRING | Raw binary data is readable as a string. |
| FIXED_LEN_BYTE_ARRAY | STRING | Fixed-width byte arrays are stored as strings. |
| INT96 | TIMESTAMP | Deprecated Parquet type. Stored as nanoseconds since epoch; no column statistics available. Avoid using as a time column. |
Logical Types
Integer Subtypes
| Parquet Logical Type | Pinot Type | Notes |
|---|---|---|
| INT(8, signed) | INT | Range: -128 to 127 |
| INT(16, signed) | INT | Range: -32,768 to 32,767 |
| INT(32, signed) | INT | |
| INT(64, signed) | LONG | |
| INT(8, unsigned) | INT | Range: 0 to 255 |
| INT(16, unsigned) | INT | Range: 0 to 65,535 |
| INT(32, unsigned) | LONG | Known limitation: values are stored as signed. Values above 2,147,483,647 may be corrupted. See Unsigned Integer Types under Known Limitations. |
| INT(64, unsigned) | LONG | Known limitation: same signed-storage issue. Values above Long.MAX_VALUE (9,223,372,036,854,775,807) are not preserved accurately. |
String and Binary Types
| Parquet Logical Type | Pinot Type | Notes |
|---|---|---|
| STRING | STRING | UTF-8 encoded strings. |
| ENUM | STRING | Enum values stored as strings. |
| JSON | STRING | Valid JSON strings preserved as-is. |
| BSON | BYTES | Binary data; requires base64 encoding when querying. |
| UUID | STRING | Known limitation: stored as raw bytes in some ingestion paths, which can cause serialization errors. Verify UUID columns render correctly after ingestion. |
Temporal Types
| Parquet Logical Type | Pinot Type | Notes |
|---|---|---|
| DATE | LONG | Stored as epoch days. |
| TIME(MILLIS) | INT | Milliseconds since midnight; range 0 to 86,399,999. |
| TIME(MICROS) | LONG | Microseconds since midnight; range 0 to 86,399,999,999. |
| TIME(NANOS) | LONG | Nanoseconds since midnight. |
| TIMESTAMP(MILLIS, UTC) | TIMESTAMP | |
| TIMESTAMP(MILLIS, local) | TIMESTAMP | |
| TIMESTAMP(MICROS, UTC) | TIMESTAMP | Sub-millisecond precision is lost: Pinot stores timestamps in milliseconds internally. |
| TIMESTAMP(MICROS, local) | TIMESTAMP | Sub-millisecond precision is lost. |
| TIMESTAMP(NANOS, UTC) | TIMESTAMP | Sub-millisecond precision is lost. |
| TIMESTAMP(NANOS, local) | TIMESTAMP | Sub-millisecond precision is lost. |
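Because DATE and TIME values arrive in Pinot as plain integers, turning them back into calendar values on the client side is simple arithmetic. The following is a minimal Python sketch of that conversion; the function names are illustrative and not part of Pinot.

```python
from datetime import date, time, timedelta

EPOCH = date(1970, 1, 1)


def epoch_days_to_date(days: int) -> date:
    """Convert a DATE value (days since 1970-01-01) to a calendar date."""
    return EPOCH + timedelta(days=days)


def millis_of_day_to_time(millis: int) -> time:
    """Convert a TIME(MILLIS) value (milliseconds since midnight) to a time."""
    if not 0 <= millis <= 86_399_999:
        raise ValueError(f"out of range for TIME(MILLIS): {millis}")
    seconds, ms = divmod(millis, 1_000)
    minutes, s = divmod(seconds, 60)
    h, m = divmod(minutes, 60)
    # datetime.time takes microseconds, so scale the millisecond remainder.
    return time(h, m, s, ms * 1_000)


print(epoch_days_to_date(19_723))         # 2024-01-01
print(millis_of_day_to_time(86_399_999))  # 23:59:59.999000
```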
Decimal Types
All Parquet DECIMAL variants map to Pinot BIG_DECIMAL, preserving scale and precision.
| Parquet Physical + Logical Type | Pinot Type | Example |
|---|---|---|
| INT32 + DECIMAL(p, s) | BIG_DECIMAL | DECIMAL(9, 2) |
| INT64 + DECIMAL(p, s) | BIG_DECIMAL | DECIMAL(18, 4) |
| FIXED_LEN_BYTE_ARRAY + DECIMAL(p, s) | BIG_DECIMAL | DECIMAL(28, 6) |
| BINARY + DECIMAL(p, s) | BIG_DECIMAL | DECIMAL(38, 10) |
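For background on why all four physical encodings can map to one type: Parquet stores a DECIMAL as an unscaled integer plus a scale, and the byte-array variants hold that integer as big-endian two's complement. The Python sketch below shows how a value is reconstructed from either form. It illustrates the encoding only and is not Pinot's reader code; the function names are hypothetical.

```python
from decimal import Decimal


def decimal_from_unscaled(unscaled: int, scale: int) -> Decimal:
    """Build an exact Decimal from an unscaled integer and a scale."""
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # The tuple constructor is exact and ignores the context precision,
    # so 38-digit values are not rounded.
    return Decimal((sign, digits, -scale))


def decimal_from_bytes(raw: bytes, scale: int) -> Decimal:
    """Decode a big-endian two's-complement byte array (FIXED_LEN_BYTE_ARRAY or BINARY)."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return decimal_from_unscaled(unscaled, scale)


print(decimal_from_unscaled(1234567, 2))           # 12345.67
print(decimal_from_bytes(b"\xff\xff\xfb\x2e", 2))  # -12.34
```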
Complex Types
All complex Parquet types — STRUCT, LIST, and MAP — are stored as JSON strings in Pinot using the ParquetToPinotTypeMapper. You can query them using JSON_EXTRACT_SCALAR and JSON_MATCH, provided a JSON index is configured on the column.
| Parquet Type | Pinot Type | Notes |
|---|---|---|
| STRUCT{...} | JSON | Flat and nested structs both supported. |
| LIST<STRING> | JSON | Multi-value column of strings. |
| LIST<INT32> | JSON | Multi-value column of integers. |
| LIST<STRUCT> | JSON | JSON array of objects. |
| MAP<STRING, STRING> | JSON | JSON object with string keys and values. |
| MAP<STRING, INT> | JSON | JSON object with string keys and integer values. Null handling issues may occur; see Null Handling in Complex Types. |
| MAP<STRING, STRUCT> | JSON | JSON object with nested struct values. Same null handling caveat applies. |
| STRUCT{..., LIST<...>} | JSON | Nested arrays within structs are supported. |
| STRUCT{..., MAP<...>} | JSON | Nested maps within structs are supported. |
To use JSON_MATCH or JSON_EXTRACT_SCALAR on complex columns, you must enable a JSON index on those columns. Without it, queries will fail with a Cannot apply JSON_MATCH on column without json index error. Configure the index in the table’s index settings.
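As a rough illustration, the Python sketch below builds a table-config fragment that declares a JSON index and two queries against the indexed column. It assumes the index is declared through `jsonIndexColumns` under `tableIndexConfig`, as in Apache Pinot table configs; confirm the key against your Pinot version. The column name `payload`, the table name `events`, and the JSON paths are hypothetical.

```python
import json

JSON_COLUMN = "payload"  # hypothetical complex column ingested from a Parquet STRUCT

# Fragment to merge into the table config (assumed key names, see lead-in).
table_config_fragment = {
    "tableIndexConfig": {
        "jsonIndexColumns": [JSON_COLUMN],
    }
}

# Example queries; single quotes inside the JSON_MATCH filter are doubled.
queries = {
    "extract": (
        f"SELECT JSON_EXTRACT_SCALAR({JSON_COLUMN}, '$.user.id', 'STRING') "
        "FROM events LIMIT 10"
    ),
    "match": (
        "SELECT COUNT(*) FROM events "
        f"WHERE JSON_MATCH({JSON_COLUMN}, '\"$.user.country\" = ''US''')"
    ),
}

print(json.dumps(table_config_fragment, indent=2))
for name, sql in queries.items():
    print(f"{name}: {sql}")
```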
Known Limitations
Unsigned Integer Types
Parquet UINT32 and UINT64 (unsigned 32-bit and 64-bit integers) are stored in Pinot as a signed LONG. This means:
- Values that fit within the signed range are accurate.
- Values above 2,147,483,647 (for UINT32) or above Long.MAX_VALUE, 9,223,372,036,854,775,807 (for UINT64), are stored as negative numbers or wrap around. The corruption is silent and affects the entire upper half of the unsigned value range.
If your dataset contains unsigned values above these limits, treat the Pinot LONG values with caution or apply a transformation at ingestion time.
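The wraparound can be reproduced with ordinary two's-complement arithmetic. The Python sketch below assumes the stored value is the unmodified signed reinterpretation of the original bits, which is the failure mode described above; in that case the original can be recovered by masking. The helper names are illustrative.

```python
def as_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned integer as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def as_unsigned(value: int, bits: int) -> int:
    """Recover the unsigned value from its signed reinterpretation."""
    return value & ((1 << bits) - 1)


original = 3_000_000_000            # valid UINT32, above 2,147,483,647
stored = as_signed(original, 32)    # what a signed 32-bit read produces
print(stored)                       # -1294967296
print(as_unsigned(stored, 32))      # 3000000000

# UINT64: the maximum unsigned value reads back as -1 in a signed LONG.
print(as_signed(2**64 - 1, 64))     # -1
```

Masking only works when the value was wrapped and never otherwise modified, so validate a sample of recovered values against the source files before relying on it.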
Timestamp Precision Loss
Pinot stores timestamps internally in milliseconds. Parquet columns with microsecond (MICROS) or nanosecond (NANOS) precision will have their sub-millisecond values truncated silently on ingestion. If sub-millisecond precision is required, store the raw value in a separate LONG column alongside the TIMESTAMP column.
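A minimal sketch of that workaround in Python: derive the millisecond value for the TIMESTAMP column and carry the untouched microsecond value in a companion LONG column. The column names `event_ts` and `event_ts_micros` are hypothetical. The sketch uses floor division, which matches truncation for timestamps at or after the epoch.

```python
def to_pinot_fields(epoch_micros: int) -> dict:
    """Split one microsecond timestamp into a TIMESTAMP value and a raw LONG value."""
    return {
        "event_ts": epoch_micros // 1_000,  # millisecond precision only
        "event_ts_micros": epoch_micros,    # full-precision companion column
    }


row = to_pinot_fields(1_700_000_000_123_456)
print(row["event_ts"])                 # 1700000000123
print(row["event_ts_micros"] % 1_000)  # 456 (the part the TIMESTAMP column drops)
```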
Null Handling in Complex Types
MAP columns that contain null values in keys or values may not propagate null bitmaps correctly to Pinot segments. This can result in nulls being replaced by type defaults (e.g., 0 for integers, empty string for strings). Enable nullHandlingEnabled: true in your table configuration and verify null behavior after ingestion.
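One way to verify null behavior is to compare a sample of source maps with the JSON that Pinot returns for the same rows. The Python sketch below flags keys whose source value was null but whose ingested value is not. The function name and sample data are hypothetical, and it assumes you can fetch both sides for the same row.

```python
import json


def find_defaulted_nulls(source_map: dict, ingested_json: str) -> list:
    """Return keys that were null in the source but non-null after ingestion."""
    ingested = json.loads(ingested_json)
    suspicious = [
        key
        for key, value in source_map.items()
        if value is None and ingested.get(key) is not None
    ]
    return sorted(suspicious)


source = {"clicks": 1, "views": None, "shares": None}
ingested = '{"clicks": 1, "views": 0, "shares": null}'
print(find_defaulted_nulls(source, ingested))  # ['views']
```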
BYTES Columns in Queries
Pinot BYTES columns (used for BSON logical type) are base64-encoded when returned in query results. Account for this when reading binary data from queries.
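A small Python sketch of the client-side decoding step, assuming the result cell is a standard base64 string as described above. If your client returns a different encoding, adjust the decoder accordingly. The function name is illustrative, and the sample payload is the five-byte empty BSON document.

```python
import base64


def decode_bytes_cell(cell: str) -> bytes:
    """Decode one base64-encoded BYTES cell from a query result."""
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently skipping them.
    return base64.b64decode(cell, validate=True)


empty_bson = b"\x05\x00\x00\x00\x00"
cell = base64.b64encode(empty_bson).decode("ascii")
print(cell)
print(decode_bytes_cell(cell) == empty_bson)  # True
```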
INT96 (Deprecated)
INT96 has no column statistics in Parquet, which affects min/max pruning during segment generation. It should not be used as a timeColumnName.