When you onboard a Parquet-based data source, whether a raw S3 data lake or an Iceberg-managed table, Pinot automatically maps Parquet column types to Pinot data types. This page documents those mappings, known limitations, and important behavioral notes.

Primitive Types

| Parquet Type | Pinot Type | Notes |
|---|---|---|
| BOOLEAN | BOOLEAN | Optional and required variants are both supported. |
| INT32 | INT | |
| INT64 | LONG | |
| FLOAT | FLOAT | |
| DOUBLE | DOUBLE | |
| BINARY | STRING | Raw binary data is readable as a string. |
| FIXED_LEN_BYTE_ARRAY | STRING | Fixed-width byte arrays are stored as strings. |
| INT96 | TIMESTAMP | Deprecated Parquet type that encodes a Julian day plus nanoseconds within that day. No column statistics are available. Avoid using as a time column. |

Logical Types

Integer Subtypes

| Parquet Logical Type | Pinot Type | Notes |
|---|---|---|
| INT(8, signed) | INT | Range: -128 to 127 |
| INT(16, signed) | INT | Range: -32,768 to 32,767 |
| INT(32, signed) | INT | |
| INT(64, signed) | LONG | |
| INT(8, unsigned) | INT | Range: 0 to 255 |
| INT(16, unsigned) | INT | Range: 0 to 65,535 |
| INT(32, unsigned) | LONG | Known limitation: values are stored as signed, so values above 2,147,483,647 may be corrupted. See Unsigned Integer Types under Known Limitations. |
| INT(64, unsigned) | LONG | Known limitation: same signed-storage issue. Values above Long.MAX_VALUE are not preserved accurately. |

String and Binary Types

| Parquet Logical Type | Pinot Type | Notes |
|---|---|---|
| STRING | STRING | UTF-8 encoded strings. |
| ENUM | STRING | Enum values are stored as strings. |
| JSON | STRING | Valid JSON strings are preserved as-is. |
| BSON | BYTES | Binary data; values are base64-encoded in query results (see BYTES Columns in Queries). |
| UUID | STRING | Known limitation: stored as raw bytes in some ingestion paths, which can cause serialization errors. Verify that UUID columns render correctly after ingestion. |
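Parquet's UUID logical type annotates a 16-byte FIXED_LEN_BYTE_ARRAY. One way to verify a UUID column after ingestion is to check that the values read back from Pinot are canonical 36-character UUID strings, and to compare them with the canonical form of the source bytes. The sketch below uses Python's standard `uuid` module; the helper names are hypothetical and the sample bytes are made up.

```python
import uuid


def uuid_bytes_to_string(raw: bytes) -> str:
    """Render a 16-byte Parquet UUID value in canonical 8-4-4-4-12 form."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    # uuid.UUID(bytes=...) reads the 16 bytes in big-endian (string) order.
    return str(uuid.UUID(bytes=raw))


def looks_like_rendered_uuid(value: str) -> bool:
    """Return True if a value read back from Pinot is a canonical UUID string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, TypeError):
        # Raw bytes decoded as text, or other garbage, fail to parse.
        return False
```

A column that was ingested as raw bytes instead of a string typically fails `looks_like_rendered_uuid`, while a correctly rendered column matches `uuid_bytes_to_string` applied to the source bytes.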

Temporal Types

| Parquet Logical Type | Pinot Type | Notes |
|---|---|---|
| DATE | LONG | Stored as epoch days (days since 1970-01-01). |
| TIME(MILLIS) | INT | Milliseconds since midnight; range 0 to 86,399,999. |
| TIME(MICROS) | LONG | Microseconds since midnight; range 0 to 86,399,999,999. |
| TIME(NANOS) | LONG | Nanoseconds since midnight; range 0 to 86,399,999,999,999. |
| TIMESTAMP(MILLIS, UTC) | TIMESTAMP | |
| TIMESTAMP(MILLIS, local) | TIMESTAMP | |
| TIMESTAMP(MICROS, UTC) | TIMESTAMP | Sub-millisecond precision is lost; Pinot stores timestamps in milliseconds internally. |
| TIMESTAMP(MICROS, local) | TIMESTAMP | Sub-millisecond precision is lost. |
| TIMESTAMP(NANOS, UTC) | TIMESTAMP | Sub-millisecond precision is lost. |
| TIMESTAMP(NANOS, local) | TIMESTAMP | Sub-millisecond precision is lost. |
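Because DATE and TIME columns arrive in Pinot as plain integers, reading them back usually means converting from epoch days or from an offset since midnight. A minimal sketch using only Python's standard library; the function names are illustrative and not part of Pinot.

```python
from datetime import date, time, timedelta

EPOCH = date(1970, 1, 1)


def epoch_days_to_date(days: int) -> date:
    """Convert a DATE value (days since 1970-01-01) to a calendar date."""
    return EPOCH + timedelta(days=days)


def millis_of_day_to_time(millis: int) -> time:
    """Convert a TIME(MILLIS) value (0 to 86,399,999) to a time of day."""
    if not 0 <= millis < 86_400_000:
        raise ValueError(f"out of range for TIME(MILLIS): {millis}")
    seconds, ms = divmod(millis, 1000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    # datetime.time takes microseconds, so scale the millisecond part.
    return time(hour, minute, second, ms * 1000)
```

For example, `epoch_days_to_date(19723)` returns `date(2024, 1, 1)`, and `millis_of_day_to_time(86_399_999)` returns `time(23, 59, 59, 999000)`.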

Decimal Types

All Parquet DECIMAL variants map to Pinot BIG_DECIMAL, preserving scale and precision.
| Parquet Physical + Logical Type | Pinot Type | Example |
|---|---|---|
| INT32 + DECIMAL(p, s) | BIG_DECIMAL | DECIMAL(9, 2) |
| INT64 + DECIMAL(p, s) | BIG_DECIMAL | DECIMAL(18, 4) |
| FIXED_LEN_BYTE_ARRAY + DECIMAL(p, s) | BIG_DECIMAL | DECIMAL(28, 6) |
| BINARY + DECIMAL(p, s) | BIG_DECIMAL | DECIMAL(38, 10) |
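Under the Parquet specification, a DECIMAL is stored as an unscaled integer: a plain INT32 or INT64, or a big-endian two's-complement integer in the bytes of a FIXED_LEN_BYTE_ARRAY or BINARY. The logical value is the unscaled integer times 10^-s. The sketch below shows that arithmetic with Python's `decimal` module; it illustrates the encoding only and is not Pinot's reader code.

```python
from decimal import Decimal
from typing import Union


def decode_parquet_decimal(unscaled: Union[int, bytes], scale: int) -> Decimal:
    """Return unscaled * 10**-scale as an exact Decimal.

    `unscaled` is an int for INT32/INT64 physical types, or bytes holding a
    big-endian two's-complement integer for FIXED_LEN_BYTE_ARRAY/BINARY.
    """
    if isinstance(unscaled, (bytes, bytearray)):
        unscaled = int.from_bytes(unscaled, byteorder="big", signed=True)
    # Building the Decimal from a string is exact, so 38-digit values are not
    # rounded to the default 28-digit arithmetic context.
    return Decimal(f"{unscaled}E{-scale}")
```

For example, an INT32 DECIMAL(9, 2) with unscaled value 12345 decodes to `Decimal("123.45")`.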

Complex Types

All complex Parquet types (STRUCT, LIST, and MAP) are stored as JSON strings in Pinot using the ParquetToPinotTypeMapper. You can query them using JSON_EXTRACT_SCALAR and JSON_MATCH, provided a JSON index is configured on the column.
| Parquet Type | Pinot Type | Notes |
|---|---|---|
| `STRUCT{...}` | JSON | Flat and nested structs are both supported. |
| `LIST<STRING>` | JSON | Multi-value column of strings. |
| `LIST<INT32>` | JSON | Multi-value column of integers. |
| `LIST<STRUCT>` | JSON | JSON array of objects. |
| `MAP<STRING, STRING>` | JSON | JSON object with string keys and values. |
| `MAP<STRING, INT>` | JSON | JSON object with string keys and integer values. Null handling issues may occur; see Null Handling in Complex Types. |
| `MAP<STRING, STRUCT>` | JSON | JSON object with nested struct values. The same null-handling caveat applies. |
| `STRUCT{..., LIST<...>}` | JSON | Nested arrays within structs are supported. |
| `STRUCT{..., MAP<...>}` | JSON | Nested maps within structs are supported. |
To use JSON_MATCH or JSON_EXTRACT_SCALAR on complex columns, enable a JSON index on those columns in the table's index settings. Without it, queries fail with the error "Cannot apply JSON_MATCH on column without json index".
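A sketch of the pieces involved, assuming a hypothetical `events` table with a `MAP<STRING, STRING>` column named `attributes`. It builds a table config fragment that lists the column under `tableIndexConfig.jsonIndexColumns`, shows what one serialized value might look like, and composes a JSON_EXTRACT_SCALAR query and a JSON_MATCH query. The table, column, and key names are hypothetical, and the exact JSON text Pinot produces may differ in key order or spacing; check the table configuration reference for your Pinot version before relying on this fragment.

```python
import json

# Hypothetical table and column names, for illustration only.
TABLE = "events"
COLUMN = "attributes"  # assumed MAP<STRING, STRING> column stored as JSON

# Table config fragment that enables a JSON index on the column.
index_config = {"tableIndexConfig": {"jsonIndexColumns": [COLUMN]}}

# What a single MAP<STRING, STRING> value might look like as a JSON string.
sample_value = json.dumps({"country": "US", "device": "mobile"})

# Pull one key out of the JSON string column.
extract_sql = f"SELECT JSON_EXTRACT_SCALAR({COLUMN}, '$.country', 'STRING') FROM {TABLE}"

# Filter on a key; string literals inside the predicate use doubled single quotes.
match_sql = (
    f"SELECT COUNT(*) FROM {TABLE} "
    f"WHERE JSON_MATCH({COLUMN}, '\"$.country\"=''US''')"
)

# Print the fragment so it can be merged into an existing table config.
print(json.dumps(index_config, indent=2))
```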

Known Limitations

Unsigned Integer Types

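Parquet UINT32 and UINT64 (unsigned 32-bit and 64-bit integers) are stored in Pinot as a signed LONG. This means:
  • Values that fit within the corresponding signed range (up to 2,147,483,647 for UINT32, up to Long.MAX_VALUE for UINT64) are accurate.
  • UINT32 values above 2,147,483,647 (INT32_MAX) wrap around to negative numbers, silently corrupting the upper half of the UINT32 value range.
  • UINT64 values above Long.MAX_VALUE (9,223,372,036,854,775,807) are not preserved accurately, for the same reason.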
If your dataset contains unsigned integers larger than INT32_MAX, treat the Pinot LONG values with caution or apply a transformation at ingestion time.
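When a value has wrapped because its unsigned bit pattern was read as signed, masking it to the column's bit width recovers the original number. The sketch below uses Python, whose integers are unbounded; `wrap_to_signed` only simulates the signed reinterpretation, and both helper names are hypothetical. A recovered UINT32 value still fits in a LONG, but a recovered UINT64 value above Long.MAX_VALUE does not, so it would need a wider type (for example a string or BIG_DECIMAL column) or recovery outside Pinot.

```python
def wrap_to_signed(value: int, bits: int) -> int:
    """Simulate reading an unsigned value's bits as a signed two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def recover_unsigned(stored: int, bits: int) -> int:
    """Reinterpret a wrapped signed value as the original unsigned value."""
    # Masking keeps the low `bits` bits, undoing the sign extension.
    return stored & ((1 << bits) - 1)
```

For example, UINT32 4,294,967,295 wraps to -1, and `recover_unsigned(-1, 32)` returns 4,294,967,295 again.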

Timestamp Precision Loss

Pinot stores timestamps internally in milliseconds. Parquet columns with microsecond (MICROS) or nanosecond (NANOS) precision will have their sub-millisecond values truncated silently on ingestion. If sub-millisecond precision is required, store the raw value in a separate LONG column alongside the TIMESTAMP column.
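The arithmetic below shows what is dropped when a MICROS or NANOS value is reduced to milliseconds, and how keeping the raw value in a companion LONG column preserves it. The column names `event_ts` and `event_ts_raw_micros` are hypothetical. The sketch uses floor division, which matches truncation only for non-negative (post-1970) values; how a given reader rounds pre-1970 timestamps is not covered here.

```python
MICROS_PER_MILLI = 1_000
NANOS_PER_MILLI = 1_000_000


def split_at_millis(raw: int, units_per_milli: int) -> tuple[int, int]:
    """Split a MICROS or NANOS timestamp into (epoch millis, dropped remainder)."""
    return divmod(raw, units_per_milli)


def build_row(raw_micros: int) -> dict:
    """Hypothetical row: a TIMESTAMP column plus a LONG column keeping the raw value."""
    millis, _dropped = split_at_millis(raw_micros, MICROS_PER_MILLI)
    return {"event_ts": millis, "event_ts_raw_micros": raw_micros}
```

For example, 1,700,000,000,123,456 microseconds keeps 1,700,000,000,123 milliseconds in the TIMESTAMP column and drops the trailing 456 microseconds, which survive only in the raw LONG column.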

Null Handling in Complex Types

MAP columns whose values contain nulls may not propagate null bitmaps correctly to Pinot segments. (Parquet requires map keys to be non-null, so the issue concerns map values.) This can result in nulls being replaced by type defaults, for example 0 for integers or an empty string for strings. Enable nullHandlingEnabled: true in your table configuration and verify null behavior after ingestion.
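One way to verify null behavior is to compare each source map with the JSON value read back from Pinot and flag entries where a source null came back as a type default. The sketch below also shows the `nullHandlingEnabled` flag placed under `tableIndexConfig`. The function name and the set of defaults checked (0 and the empty string, taken from the examples above) are assumptions, and fetching rows from the source and from Pinot is left out.

```python
import json

# Table config fragment enabling null handling.
NULL_HANDLING_CONFIG = {"tableIndexConfig": {"nullHandlingEnabled": True}}

# Type defaults that may silently replace nulls (assumed set).
SUSPICIOUS_DEFAULTS = (0, "")


def nulls_replaced_by_defaults(source: dict, ingested_json: str) -> list:
    """Return keys whose source value was None but whose ingested value is a type default."""
    ingested = json.loads(ingested_json)
    return [
        key
        for key, value in source.items()
        # A JSON null loads as None, which is not in SUSPICIOUS_DEFAULTS.
        if value is None and ingested.get(key) in SUSPICIOUS_DEFAULTS
    ]
```

An empty result for a sample of rows suggests nulls survived ingestion; any returned key points at a null that was replaced by a default.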

BYTES Columns in Queries

Pinot BYTES columns (used for the BSON logical type) are base64-encoded when returned in query results. Decode the returned strings before treating them as binary data.

INT96 (Deprecated)

INT96 has no column statistics in Parquet, which affects min/max pruning during segment generation. It should not be used as a timeColumnName.
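INT96 is the 12-byte timestamp layout written by older Impala, Hive, and Spark versions: a little-endian 64-bit count of nanoseconds within the day, followed by a little-endian 32-bit Julian day number. Julian day 2,440,588 corresponds to 1970-01-01. The decoder below illustrates that layout and the epoch arithmetic for inspecting raw values; it is a sketch with hypothetical function names, not Pinot's reader.

```python
import struct

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 1_000_000_000


def int96_to_epoch_nanos(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp into nanoseconds since the Unix epoch."""
    if len(raw) != 12:
        raise ValueError(f"INT96 values are 12 bytes, got {len(raw)}")
    # Little-endian: 8-byte nanoseconds within the day, then 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


def int96_to_epoch_millis(raw: bytes) -> int:
    """Same instant at millisecond precision; floor division drops sub-millisecond nanos."""
    return int96_to_epoch_nanos(raw) // 1_000_000
```

For example, Julian day 2,440,589 with 1,500,000 nanoseconds of day decodes to 86,400,001,500,000 nanoseconds, or 86,400,001 milliseconds, since the epoch.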