Batch Ingestion
Batch Ingestion in Apache Pinot with Minions
Batch ingestion in Apache Pinot is the process of importing large volumes of static or historical data into Pinot’s offline tables, which are optimized for high-performance analytical queries. Typical sources include cloud object storage (S3, GCS, ADLS), distributed file systems (HDFS), and SQL data warehouses (Snowflake, BigQuery). Batch ingestion has traditionally relied on external frameworks such as Apache Spark or Hadoop to run the ingestion jobs. With Pinot Minions, the same work can run natively inside the Pinot cluster. Minions let ingestion workflows and post-processing tasks be automated, distributed, and managed within the cluster, without requiring external job schedulers.
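To make the Minion-based approach more concrete, the sketch below assembles an offline table config that asks Minions to ingest files through the SegmentGenerationAndPushTask task type. It is an illustration rather than a complete production config. The table name, bucket path, region, and cron schedule are hypothetical placeholders, and the helper function build_offline_table_config is invented for this example. The config keys follow the commonly documented Pinot table config layout, but they should be checked against the Pinot version in use.

```python
import json


def build_offline_table_config(table_name, input_dir_uri, region,
                               input_format="avro",
                               schedule="0 */10 * * * ?"):
    """Return a minimal offline table config as a dict.

    Hypothetical helper: it only builds the JSON document that would be
    submitted to the Pinot controller; it does not talk to a cluster.
    """
    return {
        "tableName": table_name,
        "tableType": "OFFLINE",
        "segmentsConfig": {"replication": "1"},
        "tenants": {},
        "tableIndexConfig": {"loadMode": "MMAP"},
        "ingestionConfig": {
            "batchIngestionConfig": {
                "segmentIngestionType": "APPEND",
                "segmentIngestionFrequency": "DAILY",
                # One map per input location that Minions should scan.
                "batchConfigMaps": [
                    {
                        "inputDirURI": input_dir_uri,
                        "inputFormat": input_format,
                        "includeFileNamePattern": f"glob:**/*.{input_format}",
                        # Assumes an S3 source; other file systems use a
                        # different PinotFS class and properties.
                        "input.fs.className":
                            "org.apache.pinot.plugin.filesystem.S3PinotFS",
                        "input.fs.prop.region": region,
                    }
                ],
            }
        },
        # Enables the Minion task; the controller's task scheduler
        # triggers it according to the Quartz cron expression.
        "task": {
            "taskTypeConfigsMap": {
                "SegmentGenerationAndPushTask": {
                    "schedule": schedule,
                    "tableMaxNumTasks": "10",
                }
            }
        },
        "metadata": {},
    }


# Placeholder values, not taken from a real deployment.
config = build_offline_table_config(
    table_name="airlineStats",
    input_dir_uri="s3://example-bucket/airlineStats/rawdata/",
    region="us-west-2",
)
print(json.dumps(config, indent=2))
```

The printed JSON is the kind of document that would be submitted to the Pinot controller when creating the table. Scheduled execution also assumes that the controller's task scheduler is enabled and that at least one Minion instance is running in the cluster.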

