Tightly-coupled real-time analytics databases
Apache Pinot, like other typical real-time OLAP databases, was for many years a strictly tightly coupled system. This meant that we could only use disks/SSDs for storage, either local instance storage or remotely attached storage (such as EBS volumes). This works well for such systems, because these storage types offer very fast access (microseconds for local storage, milliseconds for remotely attached storage) and help them achieve query latencies in the milliseconds to sub-second range.
Exploding cost with data volume increase
As Pinot became popular and adoption grew, users wanted to put more and more data into Pinot and use it for use cases beyond real-time, user-facing analytics, such as internal dashboards and metrics reporting. But storing this vast amount of data on disks/SSDs becomes very expensive. There are two main reasons for this:
- First, since this storage is tightly coupled with compute, you'll often have to add a lot of compute just to keep up with growing storage, whether or not you need that compute.
- Second, disks/SSDs cost almost 5x as much as storage options such as cloud object stores.
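To make the first point concrete, here is a minimal sketch of how cluster size is driven in a tightly coupled setup: every node brings a fixed amount of disk and a fixed amount of query capacity, so the cluster has to be sized for whichever runs out first. The function `nodes_needed` and the node shape (2 TB of local SSD, 1,000 queries/sec per node) are hypothetical, chosen only for illustration; they are not Pinot defaults or measured numbers.

```python
import math


def nodes_needed(data_tb: float, peak_qps: float,
                 disk_tb_per_node: float, qps_per_node: float) -> int:
    """Return the node count a tightly coupled cluster needs (hypothetical model).

    Storage and compute come bundled in each node, so the cluster must be
    large enough for whichever resource is exhausted first.
    """
    nodes_for_storage = math.ceil(data_tb / disk_tb_per_node)
    nodes_for_compute = math.ceil(peak_qps / qps_per_node)
    return max(nodes_for_storage, nodes_for_compute)


# Hypothetical node shape, for illustration only.
DISK_TB_PER_NODE = 2.0
QPS_PER_NODE = 1_000.0
PEAK_QPS = 1_000.0  # query load stays flat while the data keeps growing

for data_tb in (2, 20, 200):
    total = nodes_needed(data_tb, PEAK_QPS, DISK_TB_PER_NODE, QPS_PER_NODE)
    busy = math.ceil(PEAK_QPS / QPS_PER_NODE)
    print(f"{data_tb:>4} TB of data: {total:>3} nodes provisioned, "
          f"{busy} needed for query load")
```

Under these assumed numbers, growing the data 100x (from 2 TB to 200 TB) grows the cluster 100x (from 1 to 100 nodes), even though a single node's worth of compute could still serve the query load.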