Ever wondered what a multidimensional index would look like in Spark?
Recently we launched the Qbeast Open Source Format, a Data Lakehouse enhancement to speed up your queries!
Based on Delta Lake and available for Apache Spark, it lets you index your data on multiple columns and read a representative sample directly from storage 🔥
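To give a feel for it, here is a minimal PySpark sketch. It assumes the `qbeast` data source name and the `columnsToIndex` writer option as described in the Qbeast-Spark README, and a Spark session that was already started with the Qbeast package and its session extension (check the README for the exact settings for your version). The helper names (`qbeast_write_options`, `write_indexed`, `read_sample`) are made up for this example.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only used for type hints, so Spark is not imported at load time
    from pyspark.sql import DataFrame, SparkSession


def qbeast_write_options(columns: list[str]) -> dict[str, str]:
    """Build the writer options: a comma-separated list of columns to index."""
    return {"columnsToIndex": ",".join(columns)}


def write_indexed(df: DataFrame, path: str, columns: list[str]) -> None:
    """Write `df` to `path` in the Qbeast format, indexed on `columns`.

    Assumption: the "qbeast" format and "columnsToIndex" option come from the
    Qbeast-Spark README and may differ between versions.
    """
    (
        df.write.mode("overwrite")
        .format("qbeast")
        .options(**qbeast_write_options(columns))
        .save(path)
    )


def read_sample(spark: SparkSession, path: str, fraction: float) -> DataFrame:
    """Read the table back and keep roughly `fraction` of its rows."""
    return spark.read.format("qbeast").load(path).sample(fraction=fraction)
```

With a session and a DataFrame at hand, `write_indexed(df, path, ["col_a", "col_b"])` followed by `read_sample(spark, path, 0.01)` would give you a 1% sample; the column names here are placeholders, not the ones used in the demo.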
Here is a quick example of how you can boost your query performance using Qbeast:
This is a normal query with Spark and the Delta format.
This is the same query, but with a Qbeast sampling of 1%.
The gifs are cool, right? Let's compare both executions:
| Format | Execution Time | Result |
|---|---|---|
| Delta | ~2.5 min | 37.869383 |
| Qbeast | ~6.6 s | 37.856333 |
As you can see, the 1% sample returns the result about 22 times faster than the full query on the Delta format, with an error of only 0.034%.
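If you want to check those two numbers yourself, the arithmetic is short. This snippet only uses the figures from the table above, taking the approximate timings at face value:

```python
# Timings and results copied from the comparison table.
delta_seconds = 2.5 * 60   # ~2.5 min
qbeast_seconds = 6.6       # ~6.6 s
delta_result = 37.869383
qbeast_result = 37.856333

# How many times faster the sampled query finished.
speedup = delta_seconds / qbeast_seconds

# Relative error of the sampled result, as a percentage of the full result.
relative_error_pct = abs(delta_result - qbeast_result) / delta_result * 100

print(f"speedup: {speedup:.1f}x")
print(f"relative error: {relative_error_pct:.3f}%")
```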
If you want to play with it, check out the Qbeast-Spark repository on GitHub.
And don't forget to give us a star!
Your support means a lot ❤️