A few weeks ago, my team lead and I started running into a problem with our ETL pipelines - they were too damn heavy! RAM usage was the issue: many scripts working with very large tables (50-100 GB) were crashing with memory errors.
We tried all kinds of things to lower the RAM usage, and while they did help, it was clear to us this was more of a band-aid than a solution. So we looked around and came across... Polars!
Polars is a DataFrame library for Rust and Python, written in Rust (which already got me excited 🤤) and built upon arrow2, a Rust implementation of Apache Arrow.
Most of my gripes with PySpark were the hassle of setting it up (and my lack of experience with it), but Polars looked intuitive to me as a long-time pandas user, and the benchmarks they were showing off were very promising:
(Feel free to see more at https://h2oai.github.io/db-benchmark/)
We have been getting results so far, but we're still learning the nuances of Polars, so let me know if you want to hear more about it!
If you're looking for a fast DataFrame library you may not have heard of, look no further than Polars! It's designed to be fast and efficient, and it delivers on that front. It's also easy to pick up, so you'll be up and running in no time. Whether you're a beginner or an experienced user, you'll likely find Polars a valuable addition to your toolkit. So what are you waiting for? Give Polars a try today!