James McPherson

How I started learning Apache Spark

I've realised over the years that the best way for me to start learning a new language, toolkit or technology is to dive right in and start trying to solve problems with it.

This was most definitely true for Apache Spark, which I had to learn recently in order to prepare for a #DataScience interview.

I wrote a utility to Extract information from my 6+ years of PV inverter data, Transform it and Load it (#ETL) into #DataFrames, which I then query for record dates, minimum and maximum output, and daily average output. In keeping with my standard practice, I've put that code on GitHub and written a blog post about the process. See more (much more!) at https://www.jmcpdotcom.com/blog/posts/2019-10-11-apache-spark-init/

Apache #Spark, #ETL, #Python
