Discussion on: Live notetaking as I learn Spark

This is great! I have been working with Apache Spark for years and your notes are great and to the point!

I love that you compared cluster computing with grid computing. In my opinion, Apache Spark could also run as a grid computing system, but I don't think it would get the best performance, since the physical distance between the machines matters.

If you look at grid computing as a distributed systems concept, a way to use computers distributed over a network to solve a problem, then Hadoop is a subset of grid computing. And Apache Spark is like a better version of the Hadoop MapReduce paradigm, since it does most of its calculations in memory.
The problems for which Apache Spark is most often used are problems that are better solved when the computation is brought to where the data lives. Typical "Big Data" problems like machine learning, data mining, etc. fall into this category. I have personally never tried using Apache Spark for grid computing, but it could be an interesting test.
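
To make "bringing the computation to where the data lives" a bit more concrete, here is a toy sketch in plain Python. It is not Spark's API: the `partitions` list, `run_on_partition`, and `local_sum_of_squares` are hypothetical names I made up for illustration, and each inner list just stands in for data held on one node. The idea is that a small function travels to each partition and only a small partial result travels back.

```python
from functools import reduce

# Hypothetical partitions: each inner list stands for data living on one node.
partitions = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
]


def run_on_partition(func, partition):
    """Stand-in for shipping func to the node that holds this partition."""
    return func(partition)


def local_sum_of_squares(partition):
    """Work done next to the data; returns one small number per partition."""
    return sum(x * x for x in partition)


# Each "node" computes on its own data, so the raw records never move.
partial_results = [run_on_partition(local_sum_of_squares, p) for p in partitions]

# Only the small partial results are combined in one place.
total = reduce(lambda a, b: a + b, partial_results, 0)

print(partial_results, total)
```

In a real cluster the expensive part is moving the raw records over the network, which is why I would expect a widely spread grid to pay a higher price for anything that has to shuffle data between machines.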

What are you planning to do next with Apache Spark?