As an engineer with several years of experience in backend and frontend projects, it feels like the next natural step is to take on big data challenges.
In the big data world I expect to find compute, I/O, and scaling challenges that ordinary textbook architectures rarely present.
I decided that Spark is the best way to get started, specifically via the Databricks certification, which focuses on Spark programming and architecture. I believe the HDFS/ops side is less relevant for me today, given all the managed services on AWS and elsewhere.
My game plan for passing the Databricks Spark certification:
- Read "Learning Spark: Lightning-Fast Big Data Analysis" and work through all the examples, summarizing the important insights and lessons so I can revisit them later.
- Go over the skeletons of the Databricks Developer course that I found on GitHub. They are from about 15 months ago, so they should still be fairly up to date: https://github.com/vivek-bombatkar/spark-training + https://github.com/vivek-bombatkar/Spark-with-Python---My-learning-notes-
- Work through example questions.
If you can advise on any other preparation resources, please share them in the comments; it will help me.
I will update this post as I go, for others (and for myself).
Reading thoroughly through "Learning Spark: Lightning-Fast Big Data Analysis"
I think it's reasonable to go through two chapters per week; that means reading, summarizing, and running the important code snippets on my own.
Chapter 11: just a quick read; it's not that important.
- Advanced topics (10 notebooks)
- Spark execution (1 notebook)
- 26 notebooks in total
I hope to do 3-4 notebooks per week (some will be easy, some harder, so I'm taking the average). That works out to about 8 weeks of going through the notebooks and filling in what I'm missing.
Altogether, it should take about 3 months until I'm ready for the exam.
- Learning Spark: Lightning-Fast Big Data Analysis
- Spark: The Definitive Guide: Big Data Processing Made Simple
- Spark in Action