Spark Associate Developer Certification Guide

#spark #certification #databricks #developer

This guide collects what you need to pass the Databricks Spark Associate Developer exam.

Books

Spark: The Definitive Guide
Learning Spark, 2nd Edition
The Data Engineer's Guide to Apache Spark

Lectures

Topics touched on the exam

When does a Spark application fail? (when executor fails, when driver fails, when data is not fully cached, etc.)
What is the most granular unit in the Spark hierarchy? (jobs, stages, tasks, etc.)
What does NOT help in optimizing a Spark application? (related to partitions, column merging, etc.)
What happens if there are more slots than tasks to process on a worker node? (resources are not fully utilized, etc.)
What is a task? (a unit of work that can fit into an executor, a unit of work that can fit into a machine, etc.)
What is a job?
What is the difference between actions and transformations?
Which of the Dataset API methods is most likely to trigger a shuffle? (union, groupBy, filter, etc.)
What percentage of the DataFrame will the following code cache? (a .show() is called on a Scala range)
How many jobs will the following code create? (reading a DataFrame with schema inference)
A wide transformation exchanges data between which units? (partitions, executors, clusters, etc.)
We want to generate 25 partitions after a join. What is the right configuration to use?
What are valid Spark deployment modes? (YARN, Local, Standalone, etc.)
Which of the options helps with garbage collection? (increasing Java heap space, serialization or deserialization, etc.)
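
For the slots-versus-tasks question, the arithmetic can be sketched in plain Python. This is not Spark API: the function name `schedule_waves` is hypothetical, and the model assumes one task occupies one slot (one core) and that all tasks take the same time.

```python
import math


def schedule_waves(num_tasks, num_slots):
    """Toy model: how many rounds ("waves") a stage needs and how many
    slots sit idle in the last round. Hypothetical helper, not Spark API."""
    if num_tasks < 0 or num_slots <= 0:
        raise ValueError("need num_tasks >= 0 and num_slots > 0")
    # Each wave runs at most num_slots tasks in parallel.
    waves = math.ceil(num_tasks / num_slots)
    if waves == 0:
        return {"waves": 0, "idle_slots_in_last_wave": num_slots}
    # Tasks left over for the final wave after all full waves have run.
    busy_in_last_wave = num_tasks - (waves - 1) * num_slots
    return {
        "waves": waves,
        "idle_slots_in_last_wave": num_slots - busy_in_last_wave,
    }


# More slots than tasks: one wave, half the slots do nothing.
print(schedule_waves(4, 8))   # {'waves': 1, 'idle_slots_in_last_wave': 4}
# 25 tasks on 8 slots: the fourth wave runs a single task.
print(schedule_waves(25, 8))  # {'waves': 4, 'idle_slots_in_last_wave': 7}
```

The second call also hints at why partition counts that are a multiple of the available slots tend to use a cluster more evenly, at least under the equal-duration assumption made here.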
Dataset API Questions
Split function
Explode function
Joins (inner, left, crossJoin and anti)
Renaming column
Overwriting column
Filtering with multiple conditions
Difference between where and filter
Date and time manipulation (to and from unix, formatting, etc.)
Sorting asc and desc with and without nulls
Literals
Repartition and coalesce (more than 2 questions)
UDFs
Aggregate and window functions (rank and dense rank)
Printing schema
Finding transformations and actions
Collecting a dataset, extracting values and casting
Casting columns of a dataset
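
The rank versus dense rank topic above is easy to mix up, so here is a minimal sketch of the two semantics in plain Python. It does not use Spark; the helper names `rank` and `dense_rank` only mirror the SQL-style functions, and the sketch assumes a single ascending ordering with no partitioning.

```python
def rank(values):
    """SQL-style RANK: tied values share a rank and leave a gap after them."""
    ordered = sorted(values)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        # Remember only the first position at which each value appears.
        first_position.setdefault(value, position)
    return [first_position[value] for value in ordered]


def dense_rank(values):
    """SQL-style DENSE_RANK: tied values share a rank and no gap follows."""
    distinct = sorted(set(values))
    position = {value: i for i, value in enumerate(distinct, start=1)}
    return [position[value] for value in sorted(values)]


scores = [10, 20, 20, 30]
print(rank(scores))        # [1, 2, 2, 4]
print(dense_rank(scores))  # [1, 2, 2, 3]
```

The only difference shows up after a tie: rank skips to 4, dense rank continues with 3.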
Dataset Reading and Writing
Reading a raw CSV file
Reading a CSV file with schema and with separators
Read and write modes
Writing and overwriting a Parquet file
Partitioning by a column and writing

Do not rely on documentation online!
