DEV Community

Labinot Vila


Spark Associate Developer Certification Guide

This post covers what you need to know to pass the Databricks Spark Associate Developer certification exam.

Books

Spark: The Definitive Guide
Learning Spark: 2nd Edition
The Data Engineer's Guide to Apache Spark

Lectures

YouTube

Advanced Apache Spark Training - Sameer Farooqui (Databricks)
Apache Spark Core—Deep Dive

Udemy

Apache Spark 3 - Beyond Basics
Apache Spark 3 - Databricks Certified

Exams

Databricks Apache Spark 3.0 Dev Certification - Tests(Scala)
Databricks Certified Apache Spark 3.0 TESTS (Scala & Python)
Databricks Certified Developer for Spark 3.0 Practice Exams

PDF Exams

Databricks Certified Developer for Spark 3.0 Practice Exams
PDF Exams

More Demo Dumps

Topics covered on the exam

  1. When does a Spark application fail? (when an executor fails, when the driver fails, when data is not fully cached, etc.)
  2. What is the most granular unit in the Spark hierarchy? (jobs, stages, tasks, etc.)
  3. What does NOT help in optimizing a Spark application? (related to partitions, column merging, etc.)
  4. What happens if there are more slots than tasks to process on a worker node? (resources are not fully utilized, etc.)
  5. What is a task? (a unit of work that can fit into an executor, a unit of work that can fit into a machine, etc.)
  6. What is a job?
  7. What is the difference between actions and transformations?
  8. Which of the following Dataset API methods is most likely to trigger a shuffle? (union, groupBy, filter, etc.)
  9. What percentage of the DataFrame will the following code cache? (a .show() is called on a Scala range)
  10. How many jobs will the following code create? (reading a DataFrame and inferring its schema)
  11. A wide transformation exchanges data between which units? (partitions, executors, clusters, etc.)
  12. We want to generate 25 partitions after a join. What is the right configuration to use?
  13. What are valid Spark deployment modes? (YARN, Local, Standalone, etc.)
  14. Which of the options helps with garbage collection? (increasing Java heap space, serialization or deserialization, etc.)
  15. Dataset API Questions
  16. Split function
  17. Explode function
  18. Joins (inner, left, crossJoin and anti)
  19. Renaming column
  20. Overwriting column
  21. Filtering with multiple conditions
  22. Difference between using where and using filter
  23. Date and time manipulation (to and from unix, formatting, etc.)
  24. Sorting asc and desc with and without nulls
  25. Literals
  26. Repartition and coalesce (more than 2 questions)
  27. UDFs
  28. Window functions (rank and dense_rank)
  29. Printing schema
  30. Finding transformations and actions
  31. Collecting a dataset, extracting values and casting
  32. Casting columns of a dataset
  33. Dataset Reading and Writing
  34. Reading a raw CSV file
  35. Reading a CSV file with schema and with separators
  36. Read and write modes
  37. Writing and overwriting a Parquet file
  38. Partitioning by a column and writing

Do not rely on documentation online!
