Landing any technical position generally means preparing, studying, and sitting through long, all-day interviews.
Data engineering interviews, like other technical interviews, require plenty of preparation. There are a number of subjects that need to be covered in order to ensure you are ready for back-to-back questions.
We have gathered many of the resources that we have used to study and get jobs at companies in the FAANG family as well as other major tech companies. We have yet to find one that requires you to know anything about Hadoop during the interview, so that has not been included in this study guide.
We recommend asking the recruiter if you aren't sure which type of interview you will be facing. Some companies are very good at keeping interviews consistent, but even then, teams can deviate depending on what they are looking for. Here are some examples of what we have noticed about some companies' data engineering interviews.
Amazon --- SQL- and database-design heavy as well as general ETL design. Surprisingly, no Python.
Netflix --- SQL- and code-heavy, with the expectation that you can not only write SQL and code but also optimize them.
"They asked about SQL queries to find time difference between two events given certain condition." --- Netflix data engineer on Glassdoor
Expedia --- Big Data questions, such as what Spark and RDDs are, as well as SQL and Python.
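To make the Netflix-style question quoted above more concrete, here is a minimal sketch of a "time difference between two events" query, run through Python's built-in `sqlite3` module. We don't know the exact question that was asked, so the `events` table, its columns, and the event names (`signup`, `first_playback`) are all hypothetical.

```python
import sqlite3

# Hypothetical event log: one row per (user, event) with a timestamp.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, event_type TEXT, event_time TEXT);
    INSERT INTO events VALUES
        (1, 'signup',         '2021-01-01 10:00:00'),
        (1, 'first_playback', '2021-01-01 10:30:00'),
        (2, 'signup',         '2021-01-02 09:00:00'),
        (2, 'first_playback', '2021-01-02 11:00:00'),
        (3, 'signup',         '2021-01-03 08:00:00');
""")

# Self-join the table: "s" is the signup row, "p" is the playback row for the
# same user. strftime('%s', ...) converts a timestamp to Unix seconds.
rows = conn.execute("""
    SELECT s.user_id,
           (CAST(strftime('%s', p.event_time) AS INTEGER)
            - CAST(strftime('%s', s.event_time) AS INTEGER)) / 60 AS minutes_to_play
    FROM events AS s
    JOIN events AS p
      ON p.user_id = s.user_id
     AND p.event_type = 'first_playback'
    WHERE s.event_type = 'signup'
      AND p.event_time > s.event_time
    ORDER BY s.user_id
""").fetchall()

print(rows)  # user 3 never played anything, so the inner join drops them
```

In an interview, it is worth saying out loud what should happen to users who never reach the second event. Here the inner join excludes them, and a `LEFT JOIN` would keep them with a `NULL` difference.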
Due to this variance, we've created a checklist to keep track of what subject areas you have already studied and what you still need to cover: data engineering study checklist.
Also, I recently created a video guide to walk through the data engineering interview study guide.
Let's get started with SQL.
As a data engineer, it is almost inevitable that you will get some SQL questions. As someone who has participated in many interviews for a lot of top tech companies, like Amazon and Capital One, I know that they usually follow some similar patterns.
Typically there will be at least one question that requires an aggregation with a filter, another that requires a few joins, and one that requires a subquery. Along with that, there might be a few curveball questions that require self-joins, recursion, or analytic functions. So let's look at a couple of concepts that are good to cover.
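As a quick warm-up, the three staple patterns can be sketched against a toy schema. This is only an illustration: the `employees` and `departments` tables and their data are made up, and the queries run in SQLite through Python's `sqlite3` module, so syntax may differ slightly in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        salary INTEGER,
        department_id INTEGER REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Data'), (2, 'Sales');
    INSERT INTO employees VALUES
        (1, 'Ana', 120, 1),
        (2, 'Ben', 100, 1),
        (3, 'Cho', 90, 2),
        (4, 'Dev', 60, 2);
""")

# 1. Aggregation with a filter: WHERE filters rows, HAVING filters groups.
agg_rows = conn.execute("""
    SELECT department_id, COUNT(*) AS headcount
    FROM employees
    WHERE salary >= 90
    GROUP BY department_id
    HAVING COUNT(*) >= 2
""").fetchall()

# 2. Join: average salary per department name.
join_rows = conn.execute("""
    SELECT d.name, AVG(e.salary) AS avg_salary
    FROM employees AS e
    JOIN departments AS d ON d.id = e.department_id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

# 3. Correlated subquery: employees paid above their own department's average.
subquery_rows = conn.execute("""
    SELECT e.name
    FROM employees AS e
    WHERE e.salary > (
        SELECT AVG(e2.salary)
        FROM employees AS e2
        WHERE e2.department_id = e.department_id
    )
    ORDER BY e.name
""").fetchall()

print(agg_rows, join_rows, subquery_rows)
```

If you can explain why the filter in the first query belongs partly in `WHERE` and partly in `HAVING`, you are in good shape for most aggregation questions.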
These first few problems will help you gauge where you are on different concepts. That way you can take notes on the study guide, and go back and review what you feel you were not comfortable with.
- 262. Trips and Users
- 601. Human Traffic of Stadium
- 185. Department Top Three Salaries
- 197. Rising Temperature
- 626. Exchange Seats
- The Report
- 177. Nth Highest Salary
- Symmetric Pairs
- Ollivander's Inventory
Once you have finished watching the SQL videos above, consider trying the new problems below. Try to see if you feel like you are improving. Again, note down any specific topics you feel weak on.
- Binary Tree Nodes
- 595. Big Countries
- 626. Exchange Seats
- Weather Observation Station 18
- Print Prime Numbers
- SQL Interview Questions: 3 Tech Screening Exercises (For Data Analysts)
For database, ETL, and data warehouse design questions, we have gathered some books and videos we hope will help you out when it comes to explaining your design in an interview. In addition, we have listed a few plausible database/DW concepts you could attempt to design out on your own.
We recommend going through the videos and at least skimming the Data Warehouse Toolkit before attempting the self-practice problems.
The Data Warehouse Toolkit by Ralph Kimball
For this part of your interview practice, we are going to list a few business systems that you can try to design out. First, we recommend designing a relational database, then thinking about how you would design an ETL and DW that rely on that relational DB.
Note: In addition, we have found it very common that interviewers will base their interview questions on your design. So think about some of the questions you could answer with your DB and list them.
Design a database/ETL and DW for a:
- dating app
- bicycle rental service
- music streaming app
- job search website
- Udemy-like website
These are just a few ideas. We hope they help you have a clearer idea of what you can practice modeling and designing. Take some time to think about how users interact with these websites before getting started.
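To give a feel for the exercise, here is one possible starting point for the bicycle rental service. It is a rough sketch, not a reference design: every table and column name is our own assumption, the "ETL" is a single `INSERT ... SELECT`, and SQLite (via Python's `sqlite3`) stands in for both the transactional database and the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Transactional tables: one row per real-world entity or event.
    CREATE TABLE stations (station_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE bikes (bike_id INTEGER PRIMARY KEY, model TEXT NOT NULL);
    CREATE TABLE riders (rider_id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
    CREATE TABLE rentals (
        rental_id INTEGER PRIMARY KEY,
        rider_id INTEGER NOT NULL REFERENCES riders(rider_id),
        bike_id INTEGER NOT NULL REFERENCES bikes(bike_id),
        start_station_id INTEGER NOT NULL REFERENCES stations(station_id),
        end_station_id INTEGER REFERENCES stations(station_id),  -- NULL while in progress
        started_at TEXT NOT NULL,
        ended_at TEXT
    );

    -- Warehouse table: a fact table at the grain of one station per day.
    CREATE TABLE fact_daily_station_rentals (
        rental_date TEXT NOT NULL,
        station_id INTEGER NOT NULL,
        rentals_started INTEGER NOT NULL,
        PRIMARY KEY (rental_date, station_id)
    );

    INSERT INTO stations VALUES (1, 'Downtown'), (2, 'Harbor');
    INSERT INTO bikes VALUES (10, 'city'), (11, 'e-bike');
    INSERT INTO riders VALUES (100, 'a@example.com'), (101, 'b@example.com');
    INSERT INTO rentals VALUES
        (1, 100, 10, 1, 2,    '2021-03-01 08:00:00', '2021-03-01 08:20:00'),
        (2, 101, 11, 1, 1,    '2021-03-01 09:00:00', '2021-03-01 09:45:00'),
        (3, 100, 11, 2, NULL, '2021-03-02 18:00:00', NULL);
""")

# Toy ETL step: aggregate raw rentals into the daily fact table.
conn.execute("""
    INSERT INTO fact_daily_station_rentals (rental_date, station_id, rentals_started)
    SELECT date(started_at), start_station_id, COUNT(*)
    FROM rentals
    GROUP BY date(started_at), start_station_id
""")

# A question the warehouse can now answer: rentals started per station per day.
fact_rows = conn.execute("""
    SELECT rental_date, station_id, rentals_started
    FROM fact_daily_station_rentals
    ORDER BY rental_date, station_id
""").fetchall()
print(fact_rows)
```

Choosing the grain of the fact table (here, one row per station per day) is exactly the kind of decision interviewers tend to probe, so be ready to explain what questions it can and cannot answer.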
Data engineers do a significant amount of programming in their daily life. There are several specific languages data engineers use. Python is arguably the most common.
There are two types of questions we have experienced. Some interviewers will ask you more operational questions. Others will ask classic algorithm and data structure questions.
Operational interview questions are harder to prep for. There are no "classic" interview questions here. However, they are also often easier to figure out on the spot. Algorithm interview questions usually have some sort of trick. Take the balanced brackets problem: If you don't know you need to use a stack, it will be very difficult to get to the correct answer.
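For reference, the usual trick for balanced brackets is a stack (last in, first out): push each opening bracket, and on each closing bracket check that it matches the most recently opened one. Below is a minimal sketch; the function name `is_balanced` is our own, and it assumes non-bracket characters should simply be ignored.

```python
def is_balanced(text: str) -> bool:
    """Return True if every bracket in text is closed in the right order."""
    pairs = {")": "(", "]": "[", "}": "{"}  # closing bracket -> expected opener
    openers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != pairs[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


print(is_balanced("{[()]}"))  # True
print(is_balanced("{[(])}"))  # False
```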
Operational problems, however, will be more focused on workflows and business processes. So as long as you are good at walking through real problems, this should be easier. Here are some problems that are great for prepping. We find it is helpful to know how to use arrays and dictionaries. Beyond that, there isn't too much more required.
- Kangaroo problem
- Breaking records
- Find a string
- No idea!
- Days of the programmer
- Word order
- Sherlock and squares
- Equalize the array
- Apples and oranges
- More operational style questions
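As an example of how far arrays and dictionaries can take you, here is a short sketch of "Equalize the array" as we understand the problem: find the minimum number of elements to delete so that every remaining element has the same value. The function name is our own, and `collections.Counter` is just a dictionary that counts occurrences.

```python
from collections import Counter


def min_deletions_to_equalize(values):
    """Minimum deletions so that all remaining elements are equal."""
    if not values:
        return 0
    counts = Counter(values)  # value -> number of occurrences
    # Keep the most frequent value and delete everything else.
    most_common_count = max(counts.values())
    return len(values) - most_common_count


print(min_deletions_to_equalize([3, 3, 2, 1, 3]))  # 2
```

Most of the problems in the list above reduce to a similar shape: one pass over the input, with a dictionary or a few counters tracking what has been seen so far.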
Before going too deep into data structures and algorithms, let's do a quick check to see how you are currently doing in this area. We have listed eight LeetCode problems that vary in difficulty. Try these out and gauge how long each one takes you, as well as how many hints you need. If you are following along with the study guide, then note this down. At the end of this list are a few more questions. So once you have watched all the videos, consider doing those problems, and see if you feel like you are improving!
- 985. Sum of even numbers after queries
- 657. Robot return to origin
- 961. N-repeated element in size 2N array
- 110. Balanced binary tree
- 3. Longest substring without repeating characters
- 19. Remove Nth node from end of list
- 23. Merge k sorted lists
- 31. Next permutation
Now that you have gone through these eight questions and shaken off the rust, let's start reviewing these concepts.
- Data Structures & Algorithms #1 --- What Are Data Structures?
- Data Structures: Linked Lists
- Data Structures: Trees
- Data Structures: Heaps
- Data Structures: Hash Tables
- Data Structures: Stacks and Queues
- Data Structures: Crash Course Computer Science #14
- Data Structures: Tries
- Python Algorithms for Interviews
- Algorithms: Graph Search, DFS, and BFS
- Algorithms: Binary Search
- Algorithms: Recursion
- Algorithms: Bubble Sort
- Algorithms: Merge Sort
- Algorithms: Quicksort
Once you have finished the videos above, consider trying the algorithm and data structure problems below. Make sure you keep track of how comfortable you felt when working on the problems.
- Bigger is greater
- 6. Zigzag conversion
- 7. Reverse integer
- 40. Combination sum II
- 43. Multiply strings
- Larry's array
- Short palindrome
- 65. Valid number
If you still feel like you need help, then consider taking a course on algorithms and data structures.
Back in 2020, I made a video about practicing for a data engineering interview. Funny thing was, a person commented on the video and pointed back to my original data engineering study guide. Just by happenstance!
They also added another section. In this case, they added Spark. So for those of you out there needing to study for Spark, here is what Paul Russel added to the checklist. What would you add?
We do hope this list will help you prepare for your next data engineering interview. Please let us know if you have any questions or need any further help.