Hi, I'm David, and I recently obtained my Google Cloud Professional Data Engineer certification! This post details my journey to obtaining it, which I hope others will find useful on their own path to passing the exam.
I have been working in the world of data/tech for about three and a half years. I started out on a Data Analytics Apprenticeship scheme, so I have working experience in many areas of data, including Analytics, Engineering, Data Science and Visualisation.
Before studying for this certification, I had already self-studied for and passed the Associate Cloud Engineer (ACE) certification. I don't currently use Google Cloud in my day job, which means I passed the exam with limited hands-on experience. Instead, I had to rely on my 18 months' experience working as a Data Engineer and the knowledge of Google Cloud Platform (GCP) I gained from the ACE.
If you are in a similar position to me and are interested in learning Google Cloud, I highly suggest following a similar path: pass the ACE first before taking any of the professional certifications. This will give you a solid foundational knowledge of GCP and its product offerings, which you can then expand on and specialise in by choosing the relevant professional certification.
Pluralsight has a certification section containing a Google Cloud Professional Data Engineer track: a series of videos covering the exam syllabus (a similar learning path is available on Coursera). Interspersed among the videos are Qwiklabs (more on these later), which give you hands-on experience of using GCP.
Truthfully, while the videos cover most of the syllabus, I found this style of learning didn't suit me, as I struggled to engage with and follow along with the videos. One tip to make it a little easier to focus: on many of the video sections, if you scroll to the end you can download and print out the slides. Annotating these as I watched the videos helped me focus my attention and aided my learning.
It is also worth noting that when I originally went through this learning track (September 2021), the videos appeared to be a couple of years out of date, so some of the material had to be cross-referenced with other resources to confirm the current state of Google Cloud. However, a colleague of mine who went through the videos a few months after me noted that some of the content seemed to have been updated since.
Qwiklabs (now Google Cloud Skills Boost) is an interactive way of gaining experience with GCP. It consists of labs that give you a Google Cloud login to an environment containing predefined resources. You then follow a series of instructions that teach you how to achieve the lab's objective, e.g. how to spin up a Kubernetes cluster. As you work through the lab, it checks a series of checkpoints to make sure you have followed the instructions successfully.
This was great for getting some hands-on experience and a feel for the platform. The labs often include instructions for completing them using either the UI or Cloud Shell, and to broaden your knowledge I'd highly recommend redoing the labs using both methods.
Unfortunately, the downside is that the instructions often hold your hand too much, giving you the exact steps and Cloud Shell commands needed to complete the lab rather than giving you a chance to think. This puts the onus on the learner to make sure they are really following along, engaging with the instructions and, more specifically, understanding why they are doing each step. Several labs can be completed without any thought by simply copying and pasting the commands into Cloud Shell, but then you learn absolutely nothing. When doing these labs, I highly recommend taking extra time to read the instructions fully and think about the logic behind each step.
It is worth noting that several of the lab pathways end in a challenge lab. These give you only the bare requirements (usually 4-5), and then you're on your own to deliver them. This is a great change of pace, as the lack of step-by-step instructions makes you put the knowledge from the previous labs into practice. They are a great way to actually test your understanding and implement a cloud solution.
I found these useful for getting some hands-on experience. However, if you already use Google Cloud in your day job, I suspect Qwiklabs won't provide any benefit to you.
I highly recommend grabbing a copy of the Official Study Guide, as this was definitely my best resource for preparing for the exam. The book comprises 12 chapters that cover all aspects of the Data Engineer syllabus. It also comes with several free online resources to help you prepare:
- 2 custom practice exam quizzes
- all the chapter quizzes, available online
- flashcards
I ended up reading the book cover to cover twice. The first time, I underlined the key parts of every chapter in pencil. The second time, I wrote down summary notes that would let me revise without reading the book again. I'm in the process of typing up and uploading my revision notes, and they can be found on my GitHub page.
Each chapter has between 10 and 15 questions to test your understanding of the topic and to indicate your weaknesses and the areas to focus your revision on. This proved extremely useful for identifying what I needed to spend more time revising ahead of the exam, and because the questions are available online, you can repeat them as often as you like. Before sitting the exam, I could answer every question correctly and understand why each answer was right.
Despite the usefulness of the book, I must mention a few caveats. Firstly, due to the fast-moving nature of cloud technology, aspects of the book were out of date, so it is worth supplementing it with other resources and/or cross-referencing it with Google's documentation. Secondly, there are some typos in the book that result in incorrect information. For example, when discussing Cloud Spanner regional replication, the book says it uses only one type of replica, initially stating that this is read-only and a few lines later that it is read-write. I had to check which type it actually uses in the Google documentation. There were a few instances like this, so if something doesn't seem quite right, a quick search to confirm it will put you on the right track. Finally, while the questions are great for making sure you understand the topics, they are not indicative of the type of questions you get in the exam. The book's questions are very one-dimensional, whereas in the exam you need to consider how to apply the different cloud offerings in a professional environment. Which brings us nicely to the sample questions.
Google provide sample questions to give you a taste of what the exam will be like. Go over these several times to get a grasp of the kind of questions you could be asked and how they differ from the Study Guide questions. (It's also worth noting that in my exam I got a question almost identical to one of the sample questions, giving me a 'free' answer!)
Details about the exam can be found on the [Professional Data Engineer Certification page](https://cloud.google.com/certification/data-engineer). This page contains the exam guide (syllabus), training opportunities, a link to the sample questions and details of how to book the exam. Once you have studied for the exam, go through every section of the exam guide. You should be familiar with everything it mentions; if not, you've identified a gap in your knowledge to go back and study.
Once you feel confident that you understand everything in the guide, you are ready to book your exam. Exams can be taken either remotely or at an onsite testing centre. If you don't have a quiet, isolated room you can shut yourself away in for a couple of hours, opt for the onsite test. I sat mine remotely and have some tips on the process below.
Since this was my second remotely proctored exam, I was familiar with the process and set-up, so I made sure my test environment was prepared the night before. Aside from the obvious (sitting the exam in an isolated, quiet room), I'd recommend doing the following before the exam:
- Ensure your webcam has a long cable. Before the exam begins, the examiner will ask for a thorough inspection of your room, which involves pointing your webcam at pretty much every area of the room (walls, ceiling, floor, back of the monitor, etc.). This is much easier to do if you have an external webcam with a long USB cable.
- Download and install the latest version of the Sentinel software. Instructions can be found here.
- Perform a system check to verify your setup is working and ready to sit the exam.
- Set up your biometric profile. On exam day this is used to verify your identity, and you don't want the added stress of setting it up then, especially since I found it quite fiddly to get a photograph that the system accepted as valid. Details on how to do this can be found here.
- Remove all objects from your desk and the surrounding area. This includes second monitors, printers, speakers, paper, notes, water bottles, etc.; you name it, it should not be near your desk. On the day of the exam, the examiner will check that your desk complies with their rules, and I found this process a little pedantic. For example, I had a printer and a shredder about 1.5 m away from me on a different desk. I assumed this would be fine, but on the day of my exam they asked me to remove them from the room. On my second test, they queried my mouse bungee. To be on the safe side, and to avoid getting flustered or delaying your exam, make sure your desk holds only a single monitor, mouse and keyboard; everything else should be removed.
- Ensure you have your photographic ID to hand and your phone on your desk. The ID is used to verify your identity, and your phone will be used to take pictures of the cables hanging out of the back of your monitor. The examiner will then ask you to show, on camera, putting these away somewhere away from your desk (again, this is easier with a webcam on a long cable).
The exam follows the same structure as the other certifications: 50 questions over two hours. The questions are pretty wordy, often comprising six or more sentences to dissect. Despite this, I didn't find time to be a pressure, finishing my first pass through the exam in around 45 minutes. I used the second pass to review the questions I had flagged (at a guess, 10-12), which took about 20 minutes. Finally, I did one last pass through all the questions, thoroughly rereading each one and double-checking my answers, which took around 25-30 minutes. This meant I finished the exam in just over an hour and a half. So don't rush through the questions; you will have plenty of time.
If you have done the practice questions, you will already be familiar with the style and know what to expect. The questions focus heavily on Google best practices and how best to implement the various Cloud Data Engineering components. Many of the questions make you think about the optimal way to link two or more of the products together. As long as you have a solid understanding of each offering, its uses and its limitations, it isn't too difficult to work out how they can be combined to create a Data Engineering pipeline.
Several of the questions in my exam asked you to pick 2 or 3 answers from 5 options. These were typically the trickiest questions on the exam, but there are some patterns to them. For example, several questions had two options (say A and B) that were very similar, but with one of them slightly incorrect. As long as you spot which one is wrong, you can eliminate it, which narrows the field and leaves you to focus on choosing the correct answers from the remaining options.
Finally, in neither my ACE nor my PDE exam did I get a question asking me to select the correct Cloud Shell command. I did get a single question on which flag should be passed to a command. As such, I wouldn't recommend spending a great deal of time learning all the intricacies of Cloud Shell commands. Instead, I'd advise learning the basic command syntax (the order of the command groups and verbs) for spinning up each of the product offerings, plus a few of the more important flags in case they come up. These commands are all covered in detail in the book, and my only extra exposure to them came from doing some of the Qwiklabs.
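To illustrate the syntax order mentioned above: `gcloud` commands generally read `gcloud GROUP [SUBGROUP] COMMAND [NAME] [--FLAGS]`. The Python sketch below only assembles a few such commands as argument lists; it does not run anything. The helper `gcloud_argv` is hypothetical, and the resource names (`my-cluster`, `my-topic`, `my-sub`) and region are placeholders, but the commands themselves (`gcloud dataproc clusters create`, `gcloud pubsub topics create`, `gcloud pubsub subscriptions create`) are real.

```python
from typing import Dict, List, Optional


def gcloud_argv(
    group_path: List[str],
    verb: str,
    name: Optional[str] = None,
    flags: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build a gcloud argument list: gcloud GROUP [SUBGROUP...] VERB [NAME] [--flag=value...].

    Hypothetical helper for illustration only; it does not execute anything.
    """
    argv = ["gcloud", *group_path, verb]
    if name is not None:
        argv.append(name)  # the positional resource name comes straight after the verb
    for key, value in (flags or {}).items():
        argv.append(f"--{key}={value}")  # flags come last
    return argv


if __name__ == "__main__":
    # Placeholder resource names and region; substitute your own.
    examples = [
        gcloud_argv(["dataproc", "clusters"], "create", "my-cluster", {"region": "europe-west2"}),
        gcloud_argv(["pubsub", "topics"], "create", "my-topic"),
        gcloud_argv(["pubsub", "subscriptions"], "create", "my-sub", {"topic": "my-topic"}),
    ]
    for argv in examples:
        print(" ".join(argv))
```

Keeping the command as a list rather than a single string means it could be passed straight to `subprocess.run` on a machine where the gcloud CLI is installed and authenticated, without worrying about shell quoting.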
Overall, I feel that anyone who followed the exam preparation I did would be in great shape to pass the exam. Ideally, having hands-on experience of Google Cloud in your day job would increase your chances of passing, but it isn't essential.
That being said, in my exam three areas came up that I hadn't come across in my studying (or if I had, they were covered so briefly that I didn't remember seeing them):
- BigQuery materialised views - I felt unprepared for this topic due to my lack of hands-on experience. I got two questions on it (thankfully, the second gave away the answer to the first). Both questions concerned the best way to expose BigQuery tables optimally. The first listed different types of views (materialised being one of them; unfortunately, I can't remember the rest). The second had materialised views as a possible answer when choosing three best practices for using BigQuery.
- Firebase Cloud Messaging - This came up as an answer option in a question about designing a messaging queue system. The options made it a shootout between Pub/Sub and Firebase Cloud Messaging. Part of the requirement was being able to query the entire messaging history; I knew this couldn't be done in Pub/Sub, so I opted for Firebase Cloud Messaging.
- Terraform - Thankfully, I was aware of the existence and utility of Terraform, an infrastructure-as-code tool, but I don't think it was covered in any of the material I studied for this exam. It came up as part of a multiple-answer question on best practice for managing environments. I doubt you'd get a detailed question on Terraform, but I'd recommend brushing up, at a high level, on what it is and why it's used.
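Picking up the materialised views point from the list above: in BigQuery, a materialised view is created with a `CREATE MATERIALIZED VIEW ... AS SELECT ...` DDL statement, and BigQuery maintains the precomputed results automatically so queries can read them instead of rescanning the base table. The Python sketch below just builds such a statement as a string. The helper `materialized_view_ddl` and every project, dataset, table and column name are hypothetical placeholders, and actually submitting the statement through the `google-cloud-bigquery` client is only shown as a comment, since it needs a GCP project and credentials. Materialised views support only a restricted subset of SQL (aggregations such as `SUM` and `COUNT` with `GROUP BY` are a typical use), so check the BigQuery documentation for the current limitations.

```python
from typing import Dict, List


def materialized_view_ddl(
    view_id: str,
    source_table_id: str,
    group_by: List[str],
    aggregates: Dict[str, str],
) -> str:
    """Return a BigQuery CREATE MATERIALIZED VIEW statement.

    view_id and source_table_id are fully qualified `project.dataset.table` names.
    aggregates maps an output column alias to an aggregate expression,
    e.g. {"total_amount": "SUM(amount)"}. Hypothetical helper for illustration only.
    """
    select_cols = list(group_by) + [f"{expr} AS {alias}" for alias, expr in aggregates.items()]
    return (
        f"CREATE MATERIALIZED VIEW `{view_id}` AS\n"
        f"SELECT {', '.join(select_cols)}\n"
        f"FROM `{source_table_id}`\n"
        f"GROUP BY {', '.join(group_by)}"
    )


if __name__ == "__main__":
    # All names below are placeholders.
    ddl = materialized_view_ddl(
        view_id="my-project.sales.daily_totals_mv",
        source_table_id="my-project.sales.orders",
        group_by=["order_date", "store_id"],
        aggregates={"total_amount": "SUM(amount)", "order_count": "COUNT(*)"},
    )
    print(ddl)
    # To run it for real (requires google-cloud-bigquery and GCP credentials):
    # from google.cloud import bigquery
    # bigquery.Client().query(ddl).result()
```

The design point the exam questions seemed to be probing is the trade-off: a standard view re-runs its query every time, whereas a materialised view stores the aggregated result, which suits dashboards that repeatedly query the same aggregates.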
I'm sure this isn't a definitive list of exam topics missing from the study material. However, as long as your fundamentals across all of Google's Data Engineering and ML offerings are up to scratch, even if you're unlucky enough to get a few questions like this, you'll still be in good shape to make an educated guess and pass the exam.
Congratulations, you've passed your Professional certification! 👏 Once Google has verified your results and emailed you the certificate, look at the bottom of the email text for a code that lets you choose a piece of swag. This can be redeemed via the Google Merchandise Store. As of March 2022, the following options were available:
- Cloud Data Engineer Laptop Backpack
- Data Engineer Fleece
- Data Engineer Sweater
- $55 donation to a charity
Regardless of what you choose, you will also receive the Professional Data Engineer badge sticker.
Thanks for taking the time to read my overview, and I hope at least one person will find it useful on their journey to become Cloud certified. If you have any questions concerning the exam, drop a comment and I'll happily reply 😊