Python Screening Interview Questions for Data Scientists
Data science requires an interdisciplinary set of skills, from handling databases and running statistical models to setting up business cases and programming itself. More often than not, technical interviews for data scientists assess knowledge of specific data manipulation APIs such as pandas, sklearn or Spark rather than a programming way of thinking.
While I think that knowledge of the more “applied” APIs should be tested when hiring data scientists, so should knowledge of more traditional programming.
String Reversal & Palindrome
String reversal questions give some indication of how comfortable a candidate is with handling text in Python and with basic operations.
Question 1:
Question: Reverse the string “the fox jumps over the lazy dog”.
Answer:
a = "the fox jumps over the lazy dog"
a[::-1]
# or, less efficient:
''.join(x for x in reversed(a))
# or, also less efficient:
''.join(a[-x] for x in range(1, len(a)+1))
Assessment:
- This is more of a warm-up question than anything else. It is good to know the shortcut notation, especially as it shows some knowledge of how Python handles strings (e.g. the substring a[0:7] gives “the fox”), but it is not necessary for most data science purposes.
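As a quick illustration of the slice notation mentioned above, here is a minimal sketch. The variable names other than `a` are my own and only there for the example.

```python
a = "the fox jumps over the lazy dog"

# Slice syntax is a[start:stop:step]; the stop index is exclusive.
first_words = a[0:7]   # characters 0 to 6
reversed_a = a[::-1]   # a step of -1 walks the string backwards
last_word = a[-3:]     # negative indices count from the end

print(first_words)  # the fox
print(reversed_a)   # god yzal eht revo spmuj xof eht
print(last_word)    # dog
```

Strings are immutable in Python, so each slice returns a new string and leaves `a` untouched.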
Question 2:
Question: Identify all words that are palindromes in the following sentence: “Lol, this is a gag, I didn’t laugh so much in a long time”.
Answer:
def isPalindrome(word: str) -> bool:
    # a palindrome reads the same forwards and backwards
    return word == word[::-1]

def getPalindromesFromStr(inputStr: str) -> list:
    # normalise: drop commas and lower-case before splitting into words
    cleanStr = inputStr.replace(",", "").lower()
    # a set removes repeated words (the order of the result is not guaranteed)
    words = set(cleanStr.split(" "))
    wPalindromes = [
        word for word in words
        if isPalindrome(word) and word != ""
    ]
    return wPalindromes

getPalindromesFromStr("Lol, this is a gag, I didn’t laugh so much in a long time")
Assessment:
- Does the candidate think about cleaning his/her inputs?
- Does the candidate know the basics of word processing in Python, such as replace / split / lower?
- Does the candidate know how to use list comprehensions?
- How does the candidate structure his/her code?
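On the input-cleaning point, a candidate might go one step further than removing commas. The sketch below is one possible variant, not the reference answer: the function name `find_palindromes` is hypothetical, it strips any leading or trailing ASCII punctuation from each word, and it keeps the words in order of first appearance. The example sentence uses a straight apostrophe.

```python
import string

def find_palindromes(sentence: str) -> list:
    """Return the distinct palindromic words of a sentence, in order of appearance."""
    # Strip leading/trailing ASCII punctuation from each token and lower-case it.
    tokens = (tok.strip(string.punctuation).lower() for tok in sentence.split())
    # dict.fromkeys removes repeats while preserving insertion order.
    unique = dict.fromkeys(tok for tok in tokens if tok)
    return [tok for tok in unique if tok == tok[::-1]]

print(find_palindromes("Lol, this is a gag, I didn't laugh so much in a long time"))
# ['lol', 'a', 'gag', 'i']
```

Note that single-letter words such as "a" and "I" count as palindromes here, just as they do in the answer above.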
FizzBuzz
FizzBuzz is a traditional programming screening question that tests whether a candidate can think through a problem that is not a simple if/else statement. The approach they take can also shed some light on their understanding of the language.
Question: Write a program that prints the numbers from 1 to 50. For multiples of 2, print fizz instead of the number; for multiples of 3, print buzz; for numbers that are multiples of both 2 and 3, print fizzbuzz.
Answer:
def fizzbuzzfn(num) -> str:
    mod_2 = (num % 2 == 0)
    mod_3 = (num % 3 == 0)
    if (mod_2 or mod_3):
        return (mod_2 * 'Fizz') + (mod_3 * 'Buzz')
    return str(num)

print('\n'.join([fizzbuzzfn(x) for x in range(1,51)]))
Assessment:
- Do they know the modulo operator and can they apply it?
- Are they storing the results of the modulo operations in variables for re-use?
- Do they understand how True/False interact with a string?
- Are they bombarding their code with if statements?
- Do they return a consistent type, or do they mix integers and strings?
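The True/False point deserves a short illustration. In Python, `bool` is a subclass of `int`, so multiplying a string by `True` keeps it and multiplying by `False` yields an empty string. The function `fizzbuzz_label` below is a hypothetical, more compact variant of the answer that relies on this, using the same 2 and 3 divisors as the question.

```python
# bool is a subclass of int: True behaves as 1 and False as 0.
print(isinstance(True, int))   # True
print(True * "Fizz")           # Fizz
print(repr(False * "Fizz"))    # ''

def fizzbuzz_label(num: int) -> str:
    # Build the label from the two divisibility checks;
    # an empty label is falsy, so fall back to the number itself.
    label = (num % 2 == 0) * "Fizz" + (num % 3 == 0) * "Buzz"
    return label or str(num)

print([fizzbuzz_label(n) for n in range(1, 7)])
# ['1', 'Fizz', 'Buzz', 'Fizz', '5', 'FizzBuzz']
```

Whether this compact form is preferable to explicit if statements is a matter of taste; the point in an interview is that the candidate can explain why it works.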
First Duplicate Word
Finding the first duplicate word shows whether candidates know the basics of text processing in Python and can handle some basic data structures.
Question 1:
Question: Given a string, find the first duplicate word. Example string: “this is just a wonder, wonder why do I have this in mind”
Answer:
import re

string = "this is just a wonder, wonder why do I have this in mind"

def firstduplicate(string: str) -> str:
    # keep only letters, spaces and hyphens
    cleanStr = re.sub("[^a-zA-Z -]", "", string)
    words = cleanStr.lower().split(" ")
    seen_words = set()
    for word in words:
        if word in seen_words:
            return word
        else:
            seen_words.add(word)
    return None

firstduplicate(string)
Assessment:
- Does the candidate ask whether there are constraints to work with, for instance in terms of memory?
- Does the candidate clean the punctuation from the string? With replace or a regexp? If using a regexp, is the pattern compiled or used directly?
- Does the candidate know the right data structure to check for existence?
- Does the function terminate as soon as a match is found?
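To make the "compile or not" question concrete, here is a sketch of the same idea with the pattern compiled once at module level. The names `NON_WORD_CHARS` and `first_duplicate` are my own. It also uses `split()` without an argument, which is an assumption on my part about how repeated spaces should be handled: it collapses them instead of producing empty strings.

```python
import re
from typing import Optional

# Compiled once and reused on every call.
NON_WORD_CHARS = re.compile(r"[^a-zA-Z -]")

def first_duplicate(text: str) -> Optional[str]:
    words = NON_WORD_CHARS.sub("", text).lower().split()
    seen = set()  # set membership checks are O(1) on average
    for word in words:
        if word in seen:
            return word  # stop at the first repeat
        seen.add(word)
    return None

print(first_duplicate("this is just a wonder, wonder why do I have this in mind"))
# wonder
```

In practice the `re` module caches recently used patterns, so the gain from compiling is usually small; a good answer from the candidate would mention both the readability benefit and that caveat.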
Question 2:
Question: What if we wanted to find the first word that appears more than twice in a string?
Answer:
import re

string = "this is just a wonder, wonder why do I have this in mind. This is all that matters."

def first2timesduplicate(string: str) -> str:
    # keep only letters, spaces and hyphens
    cleanStr = re.sub("[^a-zA-Z -]", "", string)
    words = cleanStr.lower().split(" ")
    seen_words = dict()
    for word in words:
        # default to 0 for words not seen yet
        previous_count = seen_words.get(word, 0)
        seen_words[word] = previous_count + 1
        # two earlier sightings means this is the third occurrence
        if previous_count >= 2:
            return word
    return None

first2timesduplicate(string)
Assessment:
- Only a small modification is needed to accommodate this change; the main one is using a dictionary rather than a set. A Counter is also a valid data structure for this use case.
- There is little difficulty in modifying the previous function to cope with this change request. It is worth checking that the candidate initialises each key correctly, taking default values into account.
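Since a Counter is mentioned as a valid alternative, here is a minimal sketch of what that could look like. The function name and the `threshold` parameter are hypothetical; a Counter returns 0 for missing keys, which removes the need for `.get(word, 0)`.

```python
import re
from collections import Counter
from typing import Optional

def first_word_repeated(text: str, threshold: int = 3) -> Optional[str]:
    """Return the first word to reach `threshold` occurrences, or None."""
    words = re.sub(r"[^a-zA-Z -]", "", text).lower().split()
    counts = Counter()
    for word in words:
        counts[word] += 1  # missing keys start at 0
        if counts[word] >= threshold:
            return word
    return None

text = "this is just a wonder, wonder why do I have this in mind. This is all that matters."
print(first_word_repeated(text))               # this
print(first_word_repeated(text, threshold=2))  # wonder
```

Making the threshold a parameter means the same function answers both the first-duplicate question and this one.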
Quick Fire questions
Some quick-fire questions can also be asked to test general knowledge of the Python language.
Question 1:
Question: Replicate the sum function for any number of arguments, e.g. sum(1,2,3,4,5..)
Answer:
# note: this shadows Python's built-in sum
def sum(*args):
    val = 0
    for arg in args:
        val += arg
    return val
Assessment:
- Quick interview question to check knowledge of variable arguments and of how to set up one of the most basic functions.
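A natural follow-up is how variable arguments relate to iterables. The sketch below uses the hypothetical name `total` so that it does not shadow the built-in, and shows the `*` unpacking operator at the call site.

```python
def total(*args):
    # args is a tuple holding every positional argument passed in
    result = 0
    for value in args:
        result += value
    return result

numbers = [1, 2, 3, 4, 5]

print(total(1, 2, 3, 4, 5))  # 15
print(total(*numbers))       # 15, the list is unpacked into separate arguments
print(total())               # 0, args is an empty tuple
print(sum(numbers))          # 15, the built-in takes a single iterable instead
```

The difference in call signature between `total` and the built-in `sum` is a good thing to ask the candidate to explain.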
Question 2:
Questions around the Fibonacci series are a classic of programming interviews, and candidates should in general be at least familiar with them. They test recursive thinking.
Question: The Fibonacci sequence is defined as follows:
F_0 = 0 ; F_1 = 1
F_n = F_{n-1} + F_{n-2}
Write a function that gives the sum of all Fibonacci numbers from F_0 to F_n.
Answer:
def fibonacci(n: int) -> int:
    # the Fibonacci series doesn't exist for n < 0;
    # might want to throw an error or return a null
    # for that
    if n <= 0:
        return 0
    if n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)

def naiveFibSum(n: int) -> int:
    # uses Python's built-in sum
    return sum([fibonacci(x) for x in range(0, n+1)])

def sumFib(n: int) -> int:
    # relies on the identity F_0 + ... + F_n = F_{n+2} - 1
    return fibonacci(n + 2) - 1
Assessment:
- First, is the candidate able to think recursively?
- Is the candidate only thinking about a naive solution to the sum of the Fibonacci series, or does s/he understand that it can also be computed in a more efficient way?
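The shortcut used in sumFib comes from a telescoping sum: since F_i = F_{i+2} - F_{i+1}, adding the terms from i = 0 to n leaves F_{n+2} - F_1, that is F_{n+2} - 1. The plain recursive function also recomputes the same values many times, so its cost grows exponentially with n. A sketch of an iterative version, with hypothetical names and assuming n >= 0, could look like this:

```python
def fib_iterative(n: int) -> int:
    # Assumes n >= 0; values below 1 return 0, mirroring the recursive answer.
    if n <= 0:
        return 0
    prev, curr = 0, 1
    for _ in range(n - 1):
        prev, curr = curr, prev + curr
    return curr

def fib_sum(n: int) -> int:
    # F_0 + F_1 + ... + F_n = F_{n+2} - 1
    return fib_iterative(n + 2) - 1

# Check the identity against a brute-force sum for small n.
for n in range(20):
    assert fib_sum(n) == sum(fib_iterative(i) for i in range(n + 1))

print(fib_sum(5))  # 12, i.e. 0 + 1 + 1 + 2 + 3 + 5
```

The loop runs in linear time and constant memory, which makes the closed-form sum practical for large n as well.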
Wrap up
These questions are just meant to be a first screener for data scientists and should be combined with statistical and data manipulation questions. They are meant to give a quick glimpse of whether a candidate has the minimum knowledge needed to go through a full interview round.
More advanced programming questions for Python would tend to cover the use of generators, decorators, Cython or the efficient use of libraries such as pandas/numpy.
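To give a flavour of what such a follow-up could touch on, here is a small sketch that revisits the Fibonacci question with a generator and a standard-library decorator. The function names are hypothetical and this is only one way such a question might be framed.

```python
from functools import lru_cache
from itertools import islice

def fib_gen():
    """Generator: yields Fibonacci numbers lazily, one at a time."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

@lru_cache(maxsize=None)  # decorator: caches results of earlier calls
def fib_cached(n: int) -> int:
    if n < 2:
        return max(n, 0)
    return fib_cached(n - 1) + fib_cached(n - 2)

print(list(islice(fib_gen(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fib_cached(30))               # 832040
```

With the cache in place, the recursive definition only computes each value once.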