Recently, Paige Bailey (Twitter, GitHub) tweeted an analysis she did on the state of Machine Learning cohorts. Using her article as a basis, I'd like to share my view on what I've been doing for the past three years as a Machine Learning Engineer and where I believe I fit in.
There's a lot of contradiction about what a Machine Learning Engineer is today, and most job openings simply define the role based on the company's needs. Research like Paige's helps us understand and begin to draw lines around what we should be doing, or even what our job title should be.
I constantly receive job offers asking if I'm interested in positions such as Data Engineer (ML) or Software Developer - Data, which, to my understanding, do not fit my profile.
With this in mind, here's my take on what I, as a Machine Learning Engineer, actually do:
- Basic Exploratory Data Analysis (EDA): It is impossible to do any Machine Learning work without knowing your data first. I say BASIC here because I'm looking for the underlying structure of the data, not producing a fancy report or discovery. What I usually do here is figure out which features I'm going to use, which are irrelevant, and which machine learning methods are most appropriate for the problem at hand. What I DON'T do: fancy reports, customer dashboards and presentations. That would be a Data Scientist's or Data Analyst's job.
- Traditional Machine Learning: As I stated before, it is quite hard to do ML without EDA, so after finding out what problem I'm dealing with, I'll generally experiment with some ML techniques. That could be scikit-learn, XGBoost, or even TensorFlow/PyTorch. Again, I'm not looking for state-of-the-art performance, nor for big fancy models like the ones you see in NeurIPS papers or OpenAI releases. I'm looking to solve a small, very specific problem inside my domain. Nothing fancy. This puts me somewhere along the lines of the Data Scientist – Business, Traditional ML cohort.
- Product Integration: This is where I think the Engineering part of ML Engineering comes in. I have already built a model that solves a very specific domain problem; now, how can I make it available to everyone? Here I'll usually look for the best approach to put the model in production, create a data pipeline and, if needed, retraining and update pipelines for future use. Generally this is done with Docker containers, frameworks like MLflow and a hosting platform like AWS or GCP. This falls into the MLOps practitioner bucket.
As the team evolves and more people come in, I can see the ML Engineer focusing more and more on traditional ML and product integration. Sometimes you receive an ML model built with scikit-learn with several preprocessing steps, which yields a model artifact of hundreds of MB, if not GB, and poor performance. Those can be optimized, and I have already found myself doing this kind of optimization.
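One common source of bloated artifacts is fitted state that is never used at prediction time. Below is a minimal sketch of that fix, using only the standard library: measure the pickled size, drop the unused attribute, then compress. `KeywordModel` and its toy data are hypothetical. In scikit-learn, a real analogue is `CountVectorizer`'s `stop_words_` attribute, which its documentation says can be removed before pickling, and `joblib.dump` accepts a `compress` argument for the compression step.

```python
import gzip
import pickle


class KeywordModel:
    """Hypothetical toy model: scores a text by how many known keywords it contains."""

    def __init__(self, training_texts):
        # Kept "for introspection" only; predict() never reads it.
        self.training_texts_ = list(training_texts)
        # The only state prediction actually needs.
        self.vocabulary_ = {w for t in training_texts for w in t.lower().split()}

    def predict(self, text):
        return sum(w in self.vocabulary_ for w in text.lower().split())


def serialized_size(obj, compress=False):
    """Size in bytes of the pickled object, optionally gzip-compressed."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return len(gzip.compress(data)) if compress else len(data)


texts = [f"order {i} shipped late customer angry refund" for i in range(5000)]
model = KeywordModel(texts)
before = serialized_size(model)
expected = model.predict("refund for late order")

# Drop fitted state that prediction does not use, then compare sizes.
del model.training_texts_
after = serialized_size(model)
after_gz = serialized_size(model, compress=True)

assert model.predict("refund for late order") == expected  # behaviour unchanged
print(f"pickled: {before} -> {after} bytes, gzip: {after_gz} bytes")
```

Exact byte counts vary with the Python version; what matters is that each step shrinks the artifact while predictions stay the same.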
As a finishing thought, according to Paige's cohort studies, I'd be a mix of MLOps practitioner and Data Scientist - Business, Traditional ML. These are the roles I fit most, and probably the mix that, for me, best describes what an ML Engineer does.
What do you think? Am I missing something?
What is a Machine Learning engineer to you?