Dang Hoang Nhu Nguyen

Posted on Feb 6, 2022

[BTY] Day 3: Improve software engineering skills as a researcher

#betterthanyesterday #roadmap

All of the information here comes from this article by Lj Miranda: https://ljvmiranda921.github.io/notebook/2020/11/15/data-science-swe/. Please read the original for more details and related resources about the following notes.

Aside from developing deep learning models, you should know how to build a machine learning application that receives HTTP requests, and then deploy it as a containerized app. This task, also known as building a Machine Learning (ML) Service, calls for software engineering skills that we (assuming you're a researcher like me) often lack.

Why?

Improves engineering sensibilities.

Most applications treat ML models as software components.

Increases familiarity with the ML workflow.

We’re familiar with the ML experimentation workflow. There is also a productization workflow, where we deploy our models, perform A/B testing, take care of concept drift, and more.

Another tool under your belt to create more cool stuff.

Even if you won’t be working as a full-fledged ML Engineer or Developer, the technologies you’ll learn while building an ML Service enable you to do more things!

How?

1. Be comfortable with UNIX commands and a version-control system like Git.

2. Structure your Python project in a modular fashion

 my_project/
  ├── api
  ├── docs
  ├── experiments
  ├── README.md
  ├── requirements.txt
  ├── src
  │   ├── entrypoint.py
  │   └── my_module
  │       └── module_file.py
  └── tests
      ├── my_module
      │   └── test_my_module.py
      └── test_entrypoint.py
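
As a rough sketch of how the layout above pairs a module with its test, the block below shows what `module_file.py` and `test_my_module.py` might contain. The `normalize` function and both test names are hypothetical, made up for illustration; they are shown in one block here, while in the real layout they would live in separate files and the test would import from the module (the exact import path depends on how the package is installed).

```python
# --- src/my_module/module_file.py (hypothetical contents) ---
def normalize(values):
    """Scale a list of numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# --- tests/my_module/test_my_module.py (hypothetical contents) ---
# In the real layout this file would begin with an import such as:
#   from my_module.module_file import normalize
def test_normalize_maps_to_unit_interval():
    assert normalize([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_normalize_handles_constant_input():
    assert normalize([3, 3]) == [0.0, 0.0]
```

Keeping one test file per module, mirrored under `tests/`, makes it easy to find the test that covers a given piece of code.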

3. Learn how to write an API on top of your model using Flask or FastAPI
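
To give a feel for what "an API on top of your model" means, here is a minimal sketch that uses only Python's standard library (`http.server`) instead of Flask or FastAPI, so that it runs without installing anything. With either framework the handler would shrink to a short decorated function, but the idea is the same. The `predict` function is a hypothetical stand-in for a trained model, and the `/predict` path and the `features` field are assumptions chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


class PredictHandler(BaseHTTPRequestHandler):
    """Accepts POST /predict with a JSON body like {"features": [1.0, 2.0]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            result = {"prediction": predict(payload["features"])}
            status = 200
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            # Malformed JSON, a missing field, or unusable values.
            result = {"error": "expected JSON with a non-empty 'features' list"}
            status = 400
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence the default per-request logging.


def make_server(port=8000):
    """Build (but do not start) a local server that exposes the model."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Calling `make_server().serve_forever()` starts the service locally. A POST to `/predict` with the body `{"features": [1.0, 2.0, 3.0]}` returns `{"prediction": 2.0}`.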

4. “Containerize” your application using Docker

You want to use Docker for two things: (1) reproducibility and (2) isolation.

5. Learn how to deploy to a Cloud Platform

What’s next?

From here on, you can keep improving your app by:

Minimizing the size of your Docker image using multi-stage builds.
Cleaning up your repository. Model files shouldn’t be committed to Git but kept in a storage service (e.g., Google Cloud Storage or AWS S3).
Adding a Continuous Integration / Continuous Deployment (CI/CD) pipeline so that any change on GitHub is automatically reflected in your deployed app. I often use GitHub Actions for this (e.g., any change in the master branch is deployed automatically).
Improving security! Make use of Docker args or a .env file to keep API tokens, passwords, and whatnot out of your code. Ideally you shouldn’t commit any secrets to Git at all (a secret can still be recovered from the history even after you delete it!). Be careful!
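
On the application side, keeping secrets out of Git usually means reading them from the environment at runtime rather than hardcoding them. The sketch below is one small way to do that in Python. The helper `get_secret` and the variable name `API_TOKEN` are hypothetical names chosen for illustration. It assumes the variable has already been set, for example by Docker or by whatever tool loads your .env file.

```python
import os


def get_secret(name, environ=None):
    """Return a secret from the environment, failing loudly if it is missing.

    `environ` defaults to the real process environment; passing a plain
    dict makes the function easy to test.
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Hypothetical usage inside the app, assuming API_TOKEN is set at runtime:
#   token = get_secret("API_TOKEN")
```

Failing at startup when a secret is missing is usually easier to debug than a confusing authentication error later on.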
