DEV Community

Dang Hoang Nhu Nguyen


[BTY] Day 3: Improve software engineering skills as a researcher

All of the information here comes from this article by Lj Miranda: https://ljvmiranda921.github.io/notebook/2020/11/15/data-science-swe/. Please read the original for more details and related resources on the notes below.

Aside from developing Deep Learning models, you also need to know how to build a machine learning application that receives HTTP requests, and how to deploy it as a containerized app. This task, also known as building a Machine Learning (ML) Service, calls for software engineering skills that we (assuming you're a researcher like me) often lack.

Confusing things when learning software engineering skills

Why?

Improves engineering sensibilities.

Most applications treat ML models as software components.
ML models in software components

Increases familiarity with the ML workflow.

We’re familiar with the ML experimentation workflow. In addition, there is also a productization workflow where we deploy our models, perform A/B testing, take care of concept drift, and more.

ML Workflow

Another tool under your belt to create more cool stuff.

Even if you won’t be working as a full-fledged ML Engineer or Developer, the technologies you’ll learn while building an ML Service enable you to do more things!

How?

1. Be comfortable with UNIX commands and a version-control system like Git.


2. Structure your Python project in a modular fashion

 my_project/
  ├── api
  ├── docs
  ├── experiments
  ├── README.md
  ├── requirements.txt
  ├── src
  │   ├── entrypoint.py
  │   └── my_module
  │       └── module_file.py
  └── tests
      ├── my_module
      │   └── test_my_module.py
      └── test_entrypoint.py
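
To make the layout above concrete, here is a minimal sketch of how a module under `src/` and its test under `tests/` might pair up. The function `normalize` and the contents of both files are hypothetical, made up for illustration; the original article does not specify them. Both files are shown in one block so the example runs as-is.

```python
# --- src/my_module/module_file.py (hypothetical contents) ---
def normalize(values):
    """Scale a list of numbers so that they sum to 1.0."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


# --- tests/my_module/test_my_module.py (hypothetical contents) ---
# In the real layout this file would begin with an import such as:
#     from my_module.module_file import normalize
# (assuming `src` is on the import path).
def test_normalize_sums_to_one():
    result = normalize([1, 1, 2])
    assert result == [0.25, 0.25, 0.5]
    assert sum(result) == 1.0
```

Keeping the logic in `src/` and mirroring the same folder names in `tests/` means each module has an obvious place for its tests, and a test runner such as pytest can pick them up by name.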

3. Learn how to write an API on top of your model using Flask or FastAPI
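
As a rough sketch of what "an API on top of your model" means, the example below accepts a JSON body on `POST /predict` and returns a JSON prediction. It deliberately uses only the Python standard library (a plain WSGI callable) rather than Flask or FastAPI, so it runs without installing anything; those frameworks would replace most of this plumbing with a decorated function. The `predict` function is a stand-in, not a real model, and the route name and payload shape are assumptions.

```python
import io
import json
from wsgiref.simple_server import make_server


def predict(features):
    """Stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


def _json_response(start_response, status, body):
    """Send a JSON body with the given HTTP status line."""
    start_response(status, [("Content-Type", "application/json")])
    return [json.dumps(body).encode("utf-8")]


def app(environ, start_response):
    """WSGI callable that answers POST /predict with a JSON prediction."""
    if (environ.get("REQUEST_METHOD") != "POST"
            or environ.get("PATH_INFO") != "/predict"):
        return _json_response(start_response, "404 Not Found",
                              {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        features = [float(x) for x in payload["features"]]
        if not features:
            raise ValueError("empty feature list")
    except (ValueError, KeyError, TypeError):
        return _json_response(start_response, "400 Bad Request",
                              {"error": "expected a JSON object with a 'features' list"})
    return _json_response(start_response, "200 OK",
                          {"prediction": predict(features)})


def call_predict(body):
    """Invoke the app in-process (no server) and return (status, parsed JSON)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": "POST",
        "PATH_INFO": "/predict",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


def serve(port=8000):
    """Run the app locally; this blocks until interrupted."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `call_predict(b'{"features": [1, 2, 3]}')` exercises the endpoint without opening a socket, which is also a convenient way to unit-test it; calling `serve()` starts a local development server on the chosen port.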

4. “Containerize” your application using Docker

You want to use Docker for two things: (1) reproducibility and (2) isolation.

5. Learn how to deploy to a Cloud Platform


What’s next?

From here on in, you can keep improving your app by:

  • Minimizing the size of your Docker image using multi-stage builds.
  • Cleaning up your repository. Model files shouldn’t be committed; store them in a storage service instead (e.g., Google Cloud Storage or AWS S3).
  • Adding a Continuous Integration / Continuous Deployment (CI/CD) pipeline so that any change on GitHub is automatically reflected in your deployed app. I often use GitHub Actions for this (e.g., any change in the master branch is deployed automatically).
  • Improving security! Make use of Docker args or a .env file to keep API tokens, passwords, and the like out of your code. Ideally, you should never commit secrets to Git (they can still be recovered even after you delete them!). Be careful!
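
For the last point, one possible sketch of keeping secrets out of the code is to read them from the environment at startup and fail loudly when one is missing. The variable names `API_TOKEN` and `DB_PASSWORD` are hypothetical, and the tiny `.env` parser below handles only plain `KEY=VALUE` lines; a dedicated library such as python-dotenv covers quoting and other cases this sketch ignores.

```python
import os


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_secret(name, environ=os.environ):
    """Return a required secret from the environment, or raise if it is unset."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value


# Example: the text of a local .env file that is listed in .gitignore
# (the keys and values here are made up).
example_env = parse_env_file("# local secrets\nAPI_TOKEN=abc123\nDB_PASSWORD = s3cret\n")
token = load_secret("API_TOKEN", example_env)
```

The code then refers to secrets only by name, so the repository never contains the values themselves.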
