DEV Community

Suraj Upadhyay
Machine Learning : The Tutorial Hell

This article was originally posted here

Did you get stuck? Let’s break out!

Recently, Machine Learning, the study of making silicon machines learn on their own, has gained a lot of traction, and nearly every Computer Science student seems to be learning it. In reality, though, only a handful of students make it past the tutorial hell and go on to build cool projects of their own, like AI-powered games or autonomous drones.

Tutorial hell is often described as a student's inability to produce anything useful beyond what they were taught. That is, the student is curious enough to work through multiple tutorials, YouTube videos and research papers, but is still too under-confident to develop their own Machine Learning solution, and so keeps pursuing more tutorials.

As you might have already heard:

“Education without application is just entertainment.”

- Tim Sanders

An example would be a student who has taken multiple courses and earned lots of certifications, but is still too under-confident to implement a simple neural network on their own and cannot think beyond what was taught to them. So they simply keep going through tutorials.

The reason is quite apparent. Machine Learning draws heavily from two broad and versatile disciplines, statistics and Computer Science, which makes understanding the theory behind it a daunting task for the unsuspecting learner. At the same time, the principle of abstraction in Computer Science makes its implementation seem like child’s play: all the sorcery happens in a single line of Python.

If that didn't already sound bad from a learner’s perspective, consider that you also have to be creative and artful to apply your algorithms to real-life businesses and build or improve great products, which unfortunately tutorials cannot teach.

Given all these hurdles, a beginner usually either concentrates on understanding the concepts very well without implementing anything, or jumps straight into implementing and training models without knowing the fundamentals behind Machine Learning. The former approach is passive and unproductive; the latter stops working as soon as you step beyond what your tutor has taught and some technical creativity is required.

Given the peculiarity of the subject and the craze among many young and inexperienced software developers, it has become increasingly difficult to get past the “tutorial hell” and jump right into the workforce.

One way out of this vicious loop of mindless learning and neurotic under-confidence is understanding what society really expects from a Machine Learning Engineer and what you actually need to deliver after learning your art. Beyond that understanding, you need to build your own Machine Learning projects and solutions from scratch.

In this article, we will understand what it means to be a Machine Learning Engineer and look at the skills we need to become one. In the upcoming articles, we will learn these skills and use them to break out of the tutorial hell.

For now, let’s first understand what a real-life Machine Learning Engineer does on a daily basis, so as to align our expectations a bit more towards reality.

If you are an aspiring ML engineer, you may already know that an engineer of any sort seldom works in isolation and usually has a team around them. Every team is formed with the specific objective of solving a particular problem. In Machine Learning, the problem tends to be an optimization of some sort, or an improvement over an existing product or value chain. Add to that the fact that the problem came to a Machine Learning team at all, and you should expect that it was most likely not solved by any of the conventional methods. This means you will have to continuously seek improvements over your current solution and go through countless iterations.

Here’s some rough pseudocode that tries to describe the life cycle of a Machine Learning team project:

  1. Talk with the client.

    a. Understand the product, your role and your product's role in the value chain, and the business requirements.

    b. Propose a Machine Learning solution.

    c. Receive feedback and suggestions.

    d. Repeat a - c until the client is satisfied.

  2. Work with the data.

    a. Gather and/or augment the data.

    b. Explore, decorate and visualize the data.

  3. Train your model and work out a prototype.

  4. Test your model.

  5. Repeat 3 - 4 until the accuracy is good enough.

  6. Interact with the client.

    a. Show the prototype.

    b. Receive feedback and suggestions.

  7. Repeat 1 - 6 until the client is satisfied.
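
Steps 3 to 5 of the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the model is a tiny perceptron written with the standard library, and names such as `make_data`, `train_one_epoch`, `TARGET_ACCURACY` and the 0.95 threshold are invented for this example rather than taken from any real project.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

TARGET_ACCURACY = 0.95  # hypothetical "good enough" bar agreed with the client
MAX_ROUNDS = 50         # safety limit so the loop always ends


def make_data(n):
    """Synthetic 2-D points labeled 1 if x + y > 1, else 0."""
    points = [(random.random(), random.random()) for _ in range(n)]
    return [((x, y), 1 if x + y > 1 else 0) for x, y in points]


def predict(weights, bias, features):
    """Linear decision rule: 1 if the weighted sum is positive, else 0."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_one_epoch(weights, bias, data, lr=0.1):
    """One pass of the classic perceptron update rule over the data."""
    for features, label in data:
        error = label - predict(weights, bias, features)
        weights = [w + lr * error * f for w, f in zip(weights, features)]
        bias += lr * error
    return weights, bias


def accuracy(weights, bias, data):
    """Fraction of samples whose prediction matches the label."""
    hits = sum(predict(weights, bias, f) == label for f, label in data)
    return hits / len(data)


train_set, test_set = make_data(200), make_data(100)
weights, bias = [0.0, 0.0], 0.0

# Step 5: repeat steps 3 (train) and 4 (test) until accurate enough.
for round_number in range(1, MAX_ROUNDS + 1):
    weights, bias = train_one_epoch(weights, bias, train_set)  # step 3
    test_accuracy = accuracy(weights, bias, test_set)          # step 4
    print(f"round {round_number}: test accuracy = {test_accuracy:.2f}")
    if test_accuracy >= TARGET_ACCURACY:
        break
```

Using the test set to decide when to stop is a simplification to keep the sketch short. In practice a separate validation set is usually held out for that decision, so that the final test score stays an honest estimate.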

Of course, there are other, maybe better, methodologies for developing Machine Learning solutions, but you get the idea of what an ML engineer does, don’t you? It’s very important to set your expectations realistically before you can escape the tutorial hell.

No matter which project methodology you use, one thing is for sure: you will have to deal with a lot of data and communicate a lot with your team members, clients and employer.

While there is no sure way of learning how to communicate, we can at least learn the technical aspects of Machine Learning, such as data gathering, data preprocessing, data visualization, model selection and model training.

When a subject becomes popular, countless libraries and algorithms appear for each sub-task. As the subject matures, the libraries that stand the test of time become industry standard. In the case of Machine Learning, we have at least a dozen such libraries, each an industry standard for a different sub-task.

Here is a list of libraries in popular use, grouped by the purpose they serve in a Machine Learning production pipeline.

Most used Data Processing and Visualization libraries:

  1. NumPy

  2. Pandas

  3. Matplotlib

  4. Seaborn

  5. MySQL and other database management systems (DBMS)

Most used Machine Learning libraries:

  1. Scikit-learn

  2. SciPy

  3. PyTorch

  4. TensorFlow

  5. Keras

  6. OpenCV

* (The order in which the libraries appear isn’t meant to indicate anything.)

We will learn each of these libraries and their usages in upcoming posts.

The libraries listed above prove to be the real bottleneck in learning Machine Learning.

Along with knowledge of these libraries, you also need a good grasp of the concepts behind their implementation. Without it, you won’t be able to get really creative and dexterous with them.

Looking at the details above, we can safely claim that in order to escape the tutorial hell you need to be creative with what you learn. That is, learning the concepts alone isn’t fruitful unless you also know how to use the libraries.

A clever approach to escaping the tutorial hell is to not rely only on your tutorials but also to consider simple real-life applications of Machine Learning, and to try building full-blown Machine Learning solutions from scratch, i.e. producing your own data and training your model on it.

For example, you could make a stone-paper-scissors bot using computer vision and neural networks. You can generate your own data by taking pictures of your hand making “stone”, “paper” and “scissors” gestures, labeling them accordingly and training your model on them. That is just a single example; there are many other projects with which you can prove your skills.
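
Here is a minimal sketch of the labeling step, assuming you save your photos into one folder per gesture (for example `data/stone/`, `data/paper/` and `data/scissors/`). The folder layout, the `.jpg` extension, and the function names `collect_labeled_images` and `split_train_test` are assumptions made for this example. The code only organizes file paths and labels; loading the images and training a network is left to whichever library you choose.

```python
import random
from pathlib import Path

# One folder per gesture; the folder name doubles as the label.
LABELS = ("stone", "paper", "scissors")


def collect_labeled_images(root):
    """Return (path, label) pairs, using each folder name as the label."""
    samples = []
    for label in LABELS:
        for path in sorted(Path(root, label).glob("*.jpg")):
            samples.append((path, label))
    return samples


def split_train_test(samples, test_fraction=0.2, seed=42):
    """Shuffle the samples and hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    samples = collect_labeled_images("data")  # hypothetical folder
    train, test = split_train_test(samples)
    print(f"{len(train)} training images, {len(test)} test images")
```

Holding out a test split before any training is what later lets you check whether the bot recognizes gestures it has not seen, rather than just the photos it memorized.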

You might have understood by now that Machine Learning is more like a tool which helps you build better products and services. More often than not, these products and services already exist in some form or other.

Escaping the tutorial hell is difficult, but it becomes easier when you do it with others. I will frequently post brief tutorials on libraries and simple projects to keep you up to date with the concepts.

The next topic will be a tutorial on Data Processing and Visualization with NumPy, Pandas and Matplotlib.

To never miss a tutorial or blog post by me on this topic, you can subscribe to the “Technical Insights” newsletter on my Substack here, or you can follow me on Medium here.

You can also comment on or reply to this article/email. Any kind of feedback, suggestions or discussion is highly appreciated and encouraged.

Thanks for reading,

Yours sincerely,

Suraj Upadhyay.
