DEV Community

Aristek Systems

Your Guide 2024 to Machine Learning. Finally Get It

Machine learning used to be for geeks and sci-fi fans. Now, ChatGPT alone has 180 million users. AI and machine learning tools spread fast, so it’s time to cover the basics.

This article is a crash course in machine learning for non-technical people. You’ll finally understand what ML is, how these models work, and how your company can benefit from it.

What Is Machine Learning?

Machine learning (ML) is a branch of artificial intelligence (AI) that relies on adaptive learning. All machine learning is AI, but not all AI is ML.

Machine learning behavior is not explicitly programmed: the algorithms learn from data on their own. That’s why such models can get better over time and stay flexible.

By contrast, non-ML AI algorithms are rigid and rule-based. When a rule-based program hits a situation its rules don’t cover, it has no way to adapt. If operating systems could handle unconventional situations with ML, we might not have “blue screens of death”.

Most modern AI systems apply at least some elements of machine learning. That’s why the terms are often used interchangeably. But you can still find non-ML AI, for example in rule-based robotics controllers or in older speech and image recognition systems.

If you want to probe deeper and learn more about the connection of artificial intelligence, machine learning and deep learning, read this short guide first.

Types of Machine Learning Models & Algorithms

There are plenty of different algorithms, but we’ll focus on the high-level types.

Supervised learning models learn by example. To train the model, developers feed it large amounts of input data paired with the correct outputs. As the model identifies patterns, it can predict outcomes for new data.

There are two main types of supervised models:

Regression models find relationships between variables and use them to predict a number. You can visualize such a relationship in a chart with X and Y axes. If past sales show that a cappuccino costs $5 per cup, the model can predict the price of 10 cups. Fitting a straight line to the data like this is called linear regression;
Classification models sort items into categories. Imagine a full fruit bowl. If I ask you to pick all the bananas out, you’ll instinctively look at the shape, color, and so on. To people, classification is simple and automatic. We don’t need to have seen a billion bananas to tell them apart. But computers need a ton of input, and that’s why we still have captchas (although bots have become better at it).
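To make the regression idea concrete, here is a minimal sketch in plain Python. The sales figures are made up for illustration, and `fit_line` is a hypothetical helper name rather than a library function. It fits a straight line with ordinary least squares and uses it to price 10 cups.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sales records: number of cappuccinos and the total paid in dollars
cups = [1, 2, 3, 4]
totals = [5.0, 10.0, 15.0, 20.0]

slope, intercept = fit_line(cups, totals)
predicted_total = slope * 10 + intercept  # predicted price of 10 cups
print(round(slope, 2), round(intercept, 2), round(predicted_total, 2))
```

With real, noisy data the points would not sit exactly on the line, and the fitted slope would be an estimate rather than the exact price per cup.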

Supervised models are great when you have lots of labeled data. They are perfect for content recommendation systems, automated grading, and language learning tools.

Unsupervised learning doesn’t have human guidance. Developers feed AI with large datasets and let it find patterns on its own.

Clustering is a popular type of unsupervised learning. It’s similar to classification, except the data is unlabeled.

Let’s say I ask you to separate some gemstones by type. Even if you have never seen a gemstone, I bet you can tell rubies and emeralds apart. All because they look completely different: one’s reddish, the other is greenish.

Use unsupervised learning when you need to identify trends, organize content, or cluster students.
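As a rough illustration of clustering, the sketch below groups gemstones by a single hypothetical measurement (hue in degrees) with a tiny k-means loop. The readings are invented, `kmeans_1d` is an illustrative name, and real projects would normally use a library implementation on many features.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Tiny k-means for one-dimensional data (k must be at least 2)."""
    # Deterministic start: spread the initial centroids between min and max
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its points
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical hue readings: reddish stones near 0, greenish stones near 140
hues = [4, 9, 12, 138, 142, 150, 7, 145]
centroids, labels = kmeans_1d(hues)
print(centroids, labels)
```

Nobody told the algorithm which stones are rubies or emeralds. It only discovers that there are two groups, and naming them is still up to a person.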

Reinforcement learning models are trained with feedback.

The algorithm interacts with the environment and receives feedback – either rewards or penalties. The AI learns to choose the actions that maximize long-term rewards.

Back to the gemstone example. Imagine that you need to pick all the rubies out of 10 stones, but you don’t know what a ruby looks like. If you pick the right gem, you can keep it. If you make a wrong choice, you’ll get hit with a stick. That’s reinforcement learning.

Use reinforcement learning models when you have a dynamic environment. With such models, you can simulate physics experiments, create adaptive learning paths, or power your interactive learning games.
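The gemstone game can be sketched as the simplest form of reinforcement learning: a one-step task where the agent keeps a running estimate of the reward for each choice. Everything here is an assumption made for illustration: the three colors, the +1/-1 rewards, the epsilon-greedy strategy, and the function name `train_gem_picker`.

```python
import random


def train_gem_picker(episodes=200, epsilon=0.2, learning_rate=0.1, seed=0):
    """Epsilon-greedy value learning on a toy 'pick the ruby' task."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    colors = ["green", "blue", "red"]
    value = {color: 0.0 for color in colors}  # estimated reward per choice

    def reward_for(color):
        # Toy environment: red stones are rubies (+1), anything else is the stick (-1)
        return 1.0 if color == "red" else -1.0

    for _ in range(episodes):
        if rng.random() < epsilon:
            choice = rng.choice(colors)          # explore: try something at random
        else:
            choice = max(colors, key=value.get)  # exploit: best estimate so far
        reward = reward_for(choice)
        # Nudge the estimate toward the reward that was just observed
        value[choice] += learning_rate * (reward - value[choice])
    return value


value = train_gem_picker()
best_choice = max(value, key=value.get)
print(best_choice, value)
```

Full reinforcement learning also has to weigh rewards that arrive many steps later, which this one-step sketch leaves out.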

How to Build an ML Model (with a Fruit Example)

Let’s build a model that can tell apples and bananas apart. For each step, we’ll first walk through the apple-banana model, then add some more theory.

Step 0. Understand Your Problem & Goals

We’ll start building our fruit-picking ML model at the next step. Its goal is already clear: to teach you the fundamentals of ML.

Define your problem with numbers. If you know where your starting point is, you can track the progress.

Let’s say you have a grocery store and your milk often expires on the shelf. Here’s what you can do:

  1. Track the current situation. How much milk do you stock and sell? What kind of milk do you have and what is its shelf life? Where do you put the milk? And so on.
  2. Compare your results with competitors. Do they have the same problem, or have they figured out a solution?
  3. Convert business goals into specific KPIs. Milk expiration can have multiple causes. Once we understand the cause of the problem, we can come up with solutions. We could aim to increase milk sales by 15%, improve inventory tracking, or stock sterilized milk instead of pasteurized.

Step 1. Collect Data

Apple-banana model. The fruits are different in many ways: in shape, color, texture, weight, and so on. But for our simple model, only 2 parameters should be enough: color and shape.

To measure color, we’ll use a spectrometer; to measure shape, we’ll use a ruler.

Below are the most common data sources to start with. Note that many models combine data from different sources.

  • Open-source datasets. You may be surprised at the quality of open source data. Major tech companies and even governments provide free data for your models. Here’s a list of some public data sources.

  • Web scraping. If you want to collect data from websites, you can do it by hand. Or get a web scraper and become much faster. Popular scrapers are Scrapy and ProWebScraper.

  • In-house data. With custom data you can build the most specific models. Sometimes it’s the only way to achieve good results, because public data has its limits.

Step 2. Prepare & Preprocess the Data

The apple-banana example. Once we have the data, it’s time to prepare it:

  • Put the data together.
  • Randomize the order. We don’t want the model to learn from the order of the fruits. It should identify bananas independently of where they appear in the data.
  • Visualize the data. This way, we’ll see whether the classes are balanced. If the training data has way more bananas, the model will tend to label almost everything as a banana in the real world.
  • Split the data into training and evaluation sets. We need to keep a separate evaluation set to make sure the model works with new input too, not just with the training set.

There are many more data preparation techniques out there. Let’s dig deeper.
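The apple-banana preparation steps above (combine, shuffle, check the balance, split) can be sketched in a few lines of Python. The measurements, the 80/20 split, and the helper name `shuffle_and_split` are assumptions chosen for illustration.

```python
import random


def shuffle_and_split(samples, eval_fraction=0.2, seed=42):
    """Shuffle labeled samples, then split them into training and evaluation sets."""
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    eval_size = int(len(shuffled) * eval_fraction)
    return shuffled[eval_size:], shuffled[:eval_size]


# Hypothetical measurements: (redness 0..1, length-to-width ratio, label)
fruits = [(0.9, 1.0, "apple")] * 5 + [(0.2, 4.5, "banana")] * 5
train_set, eval_set = shuffle_and_split(fruits)

# Check the class balance of the training set before training
balance = {
    label: sum(1 for sample in train_set if sample[2] == label)
    for label in ("apple", "banana")
}
print(len(train_set), len(eval_set), balance)
```

If the balance turns out heavily skewed toward one fruit, that is the cue to collect more data for the other class or to resample before training.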

This is often the longest and most expensive stage. By common estimates, data preparation can take up to 80% of the time spent on machine learning model development. Here’s why.

In machine learning, there’s a saying: garbage in, garbage out. Your model is only as good as the data it’s trained on.

There’s a famous example with Amazon’s recruiting AI. The model consistently rated women’s applications much lower than men’s. Why? Was the ML model sexist? No. The model was trained on resumes submitted in the past 10 years. Most applicants were men, that’s why the model preferred them. In the end, Amazon had to scrap the hiring tool.

Now that we know how important data preprocessing is, let’s figure out what it’s all about. Here are the steps in data preprocessing:

Data cleaning. Machine learning engineers correct errors, fill in or remove missing values, and address outliers and other anomalies. The most popular data cleaning techniques are imputation, removal, and transformation.

Data cleaning solves two issues: missing data and noisy data.

For missing data, you’ll often fill in the values. But if the dataset is large and mostly complete, it’s okay to drop the incomplete rows (tuples) instead.

If your data is noisy, you’ll need to smooth out or remove the meaningless garbage. We won’t go into too much detail, but noisy data is handled in 3 ways: binning, regression, and clustering.
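Here is a minimal sketch of two of the cleaning techniques mentioned above: filling missing values with the mean (imputation) and smoothing noisy values by bin means (binning). The weights and both function names are hypothetical.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, group them into bins, and replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical fruit weights in grams; None marks a missing reading
weights = [150, None, 170, 120, None, 160]
filled = impute_mean(weights)
smoothed = smooth_by_bin_means(filled, bin_size=3)
print(filled)
print(smoothed)
```

Note that this binning sketch returns the values in sorted order, so it suits a single column being explored on its own rather than rows that must stay aligned with other columns.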

Data transformation means converting raw data into a structured and usable format. These are the main ways to transform data:

  • Normalization. Think of normalization like adjusting the volume on your music player. It’s a way to make sure all your data values play nicely together by putting them in a specific range, like turning the volume knob between 0 and 100.
  • Attribute Selection. Imagine you have a bunch of tools, and you want to pick only the most useful ones for a specific job. Attribute selection is like choosing the best tools from your toolbox and creating new, more focused tools to make your work easier.
  • Discretization. If you have a bunch of numbers and you want to simplify things, discretization is like rounding those numbers to make them easier to handle.
  • Concept Hierarchy Generation. It can sound scary, but it simply means converting details into broader categories. For example, a banana is also a fruit.
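Three of these transformations are easy to sketch in plain Python: normalization to a 0 to 100 range (the volume knob), discretization into named size bins, and a concept hierarchy as a simple lookup. The lengths, bin boundaries, and category table are invented for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max.

    Assumes the values are not all identical (otherwise the span would be zero).
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]


def discretize(value, boundaries, labels):
    """Return the label of the first bin whose upper boundary the value does not exceed."""
    for upper, label in zip(boundaries, labels):
        if value <= upper:
            return label
    return labels[-1]  # larger than every boundary: last bin


# Concept hierarchy: a hypothetical lookup from specific items to broader categories
CATEGORY = {"banana": "fruit", "apple": "fruit", "carrot": "vegetable"}

lengths_cm = [7, 8, 18, 20]
normalized = min_max_normalize(lengths_cm, 0, 100)
sizes = [
    discretize(v, boundaries=[10, 19], labels=["short", "medium", "long"])
    for v in lengths_cm
]
print(normalized, sizes, CATEGORY["banana"])
```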

Data reduction means reducing the dataset while keeping the important information. Think of reading the book summary instead of the whole book.

  • Feature Selection. This involves choosing a subset of relevant features from the dataset, removing unnecessary or duplicate ones. Techniques like correlation analysis and mutual information are commonly used.
  • Feature Extraction. Transforming data into a lower-dimensional space while retaining important information. This is beneficial for high-dimensional and complex datasets. Techniques include PCA, linear discriminant analysis (LDA), and non-negative matrix factorization (NMF).
  • Sampling. Selecting a subset of data points to reduce the dataset size while preserving essential information. Techniques such as random sampling, stratified sampling, and systematic sampling are employed.
  • Clustering. Grouping similar data points together into clusters, replacing them with a representative centroid. Methods include k-means, hierarchical clustering, and density-based clustering.
  • Compression. Reducing the dataset size for storage and transmission without losing vital information. Techniques like wavelet compression, JPEG compression, and gzip compression are commonly used.
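Of the reduction techniques listed above, sampling is the simplest to show. The sketch below is one possible take on stratified sampling: it keeps the same fraction of every class, so a dataset that is 80% apples stays 80% apples. The row format and the name `stratified_sample` are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Sample the same fraction from every class so the class mix is preserved."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    sample = []
    for group in by_label.values():
        keep = max(1, round(len(group) * fraction))  # keep at least one row per class
        sample.extend(rng.sample(group, keep))
    return sample


# Hypothetical dataset: 80 apples and 20 bananas, each row is (label, id)
rows = [("apple", i) for i in range(80)] + [("banana", i) for i in range(20)]
subset = stratified_sample(rows, label_of=lambda row: row[0], fraction=0.1)
counts = {
    label: sum(1 for row in subset if row[0] == label)
    for label in ("apple", "banana")
}
print(counts)
```

Plain random sampling would usually give a similar mix on average, but with small classes it can miss a class entirely, which is the case stratification guards against.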

Step 3. Train the Model

We’ve already discussed the major types of machine learning models above. Each type is trained a bit differently.

For our apple-banana AI, we can use a simple linear model. Here’s how the model would train itself:

  1. Start by drawing a random line that would divide our dataset.
  2. Check if the random guess was any good. Did the model guess which fruits were bananas, and which ones were apples?
  3. Update the values to improve the dividing line.
  4. Iterate. Check the results, and update the values again.
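The four steps above can be sketched as a perceptron-style training loop. This is one simple way to train a linear classifier, not necessarily the exact method behind the illustration. The measurements are hypothetical, and the line starts from zeros instead of random values so that the run is repeatable.

```python
def train_linear_classifier(samples, learning_rate=0.1, max_epochs=1000):
    """Perceptron-style training: nudge the line whenever a fruit is misclassified."""
    # Step 1: starting line (zeros instead of random values, for repeatability)
    w_color, w_shape, bias = 0.0, 0.0, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for color, shape, label in samples:  # label: 1 = banana, 0 = apple
            # Step 2: check the current guess
            score = w_color * color + w_shape * shape + bias
            guess = 1 if score > 0 else 0
            error = label - guess
            if error != 0:
                # Step 3: update the values to move the line toward the right answer
                w_color += learning_rate * error * color
                w_shape += learning_rate * error * shape
                bias += learning_rate * error
                mistakes += 1
        # Step 4: iterate until a full pass makes no mistakes
        if mistakes == 0:
            break
    return w_color, w_shape, bias


def predict(model, color, shape):
    """Classify one fruit with the trained line."""
    w_color, w_shape, bias = model
    return "banana" if w_color * color + w_shape * shape + bias > 0 else "apple"


# Hypothetical measurements: (redness 0..1, length-to-width ratio, label)
training = [
    (0.90, 1.0, 0), (0.80, 1.1, 0), (0.95, 0.9, 0),
    (0.20, 4.5, 1), (0.10, 5.0, 1), (0.25, 4.0, 1),
]
model = train_linear_classifier(training)
print(predict(model, 0.85, 1.0), predict(model, 0.15, 4.8))
```

The loop stops early here because the two fruits are cleanly separable on these two measurements. With overlapping classes it would run until `max_epochs`, and a different algorithm would be a better fit.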

Step 4: Evaluation

Once the model is trained, we should test it against data that has never been used for training. For that, we kept the evaluation set.

With evaluation, we’ll see if the model can work with real-world data.

Some of the things we look at during evaluation are accuracy, precision, recall, and F1 score.
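These four metrics all come from counting the model’s hits and misses on the evaluation set. Here is a small sketch with made-up labels, treating "banana" as the positive class; `classification_metrics` is an illustrative helper, not a library function.

```python
def classification_metrics(y_true, y_pred, positive):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted bananas, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real bananas, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up evaluation results
actual = ["banana", "banana", "banana", "apple", "apple", "apple", "banana", "apple"]
predicted = ["banana", "banana", "apple", "apple", "apple", "banana", "banana", "apple"]

metrics = classification_metrics(actual, predicted, positive="banana")
print(metrics)
```

Accuracy alone can mislead when one class dominates, which is why precision and recall are usually reported alongside it.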

Step 5: Parameter Tuning

Once the model shows good results, you can further improve it with tuning.

When we trained our model, we implicitly assumed some values:

  • How many times do we run through the training set during training?
  • How much does the line change after each iteration?

Now is the time to test these assumptions and try new values.

Data scientists can tweak these parameters forever, so it’s important to know when to stop. You decide when the model is accurate enough.
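A simple way to test such assumptions is a grid search: train once per combination of values and keep the combination with the lowest error on held-out data. The sketch below tunes the two values named above (number of passes and step size) on a toy one-parameter model. The data and candidate values are made up, and in practice a third, separate validation split is often used for tuning so that the evaluation set stays untouched.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        gradient = (2 / n) * sum(x * (w * x - y) for x, y in zip(xs, ys))
        w -= learning_rate * gradient  # the learning rate sets how far the line moves
    return w


def mean_squared_error(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data following y = 5 * x
train_x, train_y = [1, 2, 3, 4], [5, 10, 15, 20]
held_out_x, held_out_y = [5, 6], [25, 30]

results = []
for learning_rate in (0.001, 0.01, 0.1, 0.2):
    for epochs in (5, 50):
        w = train(train_x, train_y, learning_rate, epochs)
        error = mean_squared_error(w, held_out_x, held_out_y)
        results.append((error, learning_rate, epochs))

best_error, best_lr, best_epochs = min(results)  # lowest held-out error wins
print(best_lr, best_epochs, best_error)
```

On this data, the smallest learning rate learns too slowly and the largest one overshoots and diverges, which is the trade-off tuning is meant to expose.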

Step 6: Deploy

Now that the model is accurate enough, we can put it to work. We just need to feed it real-world data.

Finally, our model can predict which of the fruits is an apple. But machine learning models can do much more.

Instead of locating apples, another model could locate tumors. Let’s talk about other examples.

Common Business Uses of Machine Learning

Every internet user encounters machine learning one way or another.

The most common examples are product or content recommendations.

Next time you search for a video on YouTube, or simply use the Google search engine, you’ll know that ML algorithms are helping you choose.

Another example is virtual assistants such as Apple’s Siri, Amazon’s Alexa, or Google Assistant.

They all rely on automatic speech recognition (ASR), also known as computer speech recognition or speech-to-text, to convert human speech into a written format.

You can also see various chatbots on websites that help you navigate better and answer the most frequently asked questions.

Moreover, you’ve probably used OpenAI’s GPT-3. It is a neural network trained on a vast amount of text available on the internet, and it can produce answers in response to text prompts.

There are more industry-specific uses of ML that help businesses perform better.

Healthcare

ML helps healthcare organizations improve diagnostics, treatment, and patient experience through visual assistants, medical image analysis, and virtual nursing.

Pharmaceutical companies use machine learning in drug discovery, clinical trials, and manufacturing.

Veterinary

In veterinary medicine, ML helps by analyzing large data sets, making accurate predictions, and developing personalized treatment plans, resulting in improved outcomes and better overall pet care.

eLearning

ML can revolutionize education by personalizing learning experiences, improving student outcomes, and optimizing educational resources.

With machine learning, educators can analyze large amounts of data to identify patterns and trends. It allows them to tailor instruction to individual students’ needs and provide targeted interventions for struggling learners.

Additionally, machine learning can help automate administrative tasks such as grading and scheduling.

eCommerce & Retail

ML can be used to predict future sales, which lets businesses plan revenue, supplies, and more.

Huge amounts of data are hard to handle within memory and computation time limits. Yet ML is more than capable of tackling these issues.

Retailers also use computer vision for inventory management and personalization.

In addition to recommendation systems that personalize advertising, pricing, and services, ML is used for fraud detection and real-time consumer engagement.

Finance

Machine learning is used by financial services companies for risk assessment, algorithmic trading, customer service, and personalized banking.

What’s more, ML is used for credit card protection and for anomaly detection to prevent fraud.

Insurance

Recommendation engines are able to provide choices to customers based on both their needs and the experiences of other users with certain insurance products.

Processing claims and underwriting both benefit from machine learning.

Logistics & Supply Chain Management

Machine learning systems assist logistics firms in improving traffic management, warehouse and route optimization, passenger safety, and fleet productivity.

ML solutions can also include traffic monitoring, driver support, precise delay forecasts, and predictive maintenance.

Benefits & Challenges of ML

ML has a lot to offer in any industry and for any company.

What’s more, ML makes it possible to extract adverse information from articles much faster and more effectively.

Benefits

Here are some common benefits for companies implementing AI and machine learning in practice:

  • Enhanced employee productivity thanks to automation of task assignments or tedious tasks.
  • Time-saving. For example, ML-powered document search finds answers to a user’s questions across thousands of texts.
  • Cost-efficiency. ML allows you to save money on equipment maintenance by providing predictive monitoring and preventive maintenance.
  • Improved customer experience. For instance, virtual assistants and chatbots can deal with customer requests more quickly. What’s more, ML can send personalized offers using customer behavior data analytics.
  • Better production efficiency. Machine learning can optimize production using demand forecasting and predictive modeling.
  • A modern solution for old and new challenges. When old-school development is too tedious and expensive, machine learning will help, as it goes beyond limitations of traditional programming.

Challenges

Despite the number of benefits, machine learning isn’t perfect. Let’s talk about challenges.

ML bias. If algorithms are trained on data sets that exclude certain populations or contain mistakes, they can produce models of the world that are inaccurate, or even discriminatory.

As with Amazon’s AI hiring tool, the bias may be unintended and simply the result of poor data preprocessing.

Privacy. Data privacy is one of the primary issues facing the AI sector.

Large volumes of data are needed for algorithms to be trained and improved, which presents a privacy and security risk.

Furthermore, consumers and regulatory agencies are becoming increasingly concerned about how to ensure that personal information is safe and utilized in an ethical manner.

Alignment issues. Some experts fear that AI will get out of control and destroy all humanity. The worst part is that we might not see it coming until it’s too late.

For instance, in 2023 many well-known figures signed an open letter calling for a pause in the development of the most powerful AI systems.

Machine Learning Trends

According to Grand View Research (GVR), the global market size for artificial intelligence is projected to expand from $136.6 billion in 2022 to a whopping $1.8 trillion in 2030.

Here’s what you can expect in the next few years out of machine learning.

Multimodal ML. Most current AI systems are good at one job: playing chess, writing text, or optimizing inventory. But this is changing. OpenAI’s GPT-4 leads the way by processing text, images, and sound simultaneously, mimicking human sensory abilities. Other models will follow soon.

Agentic AI. Current models need human input to react. But there’s a change coming from reactive to proactive models. These advanced AI agents act autonomously, setting goals without direct human intervention. Imagine environmental monitoring systems detecting early signs of a forest fire or financial agents dynamically managing investment portfolios in real time.

Open Source ML. The democratization of AI continues through the rise of open source models. GitHub’s data showcases a surge in developer engagement with generative AI projects. Open source AI not only reduces costs but also encourages transparency. Yet concerns about misuse still persist.

Retrieval-augmented generation. We can’t fully trust AI yet, because the models often hallucinate. ChatGPT can make up a completely false answer and present it with full confidence.

But Retrieval-Augmented Generation (RAG) can help solve it. RAG improves accuracy by blending text generation with facts from external sources.
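To show the idea, here is a toy sketch of the retrieval half of RAG: find the most relevant note in a small knowledge base and put it into the prompt. Everything in it is hypothetical, including the store notes, the word-overlap ranking, and the prompt wording. Real systems typically rank documents by embedding similarity and then send the prompt to a language model, a step this sketch leaves out.

```python
import re


def words(text):
    """Lowercase a text and split it into a set of plain words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = words(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & words(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents):
    """Combine the retrieved context with the question into one prompt."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical in-house knowledge base
documents = [
    "Store policy: milk is restocked every Monday.",
    "Store policy: bananas are discounted after three days on the shelf.",
]
question = "When is milk restocked?"
prompt = build_prompt(question, documents)
print(prompt)
```

Because the model is asked to answer from the retrieved text, its answer can be checked against a source instead of relying on whatever it memorized during training.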

Customized enterprise generative models. While massive tools like Midjourney and ChatGPT dominate consumer attention, business applications lean towards smaller, customized models. These models, tailored for specific niches like healthcare and finance, offer improved privacy, security, and efficiency.

Need for machine learning talent. The need for AI and machine learning talent continues to surge. Here are the most wanted positions: MLOps, ML programming, data science, data analysis, and operations.

Shadow AI, or unauthorized use of AI within organizations. As machine learning becomes more accessible, shadow AI becomes a bigger threat. Experts are mainly concerned about data privacy breaches. Companies will need to introduce AI policies if they want to protect themselves and their customers.

A reality check for generative AI. When ChatGPT and Midjourney became mainstream, there was a lot of hype. People were excited and afraid. But we’re starting to understand the limitations of current models. Generative AI has issues with output quality, security, ethics, and even integration difficulties. Let’s stay optimistic, but realize that machine learning still has a long way to go.

ML ethics and security risks. Machine learning raises lots of ethical concerns. Some are down to earth, like deepfakes, ransomware, and phishing attacks. Others warn of an AI takeover. We’ll only have more such worries in the coming years.

Evolving regulations. 2024 is pivotal for AI and ML regulation globally. The EU’s AI Act sets a precedent that could influence global standards – just as the GDPR did.

Meanwhile, in the U.S. there’s no federal legislation so far. But seven major tech companies have voluntarily committed to managing AI risks: Amazon, Anthropic, Google, Inflection, Meta, Microsoft, and OpenAI.

How to Implement Machine Learning?

Machine learning is still an emerging technology. Right now, companies have a window of opportunity to implement machine learning before the competitors.

What’s more, it’s often cheaper than you’d think. Sure, training a deep learning algorithm from scratch takes a lot of work. But even today there are packaged AI solutions that require much less development time. With them, you can skip directly to Step 5 of ML development.

Want to introduce ML to automate your organization? Reach out for a free machine learning consultation. We’ll take a look at your company and suggest a roadmap for introducing machine learning to match your specific needs.

Top comments (1)

Scott Reno

Very interesting! Do you have any actual examples/tutorials of how to do the whole process with TensorFlow or some other tool?