Phylis Jepchumba, MSc

Posted on Aug 6, 2021 • Edited on Aug 16, 2021

Introduction to Machine Learning with Python.

#datascience #machinelearning #python

What is Machine Learning (ML)?

Machine learning is a type of Artificial Intelligence that extracts patterns from raw data by using an algorithm or method.

The main focus of ML is to allow computer systems to learn from experience without being explicitly programmed and without human intervention.

Need for Machine Learning.

At this moment, human beings are the most intelligent and advanced species on earth because they can think, evaluate, and solve complex problems. Artificial intelligence, on the other hand, is still in its initial stage and has not surpassed human intelligence.
Due to growing volumes and varieties of available data, cheaper and more powerful computational processing, and affordable data storage, Machine Learning is essential for:
*Producing models that can analyze bigger, more complex data and deliver faster and more accurate results.
*Building precise models that give an organization a better chance of identifying profitable opportunities or avoiding unknown risks.

Why and When to Make Machines Learn

There are several circumstances where we need machines to make data-driven decisions efficiently and at a huge scale, such as:

Lack of human expertise.
Scenarios where human expertise is lacking, such as navigating unknown territories or other planets, need machine learning.

Dynamic Scenarios
Scenarios that keep changing over time need a machine that keeps learning and makes data-driven decisions as conditions change.

Difficulty in translating expertise into computational tasks
There are domains in which humans have expertise but cannot translate that expertise into computational tasks, such as speech recognition and cognitive tasks.

Challenges in Machine Learning

While Machine learning is rapidly evolving, it still has a long way to go. This is because ML has not yet overcome challenges such as:

Time-consuming tasks- Data acquisition, feature selection and retrieval consume a lot of time.

Lack of specialists- As ML is still evolving, finding experts is difficult.

Issues of Overfitting and Underfitting- If the model overfits or underfits the training data, it does not represent the problem well.

Difficulty in deployment- The complexity of ML projects makes them difficult to deploy in real life.

Quality of data- Having good quality data for ML algorithms is a challenge. Use of low-quality data leads to problems related to data preprocessing and feature extraction.

Applications of Machine Learning.

Machine learning is a rapidly growing technology used to solve complex real-world problems that are hard to solve with traditional approaches, such as:

Emotion analysis
Stock market analysis and forecasting
Speech synthesis.
Customer segmentation.
Fraud detection.
Weather Forecasting and Prediction.

Why Python for Machine Learning?

Extensive set of packages.
Python has an extensive and powerful set of packages ready to be used in various domains, such as NumPy, SciPy, pandas and scikit-learn.
Easy prototyping.
Python provides easy and fast prototyping, which is useful for developing new algorithms.
Python has libraries for data loading, visualization, statistics, natural language processing and image processing, which provide data scientists with a large array of general- and special-purpose functionality.

Installation

To work on machine learning projects, we will use a pre-packaged Python distribution: Anaconda.

Anaconda is a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment.
The distribution includes data-science packages suitable for Windows, Linux, and macOS.
Anaconda comes with NumPy, SciPy, matplotlib, pandas, IPython, Jupyter Notebook, and scikit-learn.

To set up a Python environment using Anaconda, use the following steps:

Download the required installation package from the Anaconda Distribution download page.
You can choose the installer for Windows, macOS, or Linux, as required.
Next, select the Python version you want to install on your machine. At the time of writing, the latest Python version is 3.9. You will get options for 64-bit and 32-bit installers.
After you select the OS and Python version, the Anaconda installer will download to your computer. Double-click the file and the installer will install the Anaconda package.
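
Once the installer finishes, one way to confirm that the packages are available is to ask Python for their installed versions. The snippet below is a minimal sketch using the standard library's importlib.metadata; the package list is an assumption based on the packages named above, written with their PyPI distribution names.

```python
from importlib import metadata

# Distribution names as published on PyPI (assumed to match the packages
# the article lists as shipping with Anaconda).
PACKAGES = ["numpy", "scipy", "matplotlib", "pandas",
            "ipython", "notebook", "scikit-learn"]

def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(f"{name:<14}{version or 'not installed'}")
```

Run it from the Anaconda prompt or a Jupyter Notebook cell; any package reported as not installed can be added with conda or pip.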

Components of Python ML Ecosystem.

The core libraries that make up the Python machine learning ecosystem are:

Jupyter Notebook
It is an interactive environment for running code in the browser. It is a great tool for exploratory data analysis, is widely used by data scientists, and makes it easy to combine code, text, and images.

NumPy
It is the fundamental package for scientific computing with Python. It contains functionality for multidimensional arrays, high-level mathematical functions such as linear algebra operations and the Fourier transform, and pseudorandom number generators.
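
As a rough illustration of those four capabilities, here is a short sketch. The matrix, signal, and seed are arbitrary values chosen for the example.

```python
import numpy as np

# Multidimensional array: a 2 x 2 matrix and a vector.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Linear algebra: solve the system A @ x = b.
x = np.linalg.solve(A, b)

# Fourier transform of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)

# Pseudorandom numbers from a seeded generator, so runs are repeatable.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=5)

print(x)
print(spectrum)
print(samples.shape)
```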

Matplotlib
It is a comprehensive library for creating static, animated, and interactive visualizations in Python.
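
A static plot might look like the sketch below. The Agg backend and the output file name are assumptions made so the example runs without a display; in a Jupyter Notebook you would normally leave out the backend line and let the figure render inline.

```python
import math
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draws to files, needs no window
import matplotlib.pyplot as plt

# One period of a sine wave, sampled every 0.1.
xs = [i * 0.1 for i in range(63)]
ys = [math.sin(x) for x in xs]

fig, ax = plt.subplots()
ax.plot(xs, ys, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A static line plot")
ax.legend()

# Hypothetical output location, chosen only for this example.
output_path = os.path.join(tempfile.gettempdir(), "sine_example.png")
fig.savefig(output_path)
plt.close(fig)
```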

Pandas
It is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
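
The sketch below shows a typical first look at a table with pandas: build a DataFrame, summarize it, then group and aggregate. The house records are invented for illustration.

```python
import pandas as pd

# Hypothetical house records, invented for illustration.
houses = pd.DataFrame({
    "square_feet": [850, 1200, 1500, 2000],
    "rooms": [2, 3, 3, 4],
    "has_garden": [False, True, False, True],
    "price": [120_000, 180_000, 210_000, 300_000],
})

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column.
summary = houses[["square_feet", "rooms", "price"]].describe()

# Group and aggregate: mean price of houses with and without a garden.
mean_price_by_garden = houses.groupby("has_garden")["price"].mean()

print(summary)
print(mean_price_by_garden)
```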

Knowing Your Task and Knowing Your Data

The most important part of the machine learning process is understanding the data you are working with and how it relates to the task you want to solve.
It will not be effective to randomly choose an algorithm and throw your data at it.
It is necessary to understand what is going on in your dataset before you begin building a model since each algorithm is different in terms of what kind of data and what problem setting it works best for.

Machine Learning Approaches.

Once you have a clear understanding of your data, you can choose the best algorithm to solve your problem based on the following approaches.

1.Supervised Learning
In supervised learning, the user provides the algorithm with pairs of inputs and desired outputs, and the algorithm finds a way to produce the desired output given an input.

The most common forms of supervised learning are Classification and Regression.

Classification assigns each data point to one of a set of predefined classes.
Classification algorithms include:

Logistic regression
Support vector machines
Convolutional deep neural networks
Naive Bayes
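
To make classification concrete, here is a minimal sketch using scikit-learn's LogisticRegression, the first algorithm in the list. The data set (hours studied versus passing an exam) is hypothetical, and the model is left at its default settings.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied -> failed (0) or passed (1).
X_train = [[0.5], [1.0], [1.5], [2.0], [4.0], [4.5], [5.0], [5.5]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

# Training: the model sees input/output pairs.
model = LogisticRegression()
model.fit(X_train, y_train)

# Prediction: a class for new inputs, and class probabilities
# for an input near the boundary between the two groups.
predictions = model.predict([[1.0], [5.0]])
probabilities = model.predict_proba([[3.0]])

print(predictions)
print(probabilities)
```

The output of predict is a class label, while predict_proba returns one probability per class for each input row.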

Regression outputs a number rather than a class and is useful for predicting quantities such as stock prices, the probability of an event, or the temperature on a given day.
Regression algorithms include:

Linear regression
Random forest
Multi-layer perceptron
Convolutional deep neural network
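
A regression model can be sketched the same way with scikit-learn's LinearRegression. The house sizes and prices below are hypothetical and deliberately follow price = 100 * size + 20000 exactly, so the fitted result is easy to check by hand.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: house size in square feet -> price.
# Prices follow price = 100 * size + 20000 exactly (an assumption for clarity).
sizes = [[800], [1000], [1500], [2000]]
prices = [100_000, 120_000, 170_000, 220_000]

model = LinearRegression()
model.fit(sizes, prices)

# The output is a number, not a class.
predicted_price = model.predict([[1200]])[0]
print(round(predicted_price))
```

Real data is never this clean; with noisy prices the model returns the line that minimizes the squared error rather than an exact fit.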

Examples of Supervised Learning tasks are:

Predicting house prices.
Here the inputs can be features such as square footage, number of rooms, and whether the house has a garden.
-By leveraging data from thousands of houses, their features and prices, we can train a supervised machine learning model to predict a new house’s price based on the examples observed by the model.
Detecting fraudulent activity in credit card transactions.
Here the input is a record of the credit card transaction, and the output is whether it is likely to be fraudulent or not.
Other examples are weather prediction, stock prediction and so on.

2.Unsupervised Learning.

In unsupervised learning, only the input data is known, and no known output data is given to the algorithm.

An example of unsupervised learning in real life would be sorting coins of different colors into separate piles. By looking at their features, such as color, you can see which coins belong together and cluster them into their correct groups.

Unsupervised learning is commonly used for Clustering and Anomaly detection.

Clustering is the act of creating groups so that items in the same group are similar to each other and different from items in other groups. It attempts to find various subgroups within a dataset.
A related unsupervised task, association learning, uncovers the rules that describe your data.
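
As an illustration of clustering, the sketch below uses k-means from scikit-learn. K-means is one common clustering algorithm, chosen here as an example; the points and the choice of two clusters are assumptions that suit this toy data.

```python
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two visibly separate groups.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
          [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]

# No labels are given. The number of clusters must be chosen up front;
# 2 is an assumption that matches this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print(labels)
print(kmeans.cluster_centers_)
```

The cluster numbers themselves are arbitrary; what matters is which points end up sharing a label.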

Anomaly detection is the identification of rare or unusual items that differ from the majority of the data.
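
One simple way to flag such items is a z-score rule: measure how many standard deviations each value lies from the mean and flag the ones that are far away. This is only one of many possible approaches, and the data and the threshold of 2.0 below are assumptions made for the example.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]

# Hypothetical requests-per-minute counts; the last one is a spike.
requests_per_minute = [10, 11, 9, 10, 12, 10, 11, 50]
anomalies = find_anomalies(requests_per_minute)
print(anomalies)
```

Note that no example of "abnormal" was supplied: the rule only looks at how the data is distributed, which is what makes it unsupervised.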

Examples of unsupervised learning tasks include:

Segmenting customers into groups with similar preferences

-Given a set of customer records, you might want to identify which customers are similar, and whether there are groups of customers with similar preferences. For a shopping site, these might be "parents", "bookworms", or "gamers". Because you don’t know in advance what these groups might be, or even how many there are, you have no known outputs.

Detecting abnormal access patterns to a website

-To identify abuse or bugs, it is often helpful to find access patterns that are different from the norm. Each abnormal pattern might be very different, and you might not have any recorded instances of abnormal behavior. Because in this example you only observe traffic, and you don’t know what constitutes normal and abnormal behavior, this is an unsupervised problem.

3.Semi-supervised Learning.

It is a mix of the supervised and unsupervised approaches.
It takes the middle road by combining a small amount of labeled data with a much larger unlabeled dataset.

Popular semi-supervised learning algorithms include:

PU classification
Transductive SVM
Co-training
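
The idea of stretching a few labels over many unlabeled points can be sketched with self-training (also called pseudo-labeling). Note that self-training is a different and simpler technique than the three algorithms listed above; it is used here only because it fits in a few lines. The one-dimensional data, the class names, and the nearest-centroid rule are all assumptions made for the example.

```python
def class_centroids(labeled):
    """Mean feature value for each class seen so far."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def margin(x, centers):
    """Confidence: gap between the nearest and second-nearest centroid."""
    distances = sorted(abs(x - c) for c in centers.values())
    return distances[1] - distances[0]

def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the most confident unlabeled point.

    Assumes the labeled set contains at least two classes.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        centers = class_centroids(labeled)
        best = max(remaining, key=lambda x: margin(x, centers))
        label = min(centers, key=lambda name: abs(best - centers[name]))
        labeled.append((best, label))
        remaining.remove(best)
    return labeled

# Two labeled points and eight unlabeled ones (hypothetical data).
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [1.5, 2.0, 2.5, 3.0, 7.0, 7.5, 8.0, 8.5]

pseudo_labeled = self_train(labeled, unlabeled)
print(pseudo_labeled)
```

Each newly labeled point shifts its class centroid, so the unlabeled data gradually shapes the decision boundary even though only two true labels were supplied.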

4.Reinforcement Learning.

It is less common and much more complex compared to other approaches. It does not use labels and instead uses rewards to learn.

Three major components make up reinforcement learning: the agent, the environment, and the actions. The agent is the learner or decision-maker, the environment includes everything that the agent interacts with, and the actions are what the agent does.

The goal is for the agent to learn to choose actions that maximize the expected reward over a given time.

Popular reinforcement learning algorithms include:

Q-learning
Temporal difference
Monte Carlo tree search
Sarsa
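
To show the agent, environment, actions, and reward working together, here is a minimal tabular Q-learning sketch. The environment is a hypothetical five-cell corridor in which the agent starts at the left end and is rewarded only for reaching the right end; the learning rate, discount, and exploration rate are illustrative values, not tuned ones.

```python
import random

N_STATES = 5                 # states 0..4 in a corridor; state 4 is the goal
ACTIONS = [-1, +1]           # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # illustrative values, not tuned

def step(state, action):
    """Environment: move along the corridor; reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(q, state, rng):
    """Epsilon-greedy: explore at random sometimes, otherwise act greedily."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    # Break ties at random so the untrained agent does not get stuck.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose_action(q, state, rng)
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + GAMMA * max(q[next_state].values())
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)}
print(policy)
```

No labels are involved: the agent is never told that moving right is correct, it only observes rewards and gradually prefers the actions that lead to them.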
