DavidNNussbaum
A Review Of Several Python Supervised Learning Classifiers With Some Of Their Advantages And Drawbacks

In Python supervised learning, many classifiers can be accessed through the scikit-learn library. These are algorithms that are given training data, which consists of a large portion of the data on hand; a certain percentage, often around 20%, is held out as the test set. The value that we want the model to predict is designated as the target. The model uses the features associated with each record in the training set to learn the relationships between the features and the target. The model is then given the test set with the target removed, and a determination is made as to how accurately the model predicts the target for the test set.
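As a minimal sketch of this workflow in scikit-learn (the iris dataset and the 20% split are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Features (X) and target (y); the iris dataset stands in for the data on hand.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```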

We will review various classifiers with their pros and cons.

K-Nearest Neighbors – The model classifies a data point based on the training points closest to it. It is a simple approach but slows down as the dataset becomes larger, since distances to the stored training points must be computed at prediction time.
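A minimal sketch with scikit-learn, reusing a held-out split like the one above (n_neighbors=5 is an illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Classify each test point by the majority class of its 5 nearest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # accuracy on the held-out test set
```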

Logistic Regression – This model predicts the probability of a categorical outcome. There are three types:
1) Binary Logistic Regression – The model predicts whether something is or is not true, for example whether an email is spam or whether a tumor is malignant.
2) Multinomial Logistic Regression – There are three or more categories without ordering. An example would be determining someone's favorite activity among horseback riding, bowling, and swimming.
3) Ordinal Logistic Regression – There are three or more categories with ordering, for example rating baseball teams from 1 to 10.

There is a decision boundary, and the classification is made based on where the item falls relative to that boundary.

A drawback is that it is not well suited to complex classification tasks, for example determining whether a picture shows a horse, a cow, or a chicken.
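A minimal sketch of binary logistic regression on scikit-learn's breast-cancer dataset, chosen here to mirror the malignant-or-not example (max_iter is raised so the solver converges on this data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # binary target: malignant vs. benign
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a binary logistic regression and score it on the held-out test set.
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```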

Decision Tree – In this algorithm the goal is to create a model that predicts the value of the target variable by learning simple decision rules inferred from the features of the training data. The logic is transparent and can therefore be followed.

Two of the downsides are that one can create over-complex trees that do not generalize well, and that the trees can become biased if some classes dominate the data.
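A minimal sketch; max_depth=3 is an illustrative cap that helps avoid the over-complex trees mentioned above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Limiting the depth restrains over-complex trees that fail to generalize.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))

# The transparent logic can be printed as human-readable rules.
print(export_text(tree))
```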

Random Forest – Here many decision trees are used, each trained on a random subset of the data. Their results are combined in an attempt to obtain a more accurate result.

Drawbacks include that the model's inner workings are not transparent, and that a large dataset requires a tremendous amount of memory.
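A minimal sketch (100 trees is scikit-learn's default and an illustrative choice here):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 100 trees, each fit on a bootstrap sample of the training data;
# their predictions are combined into one result.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```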

Support Vector Machine – This model segregates the given dataset in the best possible way. The hyperplane is a decision plane that separates a set of objects having different class memberships. Support vectors are the data points closest to the hyperplane, and they define the separating line through the margin calculation. The distance between the hyperplane and the nearest points is known as the margin, which is better when it is larger. The objective is to select the hyperplane with the maximum possible margin between the support vectors in the given dataset.

A drawback is that it is not suitable for large datasets, since training time grows quickly with the number of samples.
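A minimal sketch; scaling the features first is common practice for SVMs since the margin is distance-based (the RBF kernel is an illustrative default):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features, then fit a maximum-margin classifier.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))
```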

Voting – This model accumulates the results of multiple other models and makes a prediction based on one of two types:
1) Hard voting – The votes of the various models are counted, and the class with the most votes is predicted.
2) Soft voting – The predicted probabilities from the various models are summed, and the prediction is made based on the class label with the largest summed probability.

A drawback is that multiple models have to be run.
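A minimal sketch combining three of the classifiers above (the choice of estimators and voting="hard" are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# voting="hard" counts class votes; voting="soft" would sum predicted probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```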

Naïve Bayes – This algorithm works on the assumption that the various features are independent of one another. It is used for classification and works by probability: the class with the highest probability is considered to be the class to which the data point belongs. It is considered a fast algorithm.

A drawback is that the algorithm presumes all features are independent of each other, which they rarely are. Despite this, for binary classification it achieves a relatively high rate of successful predictions.
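A minimal sketch using the Gaussian variant on the binary breast-cancer dataset (GaussianNB is one of several Naïve Bayes implementations in scikit-learn):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each feature is modeled as an independent Gaussian within each class.
nb = GaussianNB()
nb.fit(X_train, y_train)
print(nb.score(X_test, y_test))
```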

Thank you for reading!
