DEV Community

Shivam Rawat


Classification

Supervised machine learning consists of two main types of tasks: classification and regression. Today, we are going to discuss some important points about classification.
Classification is the process of predicting which class a data point belongs to from the information provided. Humans have an amazing ability to pick out different things from a cluster, but it is much harder for a machine to perform the same task.
For example, movie review analysis can be treated as a classification problem. To keep things simple, let's treat it as a binary classification problem: predicting only whether a review is positive or negative. A classifier uses training data to learn how the input variables relate to the class. In our case, reviews already known to be positive or negative serve as the training data. Once the classifier is trained well, it can be used to predict the class of a new data point.

Classification Algorithms

There are a lot of classification algorithms, but it is not possible to say that one is superior to the others in general. The best choice depends on the problem and the given dataset.

Decision Tree

A decision tree builds a model in the form of a tree structure. It uses a set of if-then rules to distinguish between classes. The rules are learned sequentially from the training data, one at a time. The most important attributes are identified using the concept of information gain.
Decision trees are prone to overfitting. To avoid this problem, pre-pruning can be used, which stops the construction of the tree early.
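
To make the information gain idea a bit more concrete, here is a minimal sketch in plain Python. It assumes entropy as the impurity measure, and the toy reviews, the attribute names (mentions_great, long_review) and the helper functions are all made up for illustration. It is not a full decision tree, only the attribute-scoring step.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Entropy after the split, weighted by the size of each group.
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy data: four reviews described by two yes/no attributes.
rows = [
    {"mentions_great": "yes", "long_review": "yes"},
    {"mentions_great": "yes", "long_review": "no"},
    {"mentions_great": "no", "long_review": "yes"},
    {"mentions_great": "no", "long_review": "no"},
]
labels = ["positive", "positive", "negative", "negative"]

for attribute in ("mentions_great", "long_review"):
    print(attribute, information_gain(rows, labels, attribute))
```

In this toy data, mentions_great separates the two classes perfectly and gets the highest possible gain for a two-class problem, while long_review tells us nothing and gets a gain of zero. A tree would therefore split on mentions_great first.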

KNN

The K-Nearest Neighbors (KNN) algorithm places all the data points in an n-dimensional space. When a new data point is introduced to the model, it looks at the k closest neighbors and returns their most common class as the prediction.
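
The idea can be sketched in a few lines of plain Python. This is only a rough illustration: it assumes Euclidean distance, k = 3 and a simple majority vote, and the function name knn_predict and the 2-D points are invented for the example.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(train_points, train_labels, new_point, k=3):
    """Return the most common label among the k training points closest to new_point."""
    # Sort the (point, label) pairs by their distance to the new point.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], new_point),
    )
    k_labels = [label for _, label in neighbors[:k]]
    return Counter(k_labels).most_common(1)[0][0]


# Hypothetical 2-D data: two small clusters, one per class.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["negative", "negative", "negative", "positive", "positive", "positive"]

print(knn_predict(points, labels, (2.0, 2.0)))  # negative
print(knn_predict(points, labels, (6.0, 7.0)))  # positive
```

An odd k is a common choice for binary problems because it avoids tied votes.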

To go deeper into these algorithms, check this Link out.

Evaluating the model

There are many metrics that can be used to evaluate the model.

Accuracy

Accuracy is the ratio of correct predictions to the total number of predictions.
Accuracy = (True Positives + True Negatives)/(Total Predictions)
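
As a quick worked example, the formula can be computed directly from two lists of labels. The labels below are made up (1 = positive review, 0 = negative review), and the helper name accuracy is just for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels for eight reviews: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

# 2 true positives + 4 true negatives out of 8 predictions.
print(accuracy(y_true, y_pred))  # 0.75
```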

Precision Score

Precision is the ratio of true positives to the total number of positive predictions.
Precision = (True Positives)/(True Positives + False Positives)

Recall Score

Recall is the ratio of true positives to the sum of true positives and false negatives.
Recall = (True Positives)/(True Positives + False Negatives)
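
Here is a small sketch of how precision and recall can come out differently on the same predictions. It assumes binary labels where 1 is the positive class. The data and the helper name precision_recall are invented for the example, and the function expects at least one positive prediction and one actual positive, otherwise the divisions would fail.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical labels for eight reviews: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

precision, recall = precision_recall(y_true, y_pred)
print(round(precision, 2), round(recall, 2))
```

With these labels there are 2 true positives, 1 false positive and 2 false negatives, so precision is 2/3 while recall is only 1/2. The classifier is fairly reliable when it says "positive", but it misses half of the positive reviews.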

There are many other metrics that are used in machine learning. This Link can help you dig deeper into the different types of metrics.
