DEV Community

elsie-n


Detecting Fake News with Python and Machine Learning

We consume news through several media throughout the day, but it is sometimes difficult to tell which stories are fake and which are authentic: not every piece of news we consume is real.

Terminology

-Fake News
A form of sensationalist reporting, fake news consists of pieces of information that may be false and is spread mainly through social media and other online outlets.
It is often used to push particular ideas or to falsely promote products, and it is frequently driven by political agendas.

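-TfidfVectorizer
A scikit-learn class that converts a collection of raw text documents into a matrix of TF-IDF features: each word in each document is weighted by its term frequency multiplied by its inverse document frequency.
TF (Term Frequency)
The number of times a word appears in a document. A large value means the word appears often in that document relative to other words.

IDF (Inverse Document Frequency)
A measure of how significant a term is across the whole corpus. Words that appear in many documents (such as "the") get a low IDF, while words that appear in only a few documents get a high IDF, so very common words contribute less to the final weight.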

Project

The goal is to classify news articles as fake or real. Using scikit-learn, we build a TfidfVectorizer on our dataset, then initialize a PassiveAggressiveClassifier and fit the model. Finally, the accuracy score and the confusion matrix tell us how well the model performs.
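The steps above can be sketched as a single function. This is a minimal sketch, not the author's code: it assumes a DataFrame with `text` and `label` columns, and the hyperparameters (`stop_words="english"`, `max_df=0.7`, `max_iter=50`, the default 80/20 split, `random_state=7`), the function name `train_and_evaluate`, and the toy rows are all illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix


def train_and_evaluate(df, test_size=0.2, random_state=7):
    """Hypothetical helper: TF-IDF + PassiveAggressiveClassifier on df['text'] / df['label']."""
    x_train, x_test, y_train, y_test = train_test_split(
        df["text"], df["label"], test_size=test_size, random_state=random_state
    )

    # Learn vocabulary and IDF weights on the training split only; max_df=0.7
    # drops words that appear in more than 70% of the training documents.
    vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
    tfidf_train = vectorizer.fit_transform(x_train)
    tfidf_test = vectorizer.transform(x_test)  # reuse the fitted vocabulary

    model = PassiveAggressiveClassifier(max_iter=50, random_state=random_state)
    model.fit(tfidf_train, y_train)

    y_pred = model.predict(tfidf_test)
    acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])
    return acc, cm


# Tiny hypothetical data, only to show the call; use the real dataset in practice
toy = pd.DataFrame({
    "text": [
        "Miracle pill cures every disease overnight, doctors stunned",
        "Celebrity secretly replaced by clone, insiders claim",
        "Aliens endorse candidate in shocking leaked video",
        "Drinking bleach boosts immunity, viral post says",
        "Central bank raises interest rates by a quarter point",
        "City council approves budget for new public library",
        "Researchers publish study on regional rainfall patterns",
        "Government reports unemployment fell slightly last quarter",
    ],
    "label": ["FAKE"] * 4 + ["REAL"] * 4,
})

acc, cm = train_and_evaluate(toy, test_size=0.25)
print(f"Accuracy: {acc:.2%}")
print(cm)
```

With only eight rows the accuracy printed here is meaningless; the point is the order of operations. Fitting the vectorizer on the training split alone keeps test-set vocabulary and IDF statistics from leaking into training.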

PassiveAggressive
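Passive-aggressive algorithms are online learning algorithms: they learn from training examples one at a time. Such an algorithm remains passive when an example is classified correctly (with enough margin), leaving its weights unchanged, and turns aggressive when an example is misclassified, updating its weights just enough to correct the mistake. Unlike most batch algorithms, it is not designed to settle on one final solution for a fixed dataset; it keeps adjusting as new examples arrive.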
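The passive/aggressive behavior comes from a very small update rule. Below is a from-scratch sketch of one PA-I step, assuming binary labels encoded as -1/+1 and a hypothetical three-example stream. PA-I is, per scikit-learn's documentation, the variant PassiveAggressiveClassifier uses with its default `loss="hinge"`, but this sketch only illustrates the idea and is not scikit-learn's implementation. The rule: compute the hinge loss max(0, 1 - y * (w · x)); if it is zero do nothing, otherwise move w by tau * y * x with tau = min(C, loss / ||x||^2).

```python
import numpy as np


def pa_update(w, x, y, C=1.0):
    """One PA-I step for a binary label y in {-1, +1} (assumes x is non-zero)."""
    # Hinge loss: zero when the example is on the correct side with margin >= 1
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss == 0.0:
        return w  # passive: weights unchanged
    # Aggressive: smallest step that brings the margin back to 1, capped by C
    tau = min(C, loss / np.dot(x, x))
    return w + tau * y * x


# Hypothetical stream of (features, label) pairs, processed one at a time
stream = [
    (np.array([1.0, 0.0]), 1),   # score 0 < 1 with w = 0 -> aggressive update
    (np.array([0.0, 1.0]), -1),  # score 0 < 1 -> aggressive update
    (np.array([2.0, 0.0]), 1),   # margin 2 >= 1 -> passive, no change
]

w = np.zeros(2)
for x, y in stream:
    w = pa_update(w, x, y)
    print(w)
```

The constant C (hypothetically 1.0 here) limits how far a single mistake can move the weights; smaller values make the model less aggressive on noisy examples.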

Data Analysis

The dataset used can be found at [(url)]

This dataset has four columns:

  • unnamed (the first column has no header)
  • title (the headline)
  • text (the article body)
  • label (FAKE or REAL)
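A quick way to load and inspect data with this layout is pandas' `read_csv`. The sketch below assumes a hypothetical two-row sample with the same four columns, held in memory so it runs without the download; in practice you would pass the path of the downloaded CSV instead. It also assumes the headerless first column only holds a row ID, which is why it is dropped.

```python
import io
import pandas as pd

# Hypothetical two-row sample mirroring the dataset's four columns
sample_csv = io.StringIO(
    ",title,text,label\n"
    "0,Miracle pill cures everything,Doctors stunned by overnight cure,FAKE\n"
    "1,Council approves library budget,The city council voted on Tuesday,REAL\n"
)

df = pd.read_csv(sample_csv)
print(df.columns.tolist())  # the headerless first column is read as 'Unnamed: 0'

# Assumed to be a plain row ID, so it is not useful as a feature
df = df.drop(columns=["Unnamed: 0"])

print(df.shape)
print(df["label"].value_counts())  # how many FAKE vs REAL articles
```

Checking `value_counts()` on the real data is worthwhile before training: a heavily imbalanced label column would make raw accuracy a misleading metric.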

Libraries

import numpy as np
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix


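The output is a 2×2 matrix rather than a single score:
confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL'])
Rows are the true labels and columns are the predicted labels, both in the order given by labels. Treating FAKE as the positive class (Fake=True, Real=False), the top-left cell counts fake articles correctly flagged as fake (true positives), the top-right fake articles predicted as real (false negatives), the bottom-left real articles predicted as fake (false positives), and the bottom-right real articles correctly predicted as real (true negatives).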
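To check how the cells are laid out, here is a small example with hypothetical, hand-written `y_test` and `y_pred` values (not results from the project). With `labels=["FAKE", "REAL"]`, `ravel()` reads the matrix row by row, so the four counts come out in the order TP, FN, FP, TN.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels, for illustration only
y_test = ["FAKE", "FAKE", "FAKE", "REAL", "REAL"]
y_pred = ["FAKE", "REAL", "FAKE", "REAL", "FAKE"]

cm = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])
print(cm)
# [[2 1]
#  [1 1]]

# FAKE is the positive class, so row-by-row order is TP, FN, FP, TN
tp, fn, fp, tn = cm.ravel()
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=2 FN=1 FP=1 TN=1

# Accuracy is the diagonal over the total: (TP + TN) / all predictions
print(accuracy_score(y_test, y_pred), (tp + tn) / cm.sum())
```

Reading the off-diagonal cells separately is what the accuracy score alone hides: here one fake story slipped through as real, and one real story was wrongly flagged as fake.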

Link to the project code: https://github.com/elsie-n/FakeNews/tree/main
