We consume news through several media over the course of the day, but it is often hard to tell which stories are fake and which are authentic, because not every piece of news we consume is real.
Fake news is a form of sensationalist reporting: it contains information that may be false and is spread mostly through social media and other online channels.
This is often done to promote or impose certain ideas, or to falsely promote products, and it frequently serves a political agenda.
TF (Term Frequency)
TF is the number of times a word appears in a document. A large value means that the word occurs many times in that document relative to other words.
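The word count described above can be sketched in a few lines of plain Python. This is only an illustration: the helper name `term_frequency` and the sample sentence are made up here, and dividing by the document length is one common normalization, not necessarily what any particular library does internally.

```python
from collections import Counter


def term_frequency(document):
    """Return each word's share of all words in the document."""
    words = document.lower().split()
    counts = Counter(words)
    total = len(words)
    # Relative frequency: occurrences of the word divided by the total word count.
    return {word: count / total for word, count in counts.items()}


# Hypothetical sample sentence: "the" appears twice out of six words.
tf = term_frequency("the cat sat on the mat")
```

Here `tf["the"]` is larger than `tf["cat"]` because "the" appears more often in the sentence.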
IDF (Inverse Document Frequency)
IDF is a measure of how significant a term is across the whole corpus. Words that appear in many documents carry little information and receive a low IDF.
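As a rough illustration, the textbook form of this measure is log(N / df), where N is the number of documents and df is the number of documents that contain the term. The sketch below assumes that unsmoothed form; the helper name and the three sample documents are hypothetical, and sklearn's TfidfVectorizer applies a smoothed variant by default, so its numbers will differ.

```python
import math


def inverse_document_frequency(corpus):
    """Textbook IDF: log(N / df) for every term in the corpus."""
    n_docs = len(corpus)
    doc_sets = [set(doc.lower().split()) for doc in corpus]
    vocabulary = set().union(*doc_sets)
    return {
        # df = number of documents that contain the term at least once.
        term: math.log(n_docs / sum(term in doc for doc in doc_sets))
        for term in vocabulary
    }


# Hypothetical mini-corpus of three documents.
corpus = [
    "the senate passed the bill",
    "the aliens passed the moon",
    "the bill was signed",
]
idf = inverse_document_frequency(corpus)
```

A word such as "the", which occurs in every document, gets an IDF of zero, while "aliens", which occurs in only one, gets the highest score.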
The goal is to classify news as fake or real. Using sklearn, we build a TfidfVectorizer on our dataset. Then we initialize a PassiveAggressiveClassifier and fit the model. Finally, the accuracy score and the confusion matrix tell us how well the model performs.
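The steps above might look roughly like the following. This is a minimal sketch, not the project's actual code: the eight short texts and their FAKE/REAL labels are invented stand-ins for the real dataset, and the parameter values (`stop_words`, `test_size`, `max_iter`, `random_state`) are assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real dataset.
texts = [
    "Scientists confirm miracle pill cures every disease overnight",
    "Secret moon base discovered by anonymous insider",
    "Celebrity reveals shocking trick that banks hate",
    "Aliens endorse candidate in leaked secret tape",
    "City council approves budget for road repairs",
    "Central bank keeps interest rates unchanged this quarter",
    "Parliament passes education funding bill after debate",
    "Local hospital opens new emergency department",
]
labels = ["FAKE", "FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL"]

# Hold out part of the data for evaluation.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=7, stratify=labels
)

# Fit the vectorizer on the training text only, then reuse it on the test text.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_train = vectorizer.fit_transform(x_train)
tfidf_test = vectorizer.transform(x_test)

classifier = PassiveAggressiveClassifier(max_iter=50, random_state=7)
classifier.fit(tfidf_train, y_train)

# Evaluate on the held-out documents.
y_pred = classifier.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])
```

With so little data the score itself is meaningless; the point is only the order of the steps. Fitting the vectorizer on the training split alone keeps information from the test documents out of the vocabulary and IDF weights.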
Passive Aggressive algorithms are online learning algorithms. Such an algorithm remains passive when a classification is correct and turns aggressive on a misclassification, updating and adjusting its weights. Unlike most other algorithms, it does not converge.
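To make the passive and aggressive behaviour concrete, here is a small sketch of a single update step for a binary problem with labels -1 and +1. It assumes the PA-I variant of the rule (step size capped by a constant `c`) and uses hinge loss; the function name `pa_update` and the sample numbers are hypothetical, and sklearn's implementation is more involved than this.

```python
def pa_update(weights, features, label, c=1.0):
    """One PA-I style update for a label in {-1, +1}; returns the new weights."""
    margin = label * sum(w * x for w, x in zip(weights, features))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    squared_norm = sum(x * x for x in features)
    if loss == 0.0 or squared_norm == 0.0:
        # Passive: the example is already classified with enough margin
        # (or carries no features), so the weights stay unchanged.
        return weights
    # Aggressive: move just far enough to fix the example, capped at c.
    step = min(c, loss / squared_norm)
    return [w + step * label * x for w, x in zip(weights, features)]


# Hypothetical example: start from zero weights and see one positive sample.
weights = pa_update([0.0, 0.0], [1.0, 2.0], +1)
```

Starting from zero weights the sample has a loss of 1, so the weights move in the direction of the features. A sample that is already classified correctly with a margin of at least 1 leaves the weights untouched.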
The dataset used can be found at [(url)]
This dataset has four columns,
import numpy as np
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
The output is in the form of a vector;
Link to the project code