Japneet Singh Chawla

Originally published at techscouter.blogspot.in

# Machine Learning: A Problem or Solution

### The article will be divided into different sections as follows:

• Introduction to Machine Learning
• Types of Solutions
• Classification using Naive Bayes

## A brief about Machine Learning

According to Wikipedia, machine learning is the subfield of computer science that, in the words of Arthur Samuel in 1959, gives "computers the ability to learn without being explicitly programmed." Machine learning covers a set of problems whose solutions are learned from data by applying an algorithm, rather than being coded by hand. One factor to keep in mind when designing an ML solution is accuracy. Accuracy is especially critical when the solution is for the medical domain (for example, cancer detection). Every solution should have an accuracy threshold, which can be based on the percentage of risk that is acceptable.
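The idea of an acceptable-risk threshold can be made concrete with a small sketch. The function names `accuracy` and `meets_threshold`, the toy labels, and the 0.95 threshold are illustrative assumptions, not values from this article.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty data")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def meets_threshold(predicted, actual, threshold):
    """True if the accuracy is at least the acceptable threshold."""
    return accuracy(predicted, actual) >= threshold


# Toy labels, invented for illustration only
actual = ["benign", "malignant", "benign", "benign", "malignant"]
predicted = ["benign", "malignant", "benign", "malignant", "malignant"]

score = accuracy(predicted, actual)  # 4 of 5 predictions are correct
print("accuracy: %.2f" % score)
print("acceptable:", meets_threshold(predicted, actual, threshold=0.95))
```

In a high-risk domain the threshold would be set strictly, so a model like this toy one would be rejected even though most of its predictions are right.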

## Types of solution

Machine Learning solutions can be broadly classified into three types.

## Machine Learning for Classification Problems

Classification is a set of machine learning problems in which an input is assigned to one of several classes. Such problems can be approached in either a supervised or an unsupervised way.

Examples of classification problems:

• Sentiment Analysis
• Product Categorization
• Binary classification of reviews (positive, negative), and much more

## Movie Review Classification using Naive Bayes Algorithm

I will be using NLTK, the Natural Language Toolkit available in Python, together with its movie review corpus, which contains movie reviews labelled as positive or negative.

```python
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews

# word_feats will convert the sentence (a list of words) into features
def word_feats(words):
    return dict([(word, True) for word in words])

# file ids of the negative and positive reviews in the corpus
negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')
```
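To see what `word_feats` produces, it can be run on a short, hand-tokenized sentence. The function is repeated here so the snippet runs without NLTK, and the sample word list is made up for illustration.

```python
def word_feats(words):
    # same bag-of-words feature extractor: every word maps to True
    return dict([(word, True) for word in words])


# hand-tokenized sample sentence (illustrative, not taken from the corpus)
sample = ["the", "plot", "was", "good", "the", "end"]
features = word_feats(sample)
print(features)
```

The repeated word "the" appears only once in the result. This representation records which words are present, not how often they occur or in what order.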
```python
# training data is converted into features
negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

# data is divided into training (3/4) and testing (1/4) data;
# integer division keeps the cutoffs usable as slice indices in Python 3
negcutoff = len(negfeats) * 3 // 4
poscutoff = len(posfeats) * 3 // 4

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]

print('train on %d instances, test on %d instances' % (len(trainfeats), len(testfeats)))
```
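Cutting each class at the same fraction keeps the proportion of negative and positive reviews the same in the training and testing sets. A minimal sketch of that idea on toy data follows. `split_per_class` is a hypothetical helper name, not part of NLTK, and the toy items stand in for real feature dictionaries.

```python
def split_per_class(items_by_label, numerator=3, denominator=4):
    """Split each class at the same fraction so both sets stay balanced."""
    train, test = [], []
    for label, items in items_by_label.items():
        cutoff = len(items) * numerator // denominator  # integer slice index
        train += [(item, label) for item in items[:cutoff]]
        test += [(item, label) for item in items[cutoff:]]
    return train, test


# toy stand-ins for feature dictionaries (illustrative only)
toy = {
    "neg": ["n1", "n2", "n3", "n4"],
    "pos": ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"],
}
train, test = split_per_class(toy)
print("train on %d instances, test on %d instances" % (len(train), len(test)))
```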
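```python
from nltk.tokenize import wordpunct_tokenize

# Naive Bayes classifier is trained on the training data
classifier = NaiveBayesClassifier.train(trainfeats)
print('accuracy:', nltk.classify.util.accuracy(classifier, testfeats))
classifier.show_most_informative_features()

# classify() expects a feature dictionary, not a raw string, so the
# sentence is lowercased, split into tokens and passed through word_feats
sentence = "The plot was good, but the characters were not compelling"
print('classified as:', classifier.classify(word_feats(wordpunct_tokenize(sentence.lower()))))
```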
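As a rough picture of what a Naive Bayes classifier does with such features, the sketch below scores each label by its prior probability multiplied by the per-word probabilities, using logarithms to avoid underflow. It is a simplified from-scratch illustration, not NLTK's implementation: it looks only at the words present in the input and uses add-one smoothing, both of which are assumptions made here. The tiny training set is invented.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """labelled_docs: list of (list_of_words, label) pairs."""
    label_counts = Counter()
    # label -> word -> number of documents with that label containing the word
    word_doc_counts = defaultdict(Counter)
    for words, label in labelled_docs:
        label_counts[label] += 1
        for word in set(words):  # presence only, as in word_feats
            word_doc_counts[label][word] += 1
    return label_counts, word_doc_counts


def classify(words, label_counts, word_doc_counts):
    """Return the label with the highest log posterior score."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        for word in set(words):
            # add-one smoothing so unseen words do not zero out the score
            p = (word_doc_counts[label][word] + 1) / (n_docs + 2)
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# tiny made-up training set, for illustration only
docs = [
    (["great", "plot", "good", "acting"], "pos"),
    (["good", "fun", "great"], "pos"),
    (["boring", "bad", "plot"], "neg"),
    (["bad", "acting", "dull"], "neg"),
]
label_counts, word_doc_counts = train_naive_bayes(docs)
print(classify(["good", "plot"], label_counts, word_doc_counts))
print(classify(["dull", "bad"], label_counts, word_doc_counts))
```

The "naive" part is the assumption that words are independent given the label, which is why the per-word probabilities can simply be multiplied (added, in log space).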