Andrés Baamonde Lozano

Talking about Machine Learning (I): Setup

The next couple of posts in this series will be a tutorial about machine learning, one of the most popular branches of AI.

Environment

I will work with the following libraries: NumPy, SciPy, scikit-learn, and matplotlib. I built a tiny install script.

mkdir -p talkingaboutml/talkingaboutml
python3 -m virtualenv talkingaboutml/venv
talkingaboutml/venv/bin/pip install numpy scipy scikit-learn matplotlib

Now your talkingaboutml dir looks like this:

talkingaboutml/
├── talkingaboutml (here we store our examples)
└── venv

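To verify everything installed correctly, you can save this as talkingaboutml/talkingaboutml/check_env.py and run it with talkingaboutml/venv/bin/python. A minimal sanity check (the file name is my own choice, and the versions printed will depend on when you run the install):

import numpy
import scipy
import sklearn
import matplotlib

# print the version of each library we just installed
for lib in (numpy, scipy, sklearn, matplotlib):
    print(lib.__name__, lib.__version__)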

First example

In our first example I will use the scikit-learn datasets (available in sklearn.datasets); there are many example datasets. I chose iris, a multi-class classification dataset.

As a first example, I will train a simple classifier and run a prediction.

We need some imports: the datasets module, the accuracy metric, and an SVC:

from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

Load the dataset; these datasets come already divided into (data, target).

iris = datasets.load_iris()
X = iris.data # each row, an iris with its features
y = iris.target # the class of each row

feature_number = X.shape[1]
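If you want to peek at what was loaded, the object returned by load_iris also carries the feature and class names. A quick inspection (the printed values are what scikit-learn ships for iris):

print(X.shape)             # (150, 4): 150 irises, 4 features each
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # the three species: setosa, versicolor, virginica
print(y[:5])               # classes are encoded as integers 0, 1, 2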

Create the classifier, train it, and predict.


clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0) # SVC with a linear kernel

clf.fit(X, y) # Train

y_pred = clf.predict(X)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
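Note that we predicted on the same samples we trained on, so this accuracy is optimistic. A fairer estimate holds part of the data out for testing; a minimal sketch (the test_size and random_state values here are arbitrary choices of mine):

from sklearn.model_selection import train_test_split

# keep 30% of the samples aside for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC(kernel='linear', C=1.0, random_state=0)
clf.fit(X_train, y_train)        # train only on the training split

y_pred = clf.predict(X_test)     # predict on samples the model never saw
print(accuracy_score(y_test, y_pred))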

So... let's do this: in this example I train the same classifier with different C (penalty) values. This parameter tells the SVM how much you want to avoid misclassifying each training example. A good explanation can be found here or here.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

iris = datasets.load_iris()

X = iris.data # each row, an iris with its features
y = iris.target # the class of each row


feature_number = X.shape[1]

penalties = list(np.arange(0.5, 10.0, 0.1))

accs = []

for C in penalties:
    clf = SVC(kernel='linear', C=C, probability=True, random_state=0) # SVC with a linear kernel

    clf.fit(X, y) # Train

    y_pred = clf.predict(X)
    accuracy = accuracy_score(y, y_pred)
    accs.append(accuracy)


# plot the data
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(penalties, accs, 'r')
plt.show()

(Plot: accuracy as a function of the penalty C.)

As we can see, the penalty factor matters: if C is too large, the SVM tries to classify every training example correctly, and the model may overfit.
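If you want to see how C changes the fitted model, you can count the support vectors for a few values of C; a small sketch reusing the X and y from above (exact counts may vary between scikit-learn versions):

for C in (0.1, 1.0, 10.0, 100.0):
    clf = SVC(kernel='linear', C=C, random_state=0)
    clf.fit(X, y)
    # n_support_ holds the number of support vectors per class
    print(C, clf.n_support_.sum())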
