Abhisar Shukla

A Very General Introduction to Keras

In this era of deep learning, it is being applied almost everywhere to make people's lives easier. It is used in phone cameras, search engines, cancer treatment, nuclear research, quantum physics research, and many other places where it goes unnoticed. All of these applications use a trained model to make predictions on previously unseen examples. Such a model is first trained on a large training dataset and then evaluated on a test dataset, which is smaller than the training set. A separate evaluation (validation) dataset is also used to tune the hyper-parameters of the model.


Deep learning models can be created and trained using many free and open-source frameworks, such as the TensorFlow, CNTK, and Theano backends named in the Keras description quoted below.

Each of these frameworks has an API for Python, which is great! But, and it is a big BUT, programming directly in these frameworks is not very beginner friendly, because it requires a deep understanding of deep learning methods, such as:

  • Initialization of the parameters.
  • Size of the weight tensors.
  • Figuring out tensor multiplication.
  • Reading through lots of API docs.

and so on.
These things are not easy to figure out, and they can easily put you off the idea of writing a program yourself. This is where Keras comes into the picture.
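
To give a feel for the bookkeeping listed above, here is a minimal NumPy sketch of a single fully-connected layer written by hand. The sizes, the variable names, and the Glorot-uniform-style initialization are assumptions chosen for illustration, not something this post prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a batch of 4 inputs with 784 features feeding 1024 units.
batch, n_in, n_out = 4, 784, 1024

# Initialization: a Glorot-uniform-style limit (an assumed choice for this sketch).
limit = np.sqrt(6.0 / (n_in + n_out))
W = rng.uniform(-limit, limit, size=(n_in, n_out))  # weight tensor, shape (784, 1024)
b = np.zeros(n_out)                                  # one bias per unit

x = rng.random((batch, n_in))

# Tensor multiplication: (4, 784) @ (784, 1024) -> (4, 1024), followed by ReLU.
h = np.maximum(0.0, x @ W + b)
print(h.shape)
```

Getting any of these shapes wrong raises an error at the multiplication step, which is the kind of detail Keras handles for you.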


Keras

Keras is a programming framework built to run on top of other deep learning frameworks such as the ones mentioned above. In their own words:

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Now let's just jump into it!

Installation

For installation, I recommend setting up a virtual environment first. That can be done using virtualenv, or conda if you use Anaconda.

Using virtualenv

virtualenv --system-site-packages -p python3 ~/path/to/folder
source ~/path/to/folder/bin/activate
pip install keras jupyter numpy matplotlib

Using conda

conda create -n keras python=3.6.7 #use any python3 version
conda activate keras
conda install keras jupyter numpy matplotlib

If you have not configured a backend, Keras defaults to TensorFlow.
NumPy and Matplotlib are the libraries that will help us with fast matrix/tensor operations and plotting, respectively.
To show the very basics of Keras, we will use the Fashion-MNIST classification problem as an example, with a fully-connected neural network architecture. If you don't understand what these terms mean, you should either take a basic deep learning course or stick around for the fun!

Prerequisites

These are the things you will need to follow along:

  • Basic Python
  • Basic deep learning (for understanding the terminology)

Assuming you searched for Keras and got here, you probably know both of those things, so let's move on.

Programming

Assuming you are in the virtual environment where Keras and Jupyter are installed, type the following:

jupyter notebook

This will open a new browser tab connected to a Jupyter server. Jupyter is going to be your best friend for machine learning and deep learning work because it allows easy and fast experimentation, which is essential in this field.
Next, click the New button and select Python 3. This will open a new Jupyter notebook in a new tab.

Description

The network that we will implement here is:

Input->Flatten->Dense(1024)->Dense(10)
The Flatten layer squishes the 2D image into a 1D vector so that the input features (in this case the pixel values) can be fully connected to the next layer. A Dense layer is one in which each node is connected to every node of the previous layer (your simple neural network layer). Finally, the output layer predicts the type of clothing, which is one of ten possible classes, so it is a softmax layer. The hidden layer is a ReLU layer.
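
As a rough sketch of how large this network is, the number of trainable parameters can be worked out by hand. The calculation assumes 28x28 input images (the size of the Fashion-MNIST images used later), and `dense_params` is a hypothetical helper written only for this illustration.

```python
# Layer sizes taken from the architecture described above.
input_pixels = 28 * 28   # Flatten turns a 28x28 image into 784 values
hidden_units = 1024
output_units = 10

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

hidden = dense_params(input_pixels, hidden_units)
output = dense_params(hidden_units, output_units)

print(input_pixels)      # 784
print(hidden)            # 803840
print(output)            # 10250
print(hidden + output)   # 814090
```

The Flatten layer contributes no parameters of its own; it only reshapes the input.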

Program

Import all the necessary modules that we will need.

import numpy as np
%matplotlib inline
from matplotlib import pyplot as plt
from keras.models import Sequential
from keras.layers import Flatten, Dense
from keras.datasets import fashion_mnist
from keras.utils import to_categorical

Now hit Shift+Enter to execute the cell and create a new cell below the current one.
Now it's time to load the dataset.

(train_x, train_y), (test_x, test_y) = fashion_mnist.load_data()

This will load the training and test data into train_x, train_y, test_x and test_y. train_x and test_x contain the examples (images), while train_y and test_y contain the corresponding labels.
In the next cell, let's check the shapes of these variables.

print(train_x.shape, train_y.shape)
print(test_x.shape, test_y.shape)

From the output we see that the training set contains 60000 examples whereas the test set contains 10000 examples; recall that the test set is smaller than the training set. We also see that each image is 28x28, i.e. a 2D tensor of size 28x28 where each cell contains a grayscale value.
To view any training example as an image, you can do this.

plt.imshow(train_x[10], cmap='gray')

Here we look at the training example at index 10 (the eleventh image, since indexing starts at 0); you can try different index values or look in the test_x variable.
Now we are at the data pre-processing stage, where we transform the data so that it is easier for the neural network to train on.
First, we normalize the training and test examples by dividing each one by 255, since the maximum value of any pixel is 255. This keeps the values between 0 and 1.

train_x = train_x / 255.0
test_x = test_x / 255.0
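
To see what this division does without loading the real dataset, here is a small sketch on made-up data. The array `fake_images` is a hypothetical stand-in for a few Fashion-MNIST images, filled with random pixel values.

```python
import numpy as np

# Stand-in for a few images: random uint8 pixels with shape (3, 28, 28).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# True division converts the integer pixels to floating point.
scaled = fake_images / 255.0

print(fake_images.dtype, scaled.dtype)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

The shape is unchanged; only the value range and the data type differ after scaling.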

Now we write two lines of code to one-hot encode the training and test labels. One-hot encoding turns each integer class label into a vector with one slot per class, which is the label format the categorical cross-entropy loss expects in a classification task. Suppose there is a label

y = 3

Then after one hot encoding it will be

y = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

This matches the output of a softmax activation, which is the usual choice when the output should be a probability distribution over the classes.

train_y = to_categorical(train_y)
test_y = to_categorical(test_y)
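
For intuition, here is a hand-written sketch of the kind of array `to_categorical` produces for integer labels. The helper `one_hot` is hypothetical and exists only for this illustration; in the notebook you would keep using `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a 1 at each label's index."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Label 3 gets a 1 at index 3, label 0 at index 0, and label 9 at index 9.
print(one_hot([3, 0, 9], num_classes=10))
```

Each row contains exactly one 1, placed at the position of the class it encodes.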

Next, it's time to create the model discussed at the beginning of the section.

model = Sequential()
model.add(Flatten())
model.add(Dense(1024, activation = 'relu'))
model.add(Dense(10, activation = 'softmax'))

This creates a sequential model, meaning the layers are stacked one after another in the order they are added. It is exactly the model that we discussed above.
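
The softmax activation on the last layer turns the ten raw scores into probabilities. A minimal pure-Python sketch of that computation follows; the function and the three example scores are made up for illustration and are not the Keras implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(sum(probs))
```

The largest score receives the largest probability, and the values always add up to 1 (up to floating-point rounding).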
After creating the model, we have to compile it before we can train it.

model.compile(optimizer='Adam',
              loss = 'categorical_crossentropy',
              metrics = ['accuracy'])

This step defines the optimization algorithm for the model to use. In this case we use the Adam optimizer, which is a common default in deep learning applications. A list of other optimizers, such as SGD and RMSprop, can be found on the Keras website.
categorical_crossentropy is the loss function to use for a classification task when the network has more than one output unit (one per class) and the labels are one-hot encoded.
The 'accuracy' metric is passed so that accuracy is reported for each epoch during training and when evaluating. An epoch is one full pass over the training dataset, during which the network runs a forward pass and a backward pass for each batch of examples.
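
To make the loss and the metric less abstract, here is a small NumPy sketch that computes both by hand. The two examples, the three classes, and all the numbers are made up, and the code is an illustration of the formulas rather than the Keras internals.

```python
import numpy as np

# Two made-up examples with three classes (hypothetical numbers).
y_true = np.array([[0, 0, 1],
                   [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.3, 0.5, 0.2]])

# Categorical cross-entropy: minus the log of the probability given to the true class.
eps = 1e-7  # guards against log(0)
per_example = -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1)
loss = per_example.mean()

# Accuracy: fraction of examples whose most probable class matches the label.
accuracy = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print(float(loss), float(accuracy))
```

The first example is classified correctly and contributes a small loss; the second is wrong and contributes a larger one, so the accuracy here is one out of two.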
After compiling the network we are ready to train it. Ah, finally!

model.fit(train_x, train_y, epochs = 10)
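
As a rough sketch of what `epochs = 10` amounts to, the number of weight updates can be estimated by hand. This assumes the 60000-example training set and the Keras default batch size of 32, which applies because no `batch_size` is passed to `fit`.

```python
import math

num_examples = 60000   # size of the Fashion-MNIST training set
batch_size = 32        # Keras default when fit() gets no batch_size
epochs = 10

steps_per_epoch = math.ceil(num_examples / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 1875
print(total_updates)    # 18750
```

Passing a larger `batch_size` to `fit` would mean fewer, larger updates per epoch.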

We train it for 10 epochs. You can try different numbers to experiment.
After training, the training accuracy of this network is about 91%, which is pretty good for a simple and small network like this.
But we should always evaluate our network on the test set before becoming too happy about the training accuracy.

model.evaluate(test_x, test_y)

Evaluating on the test set gives about 89% accuracy. Again, this test accuracy is pretty good for a simple network like this.
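
Behind that accuracy number, each prediction is a vector of ten probabilities, and the predicted class is the one with the highest value. The sketch below uses a made-up probability vector in place of a real model output; the class names follow the label order documented for Fashion-MNIST.

```python
# Class names in label order (0 to 9), as documented for Fashion-MNIST.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# A made-up softmax output for one image (hypothetical values that sum to 1).
probabilities = [0.01, 0.01, 0.02, 0.01, 0.05, 0.02, 0.03, 0.05, 0.10, 0.70]

# Index of the largest probability is the predicted label.
predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)
print(predicted_index, class_names[predicted_index])  # 9 Ankle boot
```

With the trained model, a vector like this would come from calling `model.predict` on an image batch instead of being typed in by hand.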
There are ways to improve the performance of the network, such as:

  • using regularization
  • dropout
  • building a deeper network
  • training for a longer time

and many more. But covering those methods would be a lot for a beginner post.

After this

After this you can check out some online deep learning courses on Coursera, or any other online course, whichever helps you understand it better.
I recommend that you practice your skills regularly. There is a huge number of free datasets available online that you can download and train a neural network on.

Finally, thanks for reading!
