Handwritten Digit Recognition Using Convolutional Neural Networks

Frank Rosner on May 28, 2018

Introduction

In this blog post I want to share a small application I developed that classifies images of handwritten digits, together...
 

Very well written! Loved the explanations of pooling and batch normalization. Does DL4J take advantage of GPUs?

I did a project on recognizing house numbers from the Google Street View House Numbers (SVHN) dataset. I used Keras for training, as it has ready-made network architectures from famous papers like ResNet, VGG16 and Xception. These are trained on datasets like ImageNet or CIFAR, so we have to replace the final layer (for multi-class classification) with our own. Keras also has an image data generator that can slightly rotate, shear, scale and blur images to avoid overfitting and increase the robustness of the model.
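The augmentation described above is a Keras feature. Purely as a library-independent illustration, here is a minimal plain-Java sketch of one such augmentation, a small random rotation, assuming a grayscale image stored as a row-major `float[height][width]` array. The names `Augment`, `rotate` and `randomRotation` are hypothetical and are not part of Keras or DL4J.

```java
import java.util.Random;

// Hypothetical helper for illustration; not a Keras or DL4J API.
final class Augment {
    private Augment() {}

    // Rotates a grayscale image (row-major float[height][width]) by `degrees`
    // around its centre. Each output pixel is mapped back into the source
    // (inverse rotation) and sampled with nearest-neighbour rounding; pixels
    // that fall outside the source are filled with 0 (background).
    static float[][] rotate(float[][] img, double degrees) {
        int h = img.length, w = img[0].length;
        double rad = Math.toRadians(degrees);
        double cos = Math.cos(rad), sin = Math.sin(rad);
        double cy = (h - 1) / 2.0, cx = (w - 1) / 2.0;
        float[][] out = new float[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double dx = x - cx, dy = y - cy;
                int sx = (int) Math.round(cos * dx + sin * dy + cx);
                int sy = (int) Math.round(-sin * dx + cos * dy + cy);
                if (sx >= 0 && sx < w && sy >= 0 && sy < h) {
                    out[y][x] = img[sy][sx];
                }
            }
        }
        return out;
    }

    // One augmented copy with a random angle drawn uniformly from
    // [-maxDegrees, maxDegrees), e.g. maxDegrees = 10 for "slight" rotations.
    static float[][] randomRotation(float[][] img, double maxDegrees, Random rng) {
        double angle = (rng.nextDouble() * 2 - 1) * maxDegrees;
        return rotate(img, angle);
    }
}
```

Calling `randomRotation` several times per training image with a fixed-seed `Random` yields reproducible extra samples; shear, scaling and blur would follow the same "map each output pixel back to the source" pattern.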

One issue I faced was false positives, e.g. recognizing a door handle as a '1', because the dataset itself only has labels for the digits. I had to create a separate 'negative' class from random patches extracted from the training images. With that, the accuracy went up to 97%. I also found that pre-processing the images makes a big difference: in my project, mean subtraction, normalization and a light Gaussian blur reduced the training time and increased the accuracy.
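The exact pre-processing steps aren't spelled out, so the following is only a sketch under stated assumptions: "mean subtraction and normalization" is taken to mean standardizing every pixel with the mean and standard deviation of all training pixels, and the "light Gaussian blur" is approximated with a 3x3 binomial kernel. Images are row-major `float[][]` arrays; `Preprocess` and its methods are hypothetical names.

```java
// Hypothetical pre-processing helpers for illustration.
final class Preprocess {
    private Preprocess() {}

    // Light blur with the 3x3 binomial kernel [1 2 1; 2 4 2; 1 2 1] / 16,
    // a common small approximation of a Gaussian. Border pixels are handled
    // by clamping coordinates to the nearest valid pixel.
    static float[][] gaussianBlur3x3(float[][] img) {
        int h = img.length, w = img[0].length;
        int[][] k = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
        float[][] out = new float[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                float acc = 0f;
                for (int ky = -1; ky <= 1; ky++) {
                    for (int kx = -1; kx <= 1; kx++) {
                        int sy = Math.min(h - 1, Math.max(0, y + ky));
                        int sx = Math.min(w - 1, Math.max(0, x + kx));
                        acc += k[ky + 1][kx + 1] * img[sy][sx];
                    }
                }
                out[y][x] = acc / 16f;
            }
        }
        return out;
    }

    // Mean and standard deviation over all pixels of all training images.
    static double[] meanStd(float[][][] images) {
        double sum = 0, sumSq = 0;
        long n = 0;
        for (float[][] img : images)
            for (float[] row : img)
                for (float v : row) { sum += v; sumSq += (double) v * v; n++; }
        double mean = sum / n;
        double var = sumSq / n - mean * mean;
        return new double[] {mean, Math.sqrt(Math.max(var, 0))};
    }

    // Mean subtraction followed by division by the standard deviation.
    static float[][] standardize(float[][] img, double mean, double std) {
        double s = std > 0 ? std : 1.0; // avoid dividing by zero on constant data
        float[][] out = new float[img.length][img[0].length];
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[0].length; x++)
                out[y][x] = (float) ((img[y][x] - mean) / s);
        return out;
    }
}
```

The mean and standard deviation are computed once on the training set and then reused unchanged for validation, test and live images, so that all inputs are transformed the same way.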

 

Raunak

DL4J works very well with GPUs. We use them all the time.

DL4J comes with an image pre-processing transform API that you can see in action here:

github.com/deeplearning4j/dl4j-exa...

DL4J also comes with many of the famous networks, many of which were actually imported from Keras using the model import feature. Just add deeplearning4j-zoo to your project and use the TransferLearning class to edit the graph, like so:

github.com/deeplearning4j/dl4j-exa...

 

Thanks Eduardo. The model import from Keras looks very useful.

 

Hi Raunak,

Glad you enjoyed the post :) I tried not to make it too theoretical, but without some intuition about the math I find it hard to understand how to apply it.

DL4J uses ND4J under the hood for numerical computations on the tensors. ND4J supports native libraries for many different platforms. If you want to use NVIDIA GPUs, you can simply use the nd4j-cuda-* dependency. I haven't tried it out yet, though.

I haven't used Keras yet, but I'm planning to check it out later. I also want to try a more sophisticated problem.

I completely agree with your last point about things the network has never seen before. With CNNs it's very important to pick the right training data and to have well-labeled data. In my example I was only rescaling the pixel values to (0,1) and didn't do any other preprocessing. Do you know if there are networks that can learn some parts of the preprocessing as well, similar to how they learn the convolution filters? That would be interesting.

Thanks for your feedback!

 

I'm not sure about networks learning the pre-processing, as most of the image pipelines I have seen involve a lot of trial and error in the pre-processing steps. I have seen people threshold images, use gradient or edge images, and compare RGB vs. grayscale vs. HSL. I think there is so much variability in pre-processing that it is difficult for a network to learn. This is one case where knowledge of your specific dataset and some knowledge of computer vision help; otherwise, we would need a very large number of training images. If we only have a small number of images, we can 'augment' the dataset using image data generators, which slightly change images by rotating, resizing, blurring, distorting, etc.
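Two of the hand-crafted options mentioned above, thresholding and gradient (edge) images, fit in a few lines of plain Java. This sketch assumes grayscale `float[][]` input and uses the standard 3x3 Sobel operators for the gradient; `EdgeFeatures` and its methods are hypothetical names, not a library API.

```java
// Hypothetical feature helpers for illustration.
final class EdgeFeatures {
    private EdgeFeatures() {}

    // Binary thresholding: pixels at or above t become 1, everything else 0.
    static float[][] threshold(float[][] img, float t) {
        float[][] out = new float[img.length][img[0].length];
        for (int y = 0; y < img.length; y++)
            for (int x = 0; x < img[0].length; x++)
                out[y][x] = img[y][x] >= t ? 1f : 0f;
        return out;
    }

    // Gradient-magnitude ("edge") image using the 3x3 Sobel operators.
    // Border pixels stay 0 because their full 3x3 neighbourhood is missing.
    static float[][] sobelMagnitude(float[][] img) {
        int h = img.length, w = img[0].length;
        float[][] out = new float[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                // Horizontal gradient: right column minus left column.
                float gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                         - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]);
                // Vertical gradient: bottom row minus top row.
                float gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                         - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]);
                out[y][x] = (float) Math.sqrt(gx * gx + gy * gy);
            }
        }
        return out;
    }
}
```

Whether either transform helps is exactly the kind of dataset-specific question discussed above; they are cheap to try as an extra input channel or a replacement for the raw pixels.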

There are also LSTM networks, which have a concept of memory, but they are used more for speech recognition and time series. I haven't worked with them yet.

LSTMs are already on my to-do list. Definitely going to check them out!

Thanks Raunak!

 

One thing you want to be careful with is that DL4J models aren't thread-safe. You want to wrap them in the ParallelInference wrapper, which has a few knobs for maximizing performance. You can see how to use it in the unit tests:

github.com/deeplearning4j/deeplear...

 

Thanks for letting me know. I was aware that the models aren't thread-safe; that's why I wrapped the calls in a synchronized block (as mentioned in the post). I'm definitely going to take a look at the ParallelInference wrapper. Thanks for the link!
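For readers wondering what the synchronized approach looks like, here is a minimal sketch of a wrapper that serializes all calls to a model that is not thread-safe. The `Classifier` interface is a hypothetical stand-in, not DL4J's API; with DL4J you would call the network's own output method inside the lock, or use ParallelInference as suggested above. The wrapper copies the result while still holding the lock, in case the underlying model reuses an internal output buffer (an assumption made for safety, not documented DL4J behaviour).

```java
// Hypothetical stand-in for a trained model whose prediction call is not thread-safe.
interface Classifier {
    float[] predict(float[] input);
}

// Serializes every call to the wrapped model through one lock, so at most
// one thread is inside delegate.predict(...) at any time.
final class SynchronizedClassifier implements Classifier {
    private final Classifier delegate;
    private final Object lock = new Object();

    SynchronizedClassifier(Classifier delegate) {
        this.delegate = delegate;
    }

    @Override
    public float[] predict(float[] input) {
        synchronized (lock) {
            // Copy while still holding the lock: if the model reuses an internal
            // output buffer, another thread could overwrite it after we unlock.
            return delegate.predict(input).clone();
        }
    }
}
```

A single lock is simple and correct, but it means only one prediction runs at a time, which limits throughput under concurrent load; that is the trade-off a dedicated wrapper like ParallelInference aims to improve on.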

 

Could you please explain how our image stream can be converted to the image size and format that the model expects?

 

I can certainly try. Would you be able to provide me with more details on your image stream? Are you also using DL4J?
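Until those details are known, here is a sketch for one common case, assuming the model was trained MNIST-style on 28x28 single-channel images with pixel values rescaled to the 0 to 1 range, as described earlier in the thread. It uses only the JDK (`javax.imageio`, `java.awt.image`); `MnistInput.fromStream` is a hypothetical helper, and the resulting `float[]` would still need to be copied into whatever input array your framework expects.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

// Hypothetical helper; the 28x28 grayscale input shape is an assumption.
final class MnistInput {
    static final int SIZE = 28;

    private MnistInput() {}

    // Reads an image from the stream, scales it to SIZE x SIZE grayscale and
    // returns the pixels row-major as floats in [0, 1].
    static float[] fromStream(InputStream in) throws IOException {
        BufferedImage src = ImageIO.read(in);
        if (src == null) {
            throw new IOException("Unsupported or unreadable image format");
        }
        BufferedImage gray = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            // Drawing into a TYPE_BYTE_GRAY image both resizes and converts to grayscale.
            g.drawImage(src, 0, 0, SIZE, SIZE, null);
        } finally {
            g.dispose();
        }
        float[] pixels = new float[SIZE * SIZE];
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                pixels[y * SIZE + x] = gray.getRaster().getSample(x, y, 0) / 255f;
            }
        }
        return pixels;
    }
}
```

MNIST digits are light strokes on a dark background, so photos of dark ink on white paper may need to be inverted (`1 - value`) before classification.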
