I started my journey to learn more about data science and AI a little over a year ago, when I began my master's and then quit my job. I enrolled full time as a data science student and have been happy with the decision ever since. When I began my data science classes, most focused on the fundamentals of data storage, management, and analysis, so in my spare time I have been trying to learn more about AI. One area of AI I find particularly interesting is image recognition and classification. I have dabbled in different projects on live image recognition for movement and person detection. From there, I developed an interest in analyzing satellite imagery to gather information from those images. One particular project I have been working on is my proposal for Analyzing Deforestation and Urbanization Using Intel AI Technologies. Through these projects, I have learned new things about data import, setup, and pipelines that I thought would be of interest to share.
One of the first things I learned while working with images for AI was that image import and setup is not as simple as sending an image to a model. When working with Intel Optimized TensorFlow to train a neural network to recognize images, a labeled image dataset with categories is needed. The first thing I read that struck me as odd was that all images should be the same size, for example 28 by 28 pixels. This matters for real-world image datasets used to train neural networks, because the images often come in different sizes. To be batched together, they must first be resized to a fixed size. This was well explained in the TensorFlow Guide for Importing Data, as seen below.
# Reads an image from a file, decodes it into a dense tensor, and resizes it
# to a fixed shape.
def _parse_function(filename, label):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string)
    image_resized = tf.image.resize_images(image_decoded, [28, 28])
    return image_resized, label
From this, I was able to understand how images are preprocessed before being passed to the model. As I continued to research how images are processed, I also found it interesting to reexamine pixel format. The most common pixel format is the byte image, in which each value ranges from 0 to 255. In a grayscale image, each pixel in the 28 by 28 image is a single such value, where 0 is black, 255 is white, and the values in between are shades of gray. A color image stores three such values per pixel, one each for red, green, and blue, so whether images are analyzed in grayscale or in color depends on the dataset and on how the images are decoded. More on image transformation can be found in the TensorFlow Images documentation.
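The byte pixel format can be illustrated without TensorFlow at all. The sketch below is plain Python and every name in it (rgb_to_gray, normalize, the tiny sample row) is made up for illustration. It assumes the commonly used luma weights for turning a color pixel into a gray value, and it shows the usual extra step of scaling byte values into the 0.0 to 1.0 range before training.

```python
# Plain-Python sketch of byte pixel values; no TensorFlow required.
# All names here are illustrative, not part of any library.

def rgb_to_gray(pixel):
    """Collapse one (R, G, B) byte triple into a single 0-255 gray value."""
    r, g, b = pixel
    # Commonly used luma weights (an assumption of this sketch);
    # green contributes most to perceived brightness.
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def normalize(value):
    """Scale a byte value from the 0-255 range into the 0.0-1.0 range."""
    return value / 255.0


# A tiny 1x3 color "image": pure black, pure white, pure red.
row = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]

gray_row = [rgb_to_gray(p) for p in row]
scaled_row = [normalize(v) for v in gray_row]

print(gray_row)
print(scaled_row)
```

Black and white land on the two ends of the byte range, while a saturated color such as red ends up as a fairly dark gray.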
After understanding the image cleaning, I looked into data pipelines and decoding images. Returning to the _parse_function snippet, the function reads an image from a file, decodes it into a dense tensor, and then resizes it to the chosen 28 by 28 pixels.
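To see what resizing to a fixed shape means in practice, here is a minimal plain-Python sketch. It is not how TensorFlow implements it: tf.image.resize_images uses bilinear interpolation by default, whereas this sketch uses the simpler nearest-neighbor idea, and the function name resize_nearest and the sample images are hypothetical.

```python
# Plain-Python sketch of resizing images of different sizes to one shape.
# Nearest-neighbor sampling is used here for simplicity (an assumption of
# this sketch); it is a stand-in for the interpolation TensorFlow performs.

def resize_nearest(image, new_height, new_width):
    """Resize a 2-D list of pixel values to new_height x new_width."""
    old_height = len(image)
    old_width = len(image[0])
    resized = []
    for y in range(new_height):
        # Map each output row back to a source row (floor of scaled index).
        src_y = y * old_height // new_height
        row = []
        for x in range(new_width):
            # Same mapping for columns.
            src_x = x * old_width // new_width
            row.append(image[src_y][src_x])
        resized.append(row)
    return resized


# Two "images" with different shapes: 2x2 and 1x6.
small = [[0, 255],
         [255, 0]]
wide = [[10, 20, 30, 40, 50, 60]]

fixed_a = resize_nearest(small, 4, 4)
fixed_b = resize_nearest(wide, 4, 4)

# Both results now share the same 4x4 shape and could sit in one batch.
print(len(fixed_a), len(fixed_a[0]))
print(len(fixed_b), len(fixed_b[0]))
```

Once every image has the same height and width, a batch is just a stack of equally shaped arrays, which is what the model expects.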
# A vector of filenames.
filenames = tf.constant(["/var/data/image1.jpg", "/var/data/image2.jpg", ...])

# `labels[i]` is the label for the image in `filenames[i]`.
labels = tf.constant([0, 37, ...])
The parse function takes a filename and a label. The filenames and labels can be stored with tf.constant, as in the snippet above. At first this seemed odd, but through further reading I realized that tf.constant(value) stores the image file paths and the labels in two constant tensors called filenames and labels; the images themselves are not read until the parse function runs.
Through all of the data cleaning, decoding, and pipelining, it finally came down to the last two steps of the code, which create the TensorFlow dataset from the filenames and labels and then map the parse function over it. I found the call tf.data.Dataset.from_tensor_slices particularly confusing at first glance. Through research, I was able to discern that it builds a TensorFlow dataset from the constant vectors created above, slicing them along their first dimension so that each element of the dataset pairs one image filename with its label.
dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))
dataset = dataset.map(_parse_function)
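One way to picture the slicing step is as a zip over the two vectors. The sketch below is a plain-Python analogy under that assumption, not TensorFlow's implementation; from_slices is a hypothetical helper, and the two sample paths are simply the ones listed in the snippet above.

```python
# Plain-Python analogy for slicing two parallel vectors into a dataset.
# from_slices is a hypothetical helper, not a TensorFlow function.

def from_slices(filenames, labels):
    """Pair the i-th filename with the i-th label, one pair per element."""
    # The real call also requires matching sizes along the first dimension.
    if len(filenames) != len(labels):
        raise ValueError("filenames and labels must have the same length")
    return list(zip(filenames, labels))


filenames = ["/var/data/image1.jpg", "/var/data/image2.jpg"]
labels = [0, 37]

elements = from_slices(filenames, labels)

for filename, label in elements:
    print(filename, label)
```

Each element carries a filename together with its label, which is why the parse function receives both as arguments.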
Once created, the dataset.map function can be called with the parse function created earlier as a parameter. map applies a transformation to every element of the dataset and returns a new dataset containing the transformed elements, in the same order as the original input. Therefore, when map is called here, it calls the parse function defined earlier, which does all the image decoding and resizing for us.
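The behaviour of map can be sketched in plain Python as well. This is only an analogy: map_dataset and fake_parse are hypothetical names, and fake_parse returns a placeholder string instead of actually decoding and resizing an image. A generator is used because the real map is also lazy, doing its work only as elements are requested.

```python
# Plain-Python analogy for mapping a parse function over a dataset.
# map_dataset and fake_parse are hypothetical, for illustration only.

def map_dataset(elements, fn):
    """Lazily apply fn to each (filename, label) element, keeping order."""
    for filename, label in elements:
        yield fn(filename, label)


def fake_parse(filename, label):
    """Stand-in for the parse function: tags the filename, keeps the label."""
    # A real parse function would read, decode, and resize the image here.
    return "decoded:" + filename, label


elements = [("/var/data/image1.jpg", 0), ("/var/data/image2.jpg", 37)]

mapped = list(map_dataset(elements, fake_parse))

for image, label in mapped:
    print(image, label)
```

The input list is left untouched and a new sequence comes back, with each label still attached to the element it started with.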
Writing my proposal for Analyzing Deforestation and Urbanization Using Intel AI Technologies many months ago seemed like a small task, but now it has allowed me to push further into AI and image classification. I hope what I have learned thus far has helped you as well!
Cover image sourced from Vexel's Blog
Analyzing Deforestation and Urbanization Using Intel AI Technologies DevMesh
Intel Optimized TensorFlow
TensorFlow Importing Data
TensorFlow Basic Classification
Building an Image Data Pipeline
TensorFlow Dataset Map