Juma Shafara

Posted on • Originally published at blog.jumashafara.com

How to prepare your custom images dataset for a deep learning model

Almost all deep learning tutorials teach you how to build models using ready-made or built-in datasets. But what if you want to train a model to recognize your own face, or the faces of your family? A dataset for that purpose is not readily available on the internet.
The ability to organize and build your own image dataset to solve your problem is a demonstration of expertise.
In this article, you will learn how to prepare your own dataset of raw images, which you can then use for your own image classification/computer vision projects.

Steps

  1. Gather and organize your images.

For example, if I want to build a model that distinguishes humans (faces) from nonhumans, I put all the human images in one folder and all the nonhuman images in another. Both folders then go into one main folder.
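
Before moving on, it can help to confirm that the folder layout is what you expect. The snippet below is a minimal sketch, not part of the original workflow: `count_images_per_class` is a hypothetical helper, and the demo builds a throwaway directory with empty placeholder files (not real images) so that it runs anywhere. The folder names `human` and `nonhuman` follow the example above.

```python
import pathlib
import tempfile


def count_images_per_class(root):
    """Return {subfolder name: number of files inside it} for each class folder."""
    root = pathlib.Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


# Demo on a temporary directory filled with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp) / "images_directory"
    for class_name, n_files in [("human", 3), ("nonhuman", 2)]:
        (root / class_name).mkdir(parents=True)
        for i in range(n_files):
            (root / class_name / f"img_{i}.jpg").touch()
    layout_counts = count_images_per_class(root)

print(layout_counts)
```

On your own data, you would call `count_images_per_class` on your main folder; a class with very few files, or an unexpected extra folder, is easier to spot at this stage than after training.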


  2. Open a Jupyter notebook and import cv2, PIL, pathlib, and NumPy.
import cv2
import PIL
import pathlib
import numpy as np

You can find out more about cv2, PIL, and pathlib in their respective documentation.

  3. Specify the path to the main folder and create a Path object from it.
images_dir = "/path/to/your/images_directory"

images_dir = pathlib.Path(images_dir)
  4. Create two dictionaries: one mapping each class name to the list of its image paths, and another mapping each class name to a numeric label.
class_images = {
    'human': list(images_dir.glob('human/*')),
    "nonhuman": list(images_dir.glob("nonhuman/*"))
}

class_labels = {
    "human": 0,
    "nonhuman": 1
}
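
If you have more than two classes, typing both dictionaries by hand gets tedious. As a rough sketch, and assuming every subfolder of the main folder is one class, the two dictionaries could be derived from the folder names instead. The helper `build_class_maps` and the `IMAGE_SUFFIXES` set are hypothetical names introduced here for illustration, and the demo uses empty placeholder files in a temporary directory.

```python
import pathlib
import tempfile

# Assumed set of image extensions; adjust it to match your own data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def build_class_maps(images_dir):
    """Build class_images and class_labels from the subfolders of images_dir."""
    images_dir = pathlib.Path(images_dir)
    class_names = sorted(p.name for p in images_dir.iterdir() if p.is_dir())
    class_images = {
        name: sorted(
            f for f in (images_dir / name).glob("*")
            if f.suffix.lower() in IMAGE_SUFFIXES
        )
        for name in class_names
    }
    # Labels are assigned in alphabetical order of the folder names.
    class_labels = {name: index for index, name in enumerate(class_names)}
    return class_images, class_labels


# Demo: two class folders, one of which contains a non-image file.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    demo_files = {"human": ["a.jpg", "b.PNG", "notes.txt"], "nonhuman": ["c.jpeg"]}
    for name, files in demo_files.items():
        (root / name).mkdir()
        for file_name in files:
            (root / name / file_name).touch()
    demo_images, demo_labels = build_class_maps(root)
    demo_counts = {name: len(paths) for name, paths in demo_images.items()}

print(demo_labels)
print(demo_counts)
```

Filtering by extension keeps stray files such as `notes.txt` out of the image lists, which a plain `glob('human/*')` would otherwise pick up.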
  5. Initialize X and y as empty Python lists. Read each image as a NumPy array, resize it to a standard size, and append it to the X list. Also append the corresponding class label to the y list.
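X, y = [], []

for class_name, image_paths in class_images.items():
    for image_path in image_paths:
        # cv2.imread returns a NumPy array (BGR order), or None if the file cannot be read
        img = cv2.imread(str(image_path))
        if img is None:
            continue  # skip files that are not readable images
        # Resize every image to the same 256x256 size so they can be stacked later
        resized_img = cv2.resize(img, (256, 256))
        X.append(resized_img)
        y.append(class_labels[class_name])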
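
One detail worth knowing at this step: cv2.imread returns the color channels in BGR order, while PIL and most plotting tools expect RGB. This only matters if you display the images or mix them with RGB data, but if it does, the channels can be reordered. The sketch below uses a small synthetic array as a stand-in for a loaded image, so it runs without any image file; `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` achieves the same reordering.

```python
import numpy as np

# Synthetic 2x2 "image" in BGR order, standing in for the output of cv2.imread.
bgr_img = np.zeros((2, 2, 3), dtype=np.uint8)
bgr_img[..., 0] = 255  # channel 0 is blue in BGR, so this image is pure blue

# Reverse the last axis to reorder the channels from BGR to RGB.
rgb_img = bgr_img[..., ::-1]

# After the conversion, blue is the last channel of each pixel.
print(rgb_img[0, 0])
```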
  6. Finally, convert X and y to NumPy arrays and split them into train and test sets.
X = np.array(X)
y = np.array(y)

from sklearn.model_selection import train_test_split

# With no test_size given, train_test_split holds out 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
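
If one class has far more images than the other, a purely random split can leave the test set with very few examples of the smaller class. One option, shown here as a sketch on synthetic data rather than as part of the original recipe, is to pass `stratify=y` so that both splits keep the same class proportions. The array sizes and the 80/20 class imbalance below are made up for the demonstration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for X and y: 100 tiny blank "images", 80 of class 0 and 20 of class 1.
X_demo = np.zeros((100, 8, 8, 3), dtype=np.uint8)
y_demo = np.array([0] * 80 + [1] * 20)

# stratify=y_demo keeps the 80/20 class ratio in both the train and the test split.
X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

train_counts = np.bincount(y_train_demo).tolist()
test_counts = np.bincount(y_test_demo).tolist()
print(train_counts, test_counts)
```

Here `np.bincount` counts how many samples of each label ended up in each split, which is a quick way to check the balance on your own data too.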

There you go: you now have your custom image train and test datasets. If you want your model to perform better and learn quicker, there are a few other things you may want to do to your image datasets, such as rescaling and augmentation. These are outside the scope of this article. If you are interested, subscribe to my newsletter so that you don't miss the upcoming article that will cover them.

Also, follow me on Twitter (juma_shafara), where I share other programming stuff.
