# Identifying Colours in Images using Python and OpenCV

Mohammed Mushahid Qureshi, mediocre programmer and avid gamer

Hi everyone,

In this post I'll try to explain how we can use an unsupervised machine learning (ML) model to find the most used colours in an image.

## KMeans Clustering

KMeans is a clustering algorithm: it divides the given data into subgroups (clusters) so that data points in one cluster are similar to each other and different from the data in other clusters. Clustering is one of the methods used in unsupervised machine learning. This means that the performance of the algorithm is not evaluated by comparing its output to true labels of the data; instead, the goal is to investigate the structure of the data by grouping it into clusters. KMeans does this by repeatedly assigning every point to its nearest cluster centre (centroid) and then moving each centroid to the mean of the points assigned to it, until the centroids stop changing.
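To make the idea concrete, here is a minimal NumPy sketch of the classic KMeans loop (Lloyd's algorithm). It is not scikit-learn's implementation: the function name `kmeans`, the deterministic farthest-point initialisation (a simplification of the k-means++ seeding scikit-learn uses by default) and the sample pixel values are all assumptions made for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=100):
    """Hypothetical minimal Lloyd's-algorithm KMeans on an (n, d) array."""
    points = np.asarray(points, dtype=float)
    # Farthest-point initialisation (assumed here for determinism): start from
    # the first point, then keep adding the point farthest from all centroids so far
    centroids = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[dist.argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Made-up RGB pixels: three reddish and three bluish values
pixels = np.array([[250, 10, 10], [240, 20, 15], [245, 5, 20],
                   [10, 10, 250], [20, 15, 240], [5, 20, 245]], dtype=float)
labels, centroids = kmeans(pixels, k=2)
print(labels)     # [0 0 0 1 1 1]
print(centroids)  # the mean red and the mean blue
```

Each centroid ends up as the average colour of its cluster, which is exactly why cluster centres can be read as "representative colours" later in the post.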

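One application of KMeans clustering is grouping similar colours in an image: each pixel is a data point with three values (red, green and blue), the centre of each cluster is a representative colour, and the largest clusters give the most used colours.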

## Building a model in a Jupyter Notebook or Google Colaboratory

### Imports

We start by importing the libraries we are going to use: `matplotlib.pyplot` to generate the pie chart, OpenCV (`cv2`) to read the image, the `KMeans` algorithm from the `sklearn.cluster` package, `rgb2lab` from scikit-image to convert image colours to the CIE Lab colour space, and `deltaE_cie76` to compare them. We will also use the `os` module to combine paths when reading files, `numpy` for array handling, and `Counter` from the `collections` module to count how often each cluster label occurs.
The imports should look like this:

```python
import os
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np
import cv2
from collections import Counter
from skimage.color import rgb2lab, deltaE_cie76
```

Since we are working in a Jupyter notebook, we also add the `%matplotlib inline` magic command so that matplotlib displays the plots inside the notebook.
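For readers curious about what `rgb2lab` and `deltaE_cie76` compute, here is a NumPy-only sketch of the same idea: convert 8-bit sRGB colours to CIE Lab (assuming the D65 white point, which is scikit-image's default illuminant) and measure the CIE76 difference as the Euclidean distance between the two Lab colours. The helper names `rgb_to_lab` and `delta_e_cie76` are hypothetical; in practice the scikit-image functions in the imports do this work.

```python
import numpy as np

# Assumed constants: sRGB (D65) -> CIE XYZ matrix and the D65 reference white
SRGB_TO_XYZ = np.array([[0.412453, 0.357580, 0.180423],
                        [0.212671, 0.715160, 0.072169],
                        [0.019334, 0.119193, 0.950227]])
D65_WHITE = np.array([0.95047, 1.0, 1.08883])


def rgb_to_lab(rgb):
    """Hypothetical helper: one 8-bit sRGB colour (0-255 per channel) to CIE Lab."""
    c = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma curve to get linear light values
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalised by the reference white
    xyz = SRGB_TO_XYZ @ linear / D65_WHITE
    # CIE f(t): cube root above a small threshold, linear segment below it
    delta = 6 / 29
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4 / 29)
    L = 116 * f[1] - 16
    a = 500 * (f[0] - f[1])
    b = 200 * (f[1] - f[2])
    return np.array([L, a, b])


def delta_e_cie76(rgb1, rgb2):
    """CIE76 colour difference: Euclidean distance between two Lab colours."""
    return float(np.linalg.norm(rgb_to_lab(rgb1) - rgb_to_lab(rgb2)))


print(delta_e_cie76((255, 255, 255), (0, 0, 0)))  # about 100 (white vs black)
```

Lab is designed so that distances track perceived colour differences more closely than distances in raw RGB, which is why comparing colours there is a reasonable choice.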

### Using OpenCV to read the image

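Images can be read using `cv2.imread()` by passing the complete path as an argument. We can then use pyplot to plot the image. Note that `cv2.imread()` returns `None` instead of raising an error when the file cannot be read, so check the path if later steps fail.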

```python
image = cv2.imread(os.path.join('path_to_image', 'image.jpg'))
plt.imshow(image)
```

At this point you'll notice that the colours of the plotted image look wrong. This is because OpenCV stores the channels in Blue, Green, Red (BGR) order while matplotlib expects Red, Green, Blue (RGB), so we convert the image with `cv2.cvtColor()`.

```python
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(image)
```
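OpenCV's BGR order is simply the three channels stored in reverse, which a small NumPy sketch makes visible. The 1x2 array below is a made-up example; reversing the last axis gives the same channel swap as `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)`.

```python
import numpy as np

# Hypothetical 1x2 "image" laid out the way OpenCV returns it: (height, width, 3), BGR
bgr = np.array([[[255, 0, 0],    # pure blue in BGR order
                 [0, 0, 255]]],  # pure red in BGR order
               dtype=np.uint8)

# Reversing the channel axis turns BGR into RGB
rgb = bgr[..., ::-1]

print(rgb.tolist())  # [[[0, 0, 255], [255, 0, 0]]]
```

The slice is a view with a negative stride; if an OpenCV function later complains about the array layout, `np.ascontiguousarray(rgb)` makes a contiguous copy.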

### Colour Identification

We first need a function to convert RGB values to hex colour codes so we can use them as labels in our pie chart. We use string formatting for this: the `{:02x}` format spec prints each integer as a two-digit hexadecimal number. We could also use `binascii.hexlify()`.

```python
def RGB2HEX(color):
    # Format each of the R, G, B components as a two-digit hex number
    return "#{:02x}{:02x}{:02x}".format(int(color[0]), int(color[1]), int(color[2]))
```
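A short sketch of the hex conversion, the `binascii.hexlify()` alternative, and the reverse direction (hex back to RGB). The names `rgb_to_hex`, `rgb_to_hex_binascii` and `hex_to_rgb` are hypothetical helpers, not part of any library.

```python
import binascii


def rgb_to_hex(color):
    """Format an (R, G, B) triple of 0-255 values as '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(int(color[0]), int(color[1]), int(color[2]))


def rgb_to_hex_binascii(color):
    """Same result via binascii.hexlify on the three raw bytes."""
    return "#" + binascii.hexlify(bytes(int(c) for c in color)).decode("ascii")


def hex_to_rgb(hex_code):
    """Inverse: parse '#rrggbb' back into an (R, G, B) tuple of ints."""
    h = hex_code.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


print(rgb_to_hex((255, 128, 0)))           # #ff8000
print(rgb_to_hex_binascii((255, 128, 0)))  # #ff8000
print(hex_to_rgb("#ff8000"))               # (255, 128, 0)
```

Cluster centres from KMeans are floats, and `int()` truncates them, so a channel value of 244.7 becomes `f4` (244). Wrap the values in `round()` first if you prefer rounding.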

#### Getting colours from an image

First we reduce the image size to cut down the execution time. KMeans expects a 2-dimensional array with one row per sample, so we also reshape the (height, width, 3) image array into one row per pixel, with three columns for the R, G and B values.

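```python
# cv2.resize takes the target size as (width, height)
modified_image = cv2.resize(image, (600, 400), interpolation=cv2.INTER_AREA)
# Flatten to one row per pixel: shape (height * width, 3)
modified_image = modified_image.reshape(modified_image.shape[0] * modified_image.shape[1], 3)
```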
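To see what the reshape does, here is a sketch on a made-up 2x3 image. It shows that each row of the flattened array is one pixel, taken in row-major order, and that `reshape(-1, 3)` is an equivalent shorthand.

```python
import numpy as np

# Hypothetical tiny RGB image: 2 rows, 3 columns, 3 channels
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# One row per pixel, as KMeans expects an (n_samples, n_features) array
pixels = image.reshape(image.shape[0] * image.shape[1], 3)
# Equivalent shorthand: let NumPy infer the number of rows
assert np.array_equal(pixels, image.reshape(-1, 3))

print(pixels.shape)  # (6, 3)
# Pixel (row, col) lands at index row * width + col
print(pixels[1 * 3 + 2].tolist(), image[1, 2].tolist())  # [15, 16, 17] [15, 16, 17]
```

With a 600x400 resize (width 600, height 400), the flattened array has 400 * 600 = 240,000 rows, one per pixel.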

#### Now we implement KMeans

```python
number_of_colours = 8
clf = KMeans(n_clusters=number_of_colours)
labels = clf.fit_predict(modified_image)
```

The KMeans algorithm creates as many clusters as we ask for, which in our case is the number of top colours we want. `fit_predict` fits the model to the pixels and returns the cluster each pixel was assigned to, which we store in the variable `labels`.
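A small check of what `fit_predict` returns, on made-up pixel values with two obvious colour groups. Setting `n_init` and `random_state` explicitly is a choice made for this example (it makes runs repeatable), not something the code above does.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical pixel data: three reddish and three bluish RGB values
pixels = np.array([[250, 10, 10], [240, 20, 15], [245, 5, 20],
                   [10, 10, 250], [20, 15, 240], [5, 20, 245]], dtype=float)

clf = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = clf.fit_predict(pixels)

# fit_predict(X) returns the same labels that fitting stores in labels_
assert np.array_equal(labels, clf.labels_)
# ...which match predicting on the data the model was fitted on
assert np.array_equal(labels, clf.predict(pixels))

print(labels)                # cluster index of each pixel
print(clf.cluster_centers_)  # one mean colour per cluster
```

The cluster indices themselves are arbitrary; only which pixels share an index, and the centre stored at that index, carry meaning.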

#### Counting the colours and plotting the Pie chart

We use `Counter` to count the labels, i.e. how many pixels were assigned to each cluster. The colours themselves are the cluster centroids, stored in `clf.cluster_centers_`. We iterate over the keys in `counts` to build `ordered_colours`, which lists the centroid colour of each cluster in the same order as the counts, so each colour stays paired with the number of pixels in its cluster. Finally we convert these colours to hex codes and store them in `hex_colours`.

We plot a pie chart with the values from `counts`, using the hex codes in `hex_colours` both as the slice labels and as the slice colours.

```python
counts = Counter(labels)
center_colours = clf.cluster_centers_
# Centroid colour of each cluster, in the same order as counts.keys()
ordered_colours = [center_colours[i] for i in counts.keys()]
hex_colours = [RGB2HEX(colour) for colour in ordered_colours]
plt.figure(figsize=(8, 6))
# plt.pie needs a sequence, so convert the dict view to a list
plt.pie(list(counts.values()), labels=hex_colours, colors=hex_colours)
```
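A pie chart makes exact proportions hard to read. Here is a minimal sketch that prints each cluster colour with its share of the pixels, from most to least frequent. The `labels` and `cluster_centers` arrays are made-up stand-ins for the real `clf.fit_predict(...)` output and `clf.cluster_centers_`.

```python
from collections import Counter

import numpy as np

# Hypothetical stand-ins for the model output
labels = np.array([0, 1, 1, 2, 1, 0, 1, 2, 1, 1])
cluster_centers = np.array([[250.2, 10.0, 9.7], [12.0, 200.4, 30.0], [0.0, 0.0, 255.0]])

counts = Counter(labels)
total = sum(counts.values())

summary = []
# most_common() orders clusters by pixel count; indexing the centres with each
# cluster's own label keeps every colour paired with its count
for label, count in counts.most_common():
    r, g, b = (int(round(v)) for v in cluster_centers[label])
    summary.append((f"#{r:02x}{g:02x}{b:02x}", count / total))

for hex_code, share in summary:
    print(f"{hex_code}: {share:.0%}")
```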

### Bringing everything together into a function

This function accepts three arguments: the path to the image, the number of colours to identify, and a Boolean `show_chart` that selects whether to display the pie chart or return the colours.

```python
# Helper used by get_colours (its definition is not shown in the post;
# this is a minimal version): read the image and convert BGR to RGB
def get_img(img_path):
    img = cv2.imread(img_path)
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)


def get_colours(img_path, no_of_colours, show_chart):
    img = get_img(img_path)
    mod_img = cv2.resize(img, (600, 400), interpolation=cv2.INTER_AREA)
    mod_img = mod_img.reshape(mod_img.shape[0] * mod_img.shape[1], 3)

    # Define the clusters
    clf = KMeans(n_clusters=no_of_colours)
    labels = clf.fit_predict(mod_img)

    # Sort by cluster label so the order matches clf.cluster_centers_
    counts = Counter(labels)
    counts = dict(sorted(counts.items()))

    center_colours = clf.cluster_centers_
    ordered_colours = [center_colours[i] for i in counts.keys()]
    hex_colours = [RGB2HEX(colour) for colour in ordered_colours]
    rgb_colours = ordered_colours

    if show_chart:
        plt.figure(figsize=(8, 6))
        plt.pie(list(counts.values()), labels=hex_colours, colors=hex_colours)
        return
    else:
        return rgb_colours
```

You can find the Google Colab notebook here: https://github.com/mushahidq/py_colour_identifier/blob/main/colour_identifier.ipynb

I'll soon be making another post on how to turn this into a web app and deploy it to Heroku.

This is my first post so some feedback would be much appreciated.