About this memorandum
Recently I had a chance to work with deep learning and image/video processing, but there was a lot I didn't understand about how to tune parameters and augment data, so I decided to study image and video processing from scratch.
Color images and grayscale images
- Grayscale image: pixel value represents brightness
- RGB color image: each pixel holds one brightness value per channel (R, G and B); see the image below
- 1 pixel: 8 bits × 3 = 24 bits (3 bytes)
In practice, it is more accurate to think of an RGB image as three grayscale images stacked together, one each for R, G and B.
Since the car is red, its pixel values in the R channel (left) are close to 255, so it appears white in that grayscale image. The G and B channels, on the other hand, have smaller pixel values there, so the car appears black.
Code
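# Assumed imports; the original listing omits them
import matplotlib.pyplot as plt
from skimage.io import imread, imshow

im = imread([file_path])  # [file_path] is a placeholder for the image path
imshow(im)
plt.title("original RGB image")
plt.show()

# Split the image into its three channels; each one is a 2-D grayscale image
r_channel = im[:, :, 0]
g_channel = im[:, :, 1]
b_channel = im[:, :, 2]

# Show the R, G and B channels side by side in one figure
fig = plt.figure(figsize=(15,3))
for i, c in zip(range(3), 'RGB'):
    ax = fig.add_subplot(1, 3, i + 1)
    imshow(im[:, :, i], vmin=0, vmax=255)
    plt.colorbar()
    plt.title(f'{c} channel')
plt.show()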
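To make the "three grayscale images" idea concrete without needing an image file, here is a minimal sketch using only NumPy. The tiny 2×2 array `im_demo` and the names `r_demo`, `g_demo` and `b_demo` are made up for illustration; they are not part of the code above.

```python
import numpy as np

# Hypothetical 2x2 RGB image: red, green, blue and white pixels
im_demo = np.array([[[255, 0, 0], [0, 255, 0]],
                    [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

height, width, channels = im_demo.shape
# uint8 stores 8 bits per value, so one pixel takes 3 bytes (24 bits)
bytes_per_pixel = im_demo.nbytes // (height * width)

# Each channel on its own is a 2-D grayscale image
r_demo = im_demo[:, :, 0]
g_demo = im_demo[:, :, 1]
b_demo = im_demo[:, :, 2]

print(im_demo.shape, im_demo.dtype, bytes_per_pixel)
print(r_demo)  # only the red and white pixels are bright in the R channel
```

The red pixel is 255 in `r_demo` and 0 in the other two channels, which is the same effect as the red car looking white in the R channel and black in G and B.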
RGB and BGR
Both formats hold the same data; the only difference is the order in which the channels are interpreted. Mixing them up silently swaps red and blue, so be careful.
RGB
- Used in many textbook descriptions.
- Many image processing libraries also use this format.
- skimage, matplotlib for python
BGR
- Also often used
- opencv (python, C/C++)
- COLORREF on Windows (0x00bbggrr in hexadecimal)
- Hardware
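As a small illustration of the COLORREF layout mentioned above (0x00bbggrr), the sketch below packs and unpacks 8-bit channel values with bit shifts. The helper names `pack_colorref` and `unpack_colorref` are hypothetical and written only for this memo; they are not a Windows or library API.

```python
def pack_colorref(r, g, b):
    """Pack 8-bit R, G, B values into an integer laid out as 0x00bbggrr."""
    return r | (g << 8) | (b << 16)


def unpack_colorref(value):
    """Return (r, g, b) from an integer laid out as 0x00bbggrr."""
    return value & 0xFF, (value >> 8) & 0xFF, (value >> 16) & 0xFF


red = pack_colorref(255, 0, 0)
blue = pack_colorref(0, 0, 255)
print(f"red  = 0x{red:08x}")   # red sits in the lowest byte
print(f"blue = 0x{blue:08x}")  # blue sits in the third byte
print(unpack_colorref(blue))
```

Read as a hexadecimal number from left to right, the bytes come out in blue, green, red order, which is why this layout is grouped with BGR here.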
A common mistake: when an image loaded with OpenCV (BGR) is displayed with scikit-image or matplotlib (RGB), red and blue are swapped.
Code (example of mistake)
im_BGR = cv2.imread(INPUT_DIR + 'IMG-4034.JPG') # OpenCV
imshow(im_BGR) # matplotlib's imshow assumes RGB
plt.title('show BGR image as RGB image')
plt.axis('off')
plt.show()
There are several ways to fix this, including the following:
Code: BGR to RGB conversion
### Use the built-in function cv2.cvtColor
im_BGR_to_RGB = cv2.cvtColor(im_BGR, cv2.COLOR_BGR2RGB)
imshow(im_BGR_to_RGB)
plt.title('show RGB-converted BGR image as RGB image')
plt.axis('off')
plt.show()
### Without the built-in function (1): reverse the channel axis with a slice
im_BGR_to_RGB = im_BGR[:, :, ::-1]
imshow(im_BGR_to_RGB)
plt.title('show RGB-converted BGR image as RGB image')
plt.axis('off')
plt.show()
### Without the built-in function (2): the same swap as (1), written out channel by channel
im_BGR_to_RGB = np.zeros_like(im_BGR)
im_BGR_to_RGB[:, :, 0] = im_BGR[:, :, 2]
im_BGR_to_RGB[:, :, 1] = im_BGR[:, :, 1]
im_BGR_to_RGB[:, :, 2] = im_BGR[:, :, 0]
imshow(im_BGR_to_RGB)
plt.title('show RGB-converted BGR image as RGB image')
plt.axis('off')
plt.show()
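The two manual conversions above should produce identical arrays. Here is a minimal check on a tiny synthetic array, using only NumPy so that it runs without OpenCV or an image file; `bgr_demo`, `rgb_by_slice` and `rgb_by_copy` are names made up for this sketch.

```python
import numpy as np

# Hypothetical 1x2 image in BGR order: a pure red pixel and a pure blue pixel
bgr_demo = np.array([[[0, 0, 255], [255, 0, 0]]], dtype=np.uint8)

# Method 1: reverse the channel axis with a slice (a view, no data is copied)
rgb_by_slice = bgr_demo[:, :, ::-1]

# Method 2: copy the channels one by one into a new array
rgb_by_copy = np.zeros_like(bgr_demo)
rgb_by_copy[:, :, 0] = bgr_demo[:, :, 2]  # R comes from index 2
rgb_by_copy[:, :, 1] = bgr_demo[:, :, 1]  # G stays in the middle
rgb_by_copy[:, :, 2] = bgr_demo[:, :, 0]  # B comes from index 0

same = np.array_equal(rgb_by_slice, rgb_by_copy)
print(same)
print(rgb_by_slice[0, 0])  # the red pixel, now in RGB order
```

One practical difference: the slice returns a view with a negative stride rather than a new array. If a later step needs contiguous memory, wrapping the result in `np.ascontiguousarray` is one option.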
How to create a grayscale image
There is no right or wrong way to do this, only different standards.
The results of all the methods look almost the same, but the pixel values differ, so make sure everyone uses the same method when working together.
- Use the built-in function
- Divide the sum of the RGB values by three
- Standard: PAL/NTSC
- Standard: HDTV (same as the built-in function)
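To see how much the values can differ between the methods listed above, the following sketch applies three sets of weights to a single pure-red pixel. The weights are the ones used in the code in this section; the `pixel`, `weights` and `gray_values` names are made up for illustration.

```python
import numpy as np

pixel = np.array([255.0, 0.0, 0.0])  # hypothetical pure-red pixel, RGB order

weights = {
    "mean": np.array([1 / 3, 1 / 3, 1 / 3]),
    "PAL/NTSC": np.array([0.299, 0.587, 0.114]),
    "HDTV": np.array([0.2126, 0.7152, 0.0722]),
}

# A grayscale value is the dot product of the weights and the RGB values
gray_values = {name: float(w @ pixel) for name, w in weights.items()}

for name, value in gray_values.items():
    print(f"{name}: {value:.2f}")
```

For this pixel the three results are roughly 85, 76 and 54 on a 0 to 255 scale. The gap is largest for saturated colors like this one and shrinks to zero for neutral grays, where R, G and B are equal and every set of weights sums to 1.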
Code
im = imread(INPUT_DIR + 'IMG-4034.JPG')
imshow(im)
plt.title("original RGB image")
plt.show()
# Using the built-in rgb2gray function: gray = 0.2125 R + 0.7154 G + 0.0721 B
im_gray1 = rgb2gray(im)
imshow(im_gray1, vmin=0, vmax=1) # the dtype becomes float and the range becomes [0, 1]
plt.colorbar()
plt.title("rgb2gray min {0} max {1}".format(im_gray1.min(), im_gray1.max() ))
plt.show()
# The average of RGB is used as a grayscale image. Convert to float before summing so the uint8 values cannot overflow; the result stays in the range [0, 255].
im_gray2 = (im[:,:,0].astype(float) +
            im[:,:,1].astype(float) +
            im[:,:,2].astype(float)) / 3
imshow(im_gray2, vmin=0, vmax=255)
plt.colorbar()
plt.title("(R+G+B)/3 min {0:.2f} max {1:.2f}".format(im_gray2.min(), im_gray2.max() ))
plt.show()
# The weighted average of RGB with the PAL/NTSC coefficients is used as the grayscale image.
# https://en.wikipedia.org/wiki/Grayscale#Luma_coding_in_video_systems
im_gray3 = (0.299 * im[:,:,0].astype(float) +
            0.587 * im[:,:,1].astype(float) +
            0.114 * im[:,:,2].astype(float))
imshow(im_gray3, vmin=0, vmax=255)
plt.colorbar()
# Raw string so the backslash-free mathtext is passed through unchanged; Y' is the usual symbol for luma
plt.title(r"$Y'$ of PAL and NTSC min {0:.2f} max {1:.2f}".format(im_gray3.min(), im_gray3.max() ))
plt.show()
# The weighted average of RGB with the HDTV (Rec. 709) coefficients is used as the grayscale image. The weights vary depending on the standard.
# https://en.wikipedia.org/wiki/Grayscale#Luma_coding_in_video_systems
# rgb2gray() uses almost the same weights (0.2125, 0.7154, 0.0721): http://scikit-image.org/docs/dev/api/skimage.color.html#skimage.color.rgb2gray
im_gray4 = (0.2126 * im[:,:,0].astype(float) +
            0.7152 * im[:,:,1].astype(float) +
            0.0722 * im[:,:,2].astype(float))
imshow(im_gray4, vmin=0, vmax=255)
plt.colorbar()
# Y' is the usual symbol for luma
plt.title(r"$Y'$ of HDTV min {0:.2f} max {1:.2f}".format(im_gray4.min(), im_gray4.max() ))
plt.show()
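The `astype(float)` calls in the averaging code matter because adding `uint8` arrays directly wraps around at 256. A minimal sketch with made-up single-pixel arrays (`r`, `g`, `b`, `wrapped` and `correct` are illustrative names only):

```python
import numpy as np

# Hypothetical single-pixel channels stored as uint8, as imread typically returns them
r = np.array([200], dtype=np.uint8)
g = np.array([100], dtype=np.uint8)
b = np.array([50], dtype=np.uint8)

# Summing in uint8 wraps modulo 256: 200 + 100 + 50 = 350 becomes 94
wrapped = (r + g + b) / 3

# Converting to float first keeps the true sum of 350
correct = (r.astype(float) + g.astype(float) + b.astype(float)) / 3

print(wrapped[0])
print(correct[0])
```

The wrapped result is about 31 instead of about 117, and NumPy does not raise an error for it, so the mistake only shows up as an unexpectedly dark image.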
Histogram
A graph showing the frequency distribution of pixel values.
The third picture shows a drawing of Metamon on a whiteboard.
The drawing itself produces a peak around 0.5 to 0.8; the peak is somewhat spread out because the amount of ink from the marker varies (with the pressure applied while drawing, for example).
In addition, the area where the PC display on the other side of the whiteboard is reflected (to the left of Metamon's mouth) is quite white. This is thought to be the peak around 0.
im_files = ['file_path1', 'file_path2', 'file_path3']  # placeholder paths
nbins = 256
for file in im_files:
    im = imread(file)[:,:,:3] # In the case of RGBA, extract only RGB
    im = rgb2gray(im) # Range: [0,1]
    # histogram is assumed to be skimage.exposure.histogram, which returns (counts, bin centers)
    freq, bin_centers = histogram(im, nbins=nbins)

    fig = plt.figure(figsize=(20,3))

    # Left: the grayscale image
    ax = fig.add_subplot(1, 3, 1)
    imshow(im)
    plt.axis('off')

    # Middle: histogram with a linear frequency axis
    ax = fig.add_subplot(1, 3, 2)
    plt.plot(bin_centers, freq)
    plt.xlabel("intensity")
    plt.ylabel("frequency")
    plt.title('histogram (linear)')
    plt.xlim(0,1)

    # Right: the same histogram with a log frequency axis
    ax = fig.add_subplot(1, 3, 3)
    plt.plot(bin_centers, freq)
    plt.xlabel("intensity")
    plt.ylabel("log frequency")
    plt.yscale('log')
    plt.title('histogram (log)')
    plt.xlim(0,1)
    plt.show()
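For readers without the image files, the idea of a histogram as a frequency distribution of pixel values can be sketched on synthetic data. This example swaps in `np.histogram` (which returns bin edges rather than bin centers) in place of the function used above, and the image `gray_demo` is randomly generated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grayscale image in [0, 1]: a bright background around 0.7
gray_demo = np.clip(rng.normal(0.7, 0.05, size=(64, 64)), 0.0, 1.0)
# ...plus a small 4x4 dark patch (16 pixels)
gray_demo[:4, :4] = 0.02

# Count how many pixels fall into each of 256 equal-width bins over [0, 1]
freq_demo, edges = np.histogram(gray_demo, bins=256, range=(0.0, 1.0))
centers = (edges[:-1] + edges[1:]) / 2  # bin centers, convenient for plotting

print(freq_demo.sum() == gray_demo.size)  # every pixel lands in exactly one bin
print(centers[freq_demo.argmax()])        # location of the tallest peak
print(freq_demo[5])                       # the bin that contains 0.02
```

The dark patch contributes only 16 pixels against several thousand background pixels, so on a linear axis it is easy to overlook. That is the reason for also plotting the frequency on a log scale.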