张家琪 Zhang Jiaqi

Posted on Jun 9, 2023

Understanding DCT and Quantization in JPEG compression

#python #dip #jpeg #programming

JPEG is a widely used lossy compression standard method specifically designed for images. It was developed by the Joint Photographic Experts Group.

JPEG compression can be roughly divided into five steps: 1) Color space transformation, 2) Downsampling, 3) Discrete Cosine Transform (DCT), 4) Quantization, and 5) Entropy coding.

Here, I will mainly analyze the third step, "Discrete Cosine Transform," and the fourth step, "Quantization," and provide the corresponding Python code implementation.

Discrete Cosine Transform (DCT)

Discrete Cosine Transform (DCT) is commonly used in signal processing and image processing as a method to transform an image from the spatial domain to the frequency domain. The idea behind DCT is that any function can be represented by the summation of cosine functions. DCT is a reversible process and inherently lossless.

DCT can be accomplished using a DCT transformation matrix.

T=CFC^T

Among them,

F

represents the original image,

T

represents the image after DCT. Assuming the matrix size is

N \times N

, the DCT matrix

C

is as follows:

C(u,v)=\begin{cases} \sqrt{\frac{1}{N}} &u=0,0\le v \le N-1 \\ \sqrt{\frac{2}{N}}cos\frac{(2v+1)u\pi}{2N} & 1\le u \le N-1,0\le v \le N-1 \end{cases}

If the horizontal axis is

u

, the vertical axis is

v

, the matrix representation of

C

is:

C(u,v)=\begin{bmatrix} \sqrt{\frac{1}{N}} & \sqrt{\frac{1}{N}} & \cdots & \sqrt{\frac{1}{N}} \\ \sqrt{\frac{2}{N}}cos\frac{\pi}{2N} & \sqrt{\frac{2}{N}}cos\frac{3\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{(2N-1)\pi}{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \sqrt{\frac{2}{N}}cos\frac{(N-1)\pi}{2N} & \sqrt{\frac{2}{N}}cos\frac{3(N-1)\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{(2N-1)(N-1)\pi}{2N} \\ \end{bmatrix}

Transpose matrix of

C

is:

C^T(u,v)=\begin{bmatrix} \sqrt{\frac{1}{N}} & \sqrt{\frac{2}{N}}cos\frac{\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{(N-1)\pi}{2N} \\ \sqrt{\frac{1}{N}} & \sqrt{\frac{2}{N}}cos\frac{3\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{3(N-1)\pi}{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \sqrt{\frac{1}{N}} & \sqrt{\frac{2}{N}}cos\frac{(2N-1)\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{(2N-1)(N-1)\pi}{2N}\\ \end{bmatrix}

Basis functions above can be illustrated as images, known as the basis image. The specific method is as follows: For an NxN DCT matrix, by multiplying the nth column vector of

C

with the mth row vector of

C^T

, a total of

N^2

matrices can be obtained. These matrices are then concatenated according to their positions (m,n), resulting in a basis image of size

N^2 \times N^2

. Any image can be composed using these cosine transform combinations. In fact, DCT is to calculate the contribution of each cosine waves and get what we call "coefficient".



import numpy as np
import matplotlib.pyplot as plt
import math
import cv2
import skimage.io as io

# DCT matrix
N=8 # matrix shape: 8*8
DCT=np.zeros((N,N))
for m in range(N):
    for n in range(N):
        if m==0:
            DCT[m][n]=math.sqrt(1/N)
        else:
            DCT[m][n]=math.sqrt(2/N)*math.cos((2*n+1)*math.pi*m/(2*N))

# DCT basis image
basis=np.zeros((N*N,N*N))
for m in range(N):
    for n in range(N):
        pos_m=m*N
        pos_n=n*N
        DCT_v=DCT[m,:].reshape(-1,1)
        DCT_T_h=DCT.T[:,n].reshape(-1,N)
       basis[pos_m:pos_m+N,pos_n:pos_n+N]=np.matmul(DCT_v,DCT_T_h)

# Center values
basis+=np.absolute(np.amin(basis))
scale=np.around(1/np.amax(basis),decimals=3)
for m in range(basis.shape[0]):
    for n in range(basis.shape[1]):
        basis[m][n]=np.around(basis[m][n]*scale,decimals=3)

# Show basis image
plt.figure(figsize=(4,4))
plt.gray()
plt.axis('off')
plt.title('DCT Basis Image')
plt.imshow(basis,vmin=0)

This is the basis image of an 8x8 DCT matrix, with a size of 64x64. By observing the image, we can notice that the horizontal frequencies increase from left to right, while the vertical frequencies increase from top to bottom. The constant basis function in the top left corner is commonly referred to as the DC (Direct Current) basis function, and the corresponding DCT coefficient is known as the DC coefficient.

Any image can be composed by overlaying 64 cosine transformations from this reference image. Since we have set the size of the DCT matrix to be 8x8, we need to divide the image into several small 8x8 squares, referred to as blocks. The image is processed in units of blocks. Below, we will demonstrate 8 sets of DCT matrices along with their corresponding blocks.



# DCT matrix
blocks=np.zeros((8*8,8))
for i in range(8):
    blocks[i*8][i]=1

# IDCT -> Original images
blocks_idct=np.zeros((8*8,8))
for i in range(8):
    block=blocks[i*8:i*8+8][:]
    data=cv2.idct(block)
    blocks_idct[i*8:i*8+8][:]=data

# Show DCT matrix
plt.figure(figsize=(16,3))
for i in range(8):
    pos='18'+str(i+1)
    pos=int(pos)
    plt.subplot(pos)
    block=blocks[i*8:i*8+8][:]
    plt.gray()
    plt.axis('off')
    plt.imshow(block,vmin=0)



# Show original images
plt.figure(figsize=(16,3))
for i in range(8):
    pos='18'+str(i+1)
    pos=int(pos)
    plt.subplot(pos)
    block_idct=blocks_idct[i*8:i*8+8][:]
    plt.gray()
    plt.xticks([])
    plt.yticks([])
    plt.imshow(block_idct,vmin=0)

Let's assume that the original images above are numbered from left to right as 1-8. We can see that Image 1 is exactly the same as the top-left small square in the DCT basis image, which means its DCT matrix only has weights in the top-left corner, and the weights are 0 everywhere else. Image 2 is identical to the small square in the first row and second column of the DCT basis image, so its DCT matrix shows weights only at (0,1), and the weights are 0 elsewhere. This pattern continues for the other images. This is the essence of the DCT.

The basic idea of JPEG compression

DCT itself is lossless, and its purpose is to transform the image from the spatial domain to the frequency domain. To achieve compression, we need to introduce a step between DCT and IDCT. The human eye is less sensitive to high-frequency information, so the compression method involves discarding high-frequency information from the image. In the DCT matrix, the information in the top-left corner is high-frequency, while the information in the bottom-right corner is low-frequency. Below, I will demonstrate the basic idea of JPEG compression by selectively retaining 1, 3, and 10 low-frequency components.



# Read image
img=io.imread("img3.jpg",as_gray=True)

# Obtaining a mask through zigzag scanning
def z_scan_mask(C,N):
    mask=np.zeros((N,N))
    start=0
    mask_m=start
    mask_n=start
    for i in range(C):
        if i==0:
            mask[mask_m,mask_n]=1
        else:
            # If even, move upward to the right
            if (mask_m+mask_n)%2==0:
                mask_m-=1
                mask_n+=1
                # If it exceeds the upper boundary, move downward
                if mask_m<0:
                    mask_m+=1
                # If it exceeds the right boundary, move left
                if mask_n>=N:
                    mask_n-=1
            # If odd, move downward to the left
            else:
                mask_m+=1
                mask_n-=1
                # If it exceeds the lower boundary, move upward
                if mask_m>=N:
                    mask_m-=1
                # If it exceeds the left boundary, move right
                if mask_n<0:
                    mask_n+=1
            mask[mask_m,mask_n]=1
    return mask

# overlaying the mask, discarding the high-frequency components
def Compress(img,mask,N):
    img_dct=np.zeros((img.shape[0]//N*N,img.shape[1]//N*N))
    for m in range(0,img_dct.shape[0],N):
        for n in range(0,img_dct.shape[1],N):
            block=img[m:m+N,n:n+N]
            # DCT
            coeff=cv2.dct(block)
            # IDCT, but only the parts of the image where the mask has a value of 1 are retained
            iblock=cv2.idct(coeff*mask)
            img_dct[m:m+N,n:n+N]=iblock
    return img_dct

# Images keeping only 1, 3, and 10 low-frequency coefficients
plt.figure(figsize=(16,4))
plt.gray()
plt.subplot(141)
plt.title('Original image')
plt.imshow(img)
plt.axis('off')

plt.subplot(142)
plt.title('Keep 1 coefficient')
plt.imshow(Compress(img,z_scan_mask(1,8),8))
plt.axis('off')

plt.subplot(143)
plt.title('Keep 3 coefficients')
plt.imshow(Compress(img,z_scan_mask(3,8),8))
plt.axis('off')

plt.subplot(144)
plt.title('Keep 10 coefficients')
plt.imshow(Compress(img,z_scan_mask(10,8),8))
plt.axis('off')
plt.show()

For the images above, the more coefficients you retain, the better the image quality. In the DCT matrix, the top-left corner's DC component contains most of the information. If you only keep the DC component from the top-left corner, the entire block will be represented by the DC component alone, resulting in the entire block having a single value. By retaining more coefficient information, more cosine components are overlaid, resulting in more details in the image.

Quantization

The concept of quantization is similar to the second step. The quantization matrix is user-defined, where smaller quantization coefficients retain more information from the image, while larger quantization coefficients retain less information. For example, consider a 2x2 DCT matrix:

C=\begin{bmatrix} 211 & 22 \\ 13 & 5 \end{bmatrix}

If the quantization matrix is:

Q=\begin{bmatrix} 6 & 6 \\ 6 & 6 \end{bmatrix}

If we divide $C$ by $Q$ , round the resulting values, and then multiply them back by $Q$ , we can obtain the quantized matrix $C_q$ .

C_q=\begin{bmatrix} 210 & 24 \\ 12 & 0 \end{bmatrix}

Indeed, we can observe that the value 5 in

C

becomes 0 after quantization. If we reduce

Q

, for example, to 5, the value 5 will not be discarded. Conversely, if we increase

Q

, it is possible that higher values, such as 13, may also be discarded. In software applications like Photoshop and other image processing tools, adjusting the "quality" setting corresponds to modifying the quantization coefficients. The quantization matrix can have different values at each position.



# Define quantization matrix
q_mat=np.array([[16,11,10,16,24,40,51,61],
                [12,12,14,19,26,58,60,55],
                [14,13,16,24,40,57,69,56],
                [14,17,22,29,51,87,80,62],
                [18,22,37,56,68,109,103,77],
                [24,35,55,64,81,104,113,92],
                [49,64,78,87,103,121,120,101],
                [72,92,95,98,112,100,103,99]])

# Quantization
def Compress_2(img,N):
    img_dct=np.zeros((img.shape[0]//N*N,img.shape[1]//N*N))
    for m in range(0,img_dct.shape[0],N):
        for n in range(0,img_dct.shape[1],N):
            block=img[m:m+N,n:n+N]
            # DCT
            coeff=cv2.dct(block)
            # Quantization
            q_coeff=np.around(coeff*255/q_mat)
            # Inverse quantization
            iq_coeff=q_coeff*q_mat
            # IDCT
            iblock=cv2.idct(iq_coeff)
            img_dct[m:m+N,n:n+N]=iblock
    return img_dct

# Show Images
plt.figure(figsize=(8,4))
plt.gray()
plt.subplot(121)
plt.title('Original')
plt.imshow(img)
plt.axis('off')

plt.subplot(122)
plt.title('Quantified')
plt.imshow(Compress_2(img,8))
plt.axis('off')
plt.show

DEV Community

Understanding DCT and Quantization in JPEG compression

Discrete Cosine Transform (DCT)

The basic idea of JPEG compression

Quantization

Top comments (0)

Read next

Un senior destaca por su forma de pensar, no de codear

How I Improved My Productivity as a Developer in 30 Days 🚀

AI Search Capabilities Hit Roadblock: Transformers Struggle with Large-Scale Path Finding

Errors as a learning