DEV Community

张家琪 Zhang Jiaqi
张家琪 Zhang Jiaqi

Posted on

Understanding DCT and Quantization in JPEG compression

JPEG is a widely used lossy compression standard method specifically designed for images. It was developed by the Joint Photographic Experts Group.

JPEG compression can be roughly divided into five steps: 1) Color space transformation, 2) Downsampling, 3) Discrete Cosine Transform (DCT), 4) Quantization, and 5) Entropy coding.

Here, I will mainly analyze the third step, "Discrete Cosine Transform," and the fourth step, "Quantization," and provide the corresponding Python code implementation.

Discrete Cosine Transform (DCT)

Discrete Cosine Transform (DCT) is commonly used in signal processing and image processing as a method to transform an image from the spatial domain to the frequency domain. The idea behind DCT is that any function can be represented by the summation of cosine functions. DCT is a reversible process and inherently lossless.

DCT can be accomplished using a DCT transformation matrix.

T=CFCTT=CFC^T

Among them, FF represents the original image, TT represents the image after DCT. Assuming the matrix size is N×NN \times N , the DCT matrix CC is as follows:
C(u,v)={1Nu=0,0vN12Ncos(2v+1)uπ2N1uN1,0vN1 C(u,v)=\begin{cases} \sqrt{\frac{1}{N}} &u=0,0\le v \le N-1 \\ \sqrt{\frac{2}{N}}cos\frac{(2v+1)u\pi}{2N} & 1\le u \le N-1,0\le v \le N-1 \end{cases}

If the horizontal axis is uu , the vertical axis is vv , the matrix representation of CC is:
C(u,v)=[1N1N1N2Ncosπ2N2Ncos3π2N2Ncos(2N1)π2N2Ncos(N1)π2N2Ncos3(N1)π2N2Ncos(2N1)(N1)π2N] C(u,v)=\begin{bmatrix} \sqrt{\frac{1}{N}} & \sqrt{\frac{1}{N}} & \cdots & \sqrt{\frac{1}{N}} \\ \sqrt{\frac{2}{N}}cos\frac{\pi}{2N} & \sqrt{\frac{2}{N}}cos\frac{3\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{(2N-1)\pi}{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \sqrt{\frac{2}{N}}cos\frac{(N-1)\pi}{2N} & \sqrt{\frac{2}{N}}cos\frac{3(N-1)\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{(2N-1)(N-1)\pi}{2N} \\ \end{bmatrix}

Transpose matrix of CC is:
CT(u,v)=[1N2Ncosπ2N2Ncos(N1)π2N1N2Ncos3π2N2Ncos3(N1)π2N1N2Ncos(2N1)π2N2Ncos(2N1)(N1)π2N] C^T(u,v)=\begin{bmatrix} \sqrt{\frac{1}{N}} & \sqrt{\frac{2}{N}}cos\frac{\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{(N-1)\pi}{2N} \\ \sqrt{\frac{1}{N}} & \sqrt{\frac{2}{N}}cos\frac{3\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{3(N-1)\pi}{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \sqrt{\frac{1}{N}} & \sqrt{\frac{2}{N}}cos\frac{(2N-1)\pi}{2N} & \cdots & \sqrt{\frac{2}{N}}cos\frac{(2N-1)(N-1)\pi}{2N}\\ \end{bmatrix}

Basis functions above can be illustrated as images, known as the basis image. The specific method is as follows: For an NxN DCT matrix, by multiplying the nth column vector of CC with the mth row vector of CTC^T , a total of N2N^2 matrices can be obtained. These matrices are then concatenated according to their positions (m,n), resulting in a basis image of size N2×N2N^2 \times N^2 . Any image can be composed using these cosine transform combinations. In fact, DCT is to calculate the contribution of each cosine waves and get what we call "coefficient".
import numpy as np
import matplotlib.pyplot as plt
import math
import cv2
import skimage.io as io

# DCT matrix
N=8 # matrix shape: 8*8
DCT=np.zeros((N,N))
for m in range(N):
    for n in range(N):
        if m==0:
            DCT[m][n]=math.sqrt(1/N)
        else:
            DCT[m][n]=math.sqrt(2/N)*math.cos((2*n+1)*math.pi*m/(2*N))

# DCT basis image
basis=np.zeros((N*N,N*N))
for m in range(N):
    for n in range(N):
        pos_m=m*N
        pos_n=n*N
        DCT_v=DCT[m,:].reshape(-1,1)
        DCT_T_h=DCT.T[:,n].reshape(-1,N)
       basis[pos_m:pos_m+N,pos_n:pos_n+N]=np.matmul(DCT_v,DCT_T_h)

# Center values
basis+=np.absolute(np.amin(basis))
scale=np.around(1/np.amax(basis),decimals=3)
for m in range(basis.shape[0]):
    for n in range(basis.shape[1]):
        basis[m][n]=np.around(basis[m][n]*scale,decimals=3)

# Show basis image
plt.figure(figsize=(4,4))
plt.gray()
plt.axis('off')
plt.title('DCT Basis Image')
plt.imshow(basis,vmin=0)
Enter fullscreen mode Exit fullscreen mode

Image description

This is the basis image of an 8x8 DCT matrix, with a size of 64x64. By observing the image, we can notice that the horizontal frequencies increase from left to right, while the vertical frequencies increase from top to bottom. The constant basis function in the top left corner is commonly referred to as the DC (Direct Current) basis function, and the corresponding DCT coefficient is known as the DC coefficient.

Any image can be composed by overlaying 64 cosine transformations from this reference image. Since we have set the size of the DCT matrix to be 8x8, we need to divide the image into several small 8x8 squares, referred to as blocks. The image is processed in units of blocks. Below, we will demonstrate 8 sets of DCT matrices along with their corresponding blocks.

# DCT matrix
blocks=np.zeros((8*8,8))
for i in range(8):
    blocks[i*8][i]=1

# IDCT -> Original images
blocks_idct=np.zeros((8*8,8))
for i in range(8):
    block=blocks[i*8:i*8+8][:]
    data=cv2.idct(block)
    blocks_idct[i*8:i*8+8][:]=data

# Show DCT matrix
plt.figure(figsize=(16,3))
for i in range(8):
    pos='18'+str(i+1)
    pos=int(pos)
    plt.subplot(pos)
    block=blocks[i*8:i*8+8][:]
    plt.gray()
    plt.axis('off')
    plt.imshow(block,vmin=0)
Enter fullscreen mode Exit fullscreen mode

Image description

# Show original images
plt.figure(figsize=(16,3))
for i in range(8):
    pos='18'+str(i+1)
    pos=int(pos)
    plt.subplot(pos)
    block_idct=blocks_idct[i*8:i*8+8][:]
    plt.gray()
    plt.xticks([])
    plt.yticks([])
    plt.imshow(block_idct,vmin=0)
Enter fullscreen mode Exit fullscreen mode

Image description

Let's assume that the original images above are numbered from left to right as 1-8. We can see that Image 1 is exactly the same as the top-left small square in the DCT basis image, which means its DCT matrix only has weights in the top-left corner, and the weights are 0 everywhere else. Image 2 is identical to the small square in the first row and second column of the DCT basis image, so its DCT matrix shows weights only at (0,1), and the weights are 0 elsewhere. This pattern continues for the other images. This is the essence of the DCT.

The basic idea of JPEG compression

DCT itself is lossless, and its purpose is to transform the image from the spatial domain to the frequency domain. To achieve compression, we need to introduce a step between DCT and IDCT. The human eye is less sensitive to high-frequency information, so the compression method involves discarding high-frequency information from the image. In the DCT matrix, the information in the top-left corner is high-frequency, while the information in the bottom-right corner is low-frequency. Below, I will demonstrate the basic idea of JPEG compression by selectively retaining 1, 3, and 10 low-frequency components.

# Read image
img=io.imread("img3.jpg",as_gray=True)

# Obtaining a mask through zigzag scanning
def z_scan_mask(C,N):
    mask=np.zeros((N,N))
    start=0
    mask_m=start
    mask_n=start
    for i in range(C):
        if i==0:
            mask[mask_m,mask_n]=1
        else:
            # If even, move upward to the right
            if (mask_m+mask_n)%2==0:
                mask_m-=1
                mask_n+=1
                # If it exceeds the upper boundary, move downward
                if mask_m<0:
                    mask_m+=1
                # If it exceeds the right boundary, move left
                if mask_n>=N:
                    mask_n-=1
            # If odd, move downward to the left
            else:
                mask_m+=1
                mask_n-=1
                # If it exceeds the lower boundary, move upward
                if mask_m>=N:
                    mask_m-=1
                # If it exceeds the left boundary, move right
                if mask_n<0:
                    mask_n+=1
            mask[mask_m,mask_n]=1
    return mask

# overlaying the mask, discarding the high-frequency components
def Compress(img,mask,N):
    img_dct=np.zeros((img.shape[0]//N*N,img.shape[1]//N*N))
    for m in range(0,img_dct.shape[0],N):
        for n in range(0,img_dct.shape[1],N):
            block=img[m:m+N,n:n+N]
            # DCT
            coeff=cv2.dct(block)
            # IDCT, but only the parts of the image where the mask has a value of 1 are retained
            iblock=cv2.idct(coeff*mask)
            img_dct[m:m+N,n:n+N]=iblock
    return img_dct

# Images keeping only 1, 3, and 10 low-frequency coefficients
plt.figure(figsize=(16,4))
plt.gray()
plt.subplot(141)
plt.title('Original image')
plt.imshow(img)
plt.axis('off')

plt.subplot(142)
plt.title('Keep 1 coefficient')
plt.imshow(Compress(img,z_scan_mask(1,8),8))
plt.axis('off')

plt.subplot(143)
plt.title('Keep 3 coefficients')
plt.imshow(Compress(img,z_scan_mask(3,8),8))
plt.axis('off')

plt.subplot(144)
plt.title('Keep 10 coefficients')
plt.imshow(Compress(img,z_scan_mask(10,8),8))
plt.axis('off')
plt.show()
Enter fullscreen mode Exit fullscreen mode

Image description

For the images above, the more coefficients you retain, the better the image quality. In the DCT matrix, the top-left corner's DC component contains most of the information. If you only keep the DC component from the top-left corner, the entire block will be represented by the DC component alone, resulting in the entire block having a single value. By retaining more coefficient information, more cosine components are overlaid, resulting in more details in the image.

Quantization

The concept of quantization is similar to the second step. The quantization matrix is user-defined, where smaller quantization coefficients retain more information from the image, while larger quantization coefficients retain less information. For example, consider a 2x2 DCT matrix:

C=[21122135] C=\begin{bmatrix} 211 & 22 \\ 13 & 5 \end{bmatrix}



If the quantization matrix is:

Q=[6666] Q=\begin{bmatrix} 6 & 6 \\ 6 & 6 \end{bmatrix}

If we divide CC by QQ , round the resulting values, and then multiply them back by QQ , we can obtain the quantized matrix CqC_q .

Cq=[21024120] C_q=\begin{bmatrix} 210 & 24 \\ 12 & 0 \end{bmatrix}

Indeed, we can observe that the value 5 in CC becomes 0 after quantization. If we reduce QQ , for example, to 5, the value 5 will not be discarded. Conversely, if we increase QQ , it is possible that higher values, such as 13, may also be discarded. In software applications like Photoshop and other image processing tools, adjusting the "quality" setting corresponds to modifying the quantization coefficients. The quantization matrix can have different values at each position.
# Define quantization matrix
q_mat=np.array([[16,11,10,16,24,40,51,61],
                [12,12,14,19,26,58,60,55],
                [14,13,16,24,40,57,69,56],
                [14,17,22,29,51,87,80,62],
                [18,22,37,56,68,109,103,77],
                [24,35,55,64,81,104,113,92],
                [49,64,78,87,103,121,120,101],
                [72,92,95,98,112,100,103,99]])

# Quantization
def Compress_2(img,N):
    img_dct=np.zeros((img.shape[0]//N*N,img.shape[1]//N*N))
    for m in range(0,img_dct.shape[0],N):
        for n in range(0,img_dct.shape[1],N):
            block=img[m:m+N,n:n+N]
            # DCT
            coeff=cv2.dct(block)
            # Quantization
            q_coeff=np.around(coeff*255/q_mat)
            # Inverse quantization
            iq_coeff=q_coeff*q_mat
            # IDCT
            iblock=cv2.idct(iq_coeff)
            img_dct[m:m+N,n:n+N]=iblock
    return img_dct

# Show Images
plt.figure(figsize=(8,4))
plt.gray()
plt.subplot(121)
plt.title('Original')
plt.imshow(img)
plt.axis('off')

plt.subplot(122)
plt.title('Quantified')
plt.imshow(Compress_2(img,8))
plt.axis('off')
plt.show
Enter fullscreen mode Exit fullscreen mode

Image description

Top comments (0)