DEV Community

Idil Saglam for Makepad

Posted on • Updated on

Huffman Coding Compression Algorithm

Huffman Coding is an algorithm for doing data compression and it forms the basic idea behind file compression. In this article, we are going to talk about the construction of Huffman Tree and how to decode it.

How does Huffman Coding work?
Suppose the string below is a message to be sent over a network.

Alt Text

Each character occupies 8 bits. There are a total of 20 characters in the string. This means, our final message is 8×20 = 160 bits. Using the Huffman Coding technique we can compress the string to a smaller size.

Steps to build Huffman Tree
1- We have to create a table showing the frequency of each character in the sentence including spaces, punctuations, and any other characters.
2- Order the list by frequency. The only order that is important is that they are ordered by frequency, the highest value at the top.2

3- It’s time to draw the tree!
Start from the least frequent items and put these into circles, starting at the bottom right of your paper.Alt Text

Take the two items furthest to the right, pair them together, joining the branches by summing the frequencies together. Continue to do so until you have matched the pairs you can, leaving any spares at the moment.Alt Text

Start at the root node (In this example root node is ‘20’), put 0 next to each lefthand branch, and 1 against each right-hand branch:
Alt Text

Start from the root node, find the path to each character (called leaf node), identify the 1 and 0 branches you have to use to get to each one. Place these in the Code column of your table.Alt Text

To reach the compressed message, we have to sum characters (8×5 bits) and codes (5×3 bits).
40+15 = 55

At the beginning of the article, we discussed that our final message was 160 bits. The compressed message is 55 bits. That means we saved 60 bits!

Initial String
BCCABBDDAECCBBAEDDCC
Compressed message 101111001101001010010001111101000100001011111

Decoding Huffman-encoded Data
The decoding procedure is deceptively simple. To avoid ambiguity, Huffman encoding is a prefix-free encoding technique. No codeword appears as a prefix of any other codeword.

To decode the encoded string, follow the zeros and ones to a leaf and return the character there.
Alt Text

Top comments (0)