salem ododa

The Unicode encoding system

Have you ever wondered what goes on behind the scenes when you type a series of characters on your keyboard? Or when you send an e-mail to a friend in Rwanda who speaks French? How does the computer display your friend's accented French characters correctly? How does it find a numeric equivalent for each character you type? Does the computer understand English? So many questions come to mind. What we can all agree on for sure is that the computer understands only two symbols, "0" and "1", known as bits. This implies that letters have to be represented as numbers for computers to store text.

So how were all these characters incorporated into the computer's system? Well, in the early days of computing (the 1960s), the primary means of electronic text communication were teletypes (typewriters, teleprinters, etc.).
These teletypes used a 5-bit encoding system that could represent at most 32 characters (2^5 = 32). The problem with this system was that it didn't provide enough space for all the English letters (a-z, A-Z), punctuation marks, numbers, and the other characters essential for effective communication.
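The arithmetic behind that limitation is easy to check. A quick sketch in Python (the character counts for letters and digits are straightforward; punctuation would push the total even higher):

```python
# A 5-bit code can distinguish only 2**5 = 32 values.
slots = 2 ** 5

# But plain English text needs at least the letters and digits,
# before we even count punctuation or control codes.
needed = 26 + 26 + 10  # a-z, A-Z, 0-9

print(slots)   # 32
print(needed)  # 62
```

With 62 essential characters and only 32 slots, 5 bits simply could not cover the alphabet in both cases.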

Introducing ASCII

Due to the limitations of the 5-bit encoding system, there was a need for a better, standardized means of communication. In October 1960 the American Standards Association (ASA), now the American National Standards Institute (ANSI), with major contributions from Robert William Bemer (February 8, 1920 – June 22, 2004), began work on ASCII, an acronym for American Standard Code for Information Interchange. In 1963 the ASA introduced the first version of ASCII; unlike its predecessor, it was a 7-bit encoding system that could hold up to 128 characters (2^7 = 128), numbered 0-127.

So for the English language, which has 26 letters, ASCII had enough slots for both upper- and lower-case letters, the digits 0 to 9, punctuation marks, and unprintable control codes for teleprinters.
The ASCII table
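You can poke at this mapping directly from Python, where `ord()` and `chr()` convert between a character and its ASCII (and Unicode) number:

```python
# Every ASCII character maps to a number in the range 0-127.
print(ord('A'))   # 65 -- upper-case letters start at 65
print(ord('a'))   # 97 -- lower-case letters start at 97
print(chr(65))    # 'A' -- the reverse mapping

# The unprintable control codes occupy 0-31;
# for example, 10 is the line feed character '\n'.
print(ord('\n'))  # 10
```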

It was a great improvement, obviously, and in March 1968 then-US President Lyndon B. Johnson mandated that all federal computer systems adopt ASCII as the standard for information interchange. But as with every technology, ASCII had its own bottlenecks, one of which was its inability to represent non-English characters. So for European languages that use accented letters, like German ä and ë or Polish ź, ł, and ę, ASCII wasn't a favorable option.

Unicode to the rescue

Once again there was a need for a more diverse encoding system, one that bridged the disparities in communication and enabled universal inclusion, as all other attempts at tackling the problem had only produced more complicated problems. During this period, globalization and internationalization had become core aspects of marketing and distribution, so global inclusion was vital.

So in 1988 Joe Becker, a computer scientist and expert on multilingual computing, proposed an encoding scheme known as Unicode (a "unique, universal, and uniform character encoding") in which each character is assigned a unique number known as a code point (the value that a character is given in the Unicode standard). This was a real breakthrough, as it applied not just to the English language but to every written language around the world. The objective of Unicode was, and still is, to unify all the different encoding schemes so that confusion between computers is kept to a minimum.
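Code points are easy to inspect in Python: `ord()` returns a character's Unicode code point, and the same numbering scheme covers English and non-English alphabets alike (code points are conventionally written as U+ followed by the hex value):

```python
# The same code-point scheme covers every script,
# including the accented letters ASCII could not represent.
for ch in ['A', 'ä', 'ł', '€']:
    print(ch, hex(ord(ch)))
# A 0x41   (U+0041 -- same value as in ASCII)
# ä 0xe4   (U+00E4)
# ł 0x142  (U+0142)
# € 0x20ac (U+20AC)
```

Notice that the first 128 code points are identical to ASCII, which kept Unicode backwards compatible with existing text.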
Unicode currently has three main encoding forms, namely:

  • UTF-8: a variable-width encoding that uses one to four bytes (8 to 32 bits) per character, with ASCII characters taking just one byte; it is well known for its wide adoption in email systems and the internet in general

  • UTF-16: as you may have guessed, built from two-byte (16-bit) units; most characters fit in one unit, while the rest use a pair of units (a surrogate pair)

  • UTF-32: this encoding scheme uses a fixed four bytes (32 bits) to represent every character.

Note: UTF stands for Unicode Transformation Format.
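The difference between the three forms shows up clearly when you encode the same string with each of them. A small sketch in Python (note that Python's `"utf-16"` and `"utf-32"` codecs prepend a byte-order mark, which adds 2 and 4 bytes respectively to the output):

```python
# Encode one string, "héllo" (5 characters), in all three forms
# and compare how many bytes each one needs.
text = "héllo"
for enc in ("utf-8", "utf-16", "utf-32"):
    data = text.encode(enc)
    print(enc, len(data), data)
# utf-8   6 bytes: one byte per ASCII letter, two for 'é'
# utf-16 12 bytes: 2-byte BOM + two bytes per character
# utf-32 24 bytes: 4-byte BOM + four bytes per character
```

This is why UTF-8 dominates on the web: for mostly-ASCII text it is the most compact of the three while still being able to represent every code point.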

And that brings us to the end of this article. Of course there's much more to encoding; this was just a quick overview of the broad field of text encoding and multilingual processing.
If you enjoyed this article, kindly leave a comment on what you learned from it. Peace out :)
