DEV Community

Richie Moluno

Introduction to Charsets

CHARSETS
A charset (character set) is the set of characters that a piece of software or hardware supports. It is the key component behind displaying and editing numbers, text and symbols on a computer. A character encoding then defines how each character in the set is represented as bytes.

Computers and software carry out different tasks, and the characters needed for a specific task are grouped into a character set. When text is entered through a keyboard or some other means, the character encoding maps each character you typed to a specific sequence of bytes in memory.
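The mapping from typed characters to bytes can be sketched in a few lines of Python. This is a minimal illustration, assuming UTF-8 as the encoding; the helper names `encode_text` and `decode_bytes` are hypothetical and exist only for this example.

```python
def encode_text(text, encoding="utf-8"):
    """Return the byte values that the chosen encoding assigns to the text."""
    return list(text.encode(encoding))


def decode_bytes(byte_values, encoding="utf-8"):
    """Turn stored byte values back into the original characters."""
    return bytes(byte_values).decode(encoding)


typed = "Hi!"
stored = encode_text(typed)
print(stored)                # [72, 105, 33]
print(decode_bytes(stored))  # Hi!
```

Decoding with the same encoding that was used for encoding gives back the original text; using a different one may produce the wrong characters or an error.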

Some examples of charsets and their encodings are:

  • ASCII
  • Unicode
  • UTF-8
  • UTF-16
  • UTF-32

ASCII and Unicode
ASCII is short for American Standard Code for Information Interchange. Unicode and ASCII are the two most widely used character encoding standards in the technology sector. Both define how a set of characters is represented in digital form. These characters include symbols, digits, lowercase letters and uppercase letters.

Computers store numbers rather than letters, so each character is assigned a number, and that number is stored in binary form. The characters can then be written, stored and transmitted in digital media.
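To make the idea of "characters as binary numbers" concrete, here is a small Python sketch. It assumes the text contains only ASCII-range characters, so every code fits in 8 bits; `to_binary` is a hypothetical helper name.

```python
def to_binary(text):
    """Return each character's numeric code as an 8-bit binary string."""
    return [format(ord(ch), "08b") for ch in text]


print(ord("A"))          # 65
print(to_binary("Az9"))  # ['01000001', '01111010', '00111001']
```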

Differences between ASCII and Unicode
One of the major differences between ASCII and Unicode is the range of characters they encode. ASCII encodes only 128 characters: the English letters, digits, punctuation symbols and a few control characters. Unicode encodes a much larger range of characters.

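ASCII uses seven bits to encode each character, which was inadequate for text beyond English. To address this, several 8-bit extensions, commonly called Extended ASCII, were introduced. The extra bit makes room for 128 additional characters.
Unicode can be encoded in several ways: UTF-8, UTF-16 and UTF-32, which use 8-bit, 16-bit and 32-bit units respectively. UTF-8 and UTF-16 are variable-length encodings, while UTF-32 uses a fixed four bytes per character.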
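The 7-bit limit of ASCII, and what an 8-bit extension adds, can be checked with a short Python sketch. Latin-1 (ISO-8859-1) is used here as one example of an 8-bit extension, which is an assumption of this example rather than the only option; `fits_in_ascii` is a hypothetical helper name.

```python
def fits_in_ascii(ch):
    """True if the character's code fits in ASCII's 7 bits (0 to 127)."""
    return ord(ch) < 128


print(fits_in_ascii("A"))       # True
print(fits_in_ascii("\u00e9"))  # False ("\u00e9" is the letter e with an acute accent)

# An 8-bit extension such as Latin-1 uses the extra bit for 128 more characters.
print("\u00e9".encode("latin-1"))  # b'\xe9'

# Plain ASCII has no code for that character, so encoding fails.
try:
    "\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot encode it:", err.reason)
```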

Another significant difference is the number of characters each standard accommodates. Unicode accommodates far more characters than ASCII, which is why it supports most written languages. Chinese is a good example. It also supports right-to-left scripts like Arabic.
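Python's standard `unicodedata` module can be used to look at how Unicode describes characters from different scripts. The sketch below is only an illustration; `describe` is a hypothetical helper, and the two non-Latin characters are written as escapes (`\u4e2d` is a Chinese character, `\u0628` is an Arabic letter).

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, bidirectional class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))


print(describe("A"))       # ('U+0041', 'LATIN CAPITAL LETTER A', 'L')
print(describe("\u4e2d"))  # ('U+4E2D', 'CJK UNIFIED IDEOGRAPH-4E2D', 'L')
print(describe("\u0628"))  # ('U+0628', 'ARABIC LETTER BEH', 'AL')
```

The bidirectional class `L` marks a left-to-right character, while `AL` marks an Arabic letter, which is laid out right-to-left.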

Unicode is the IT industry standard for encoding text on both computers and telecommunication devices, while ASCII is an older standard that encodes only a small set of characters for electronic communication.

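Unicode is a superset of ASCII: its first 128 characters are the same as ASCII's. Unicode text can occupy more space than ASCII text, depending on the encoding used and the characters involved, while ASCII always uses one byte per character.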
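The difference in space can be measured directly. The following Python sketch counts the bytes a string occupies in each Unicode encoding; `encoded_sizes` is a hypothetical helper, and the little-endian codec names (`utf-16-le`, `utf-32-le`) are chosen here so that no byte order mark is added to the count.

```python
def encoded_sizes(text):
    """Return the number of bytes the text occupies in several encodings."""
    sizes = {}
    for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
        sizes[encoding] = len(text.encode(encoding))
    return sizes


print(len("hello".encode("ascii")))    # 5
print(encoded_sizes("hello"))          # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes("\u4e2d\u6587"))   # {'utf-8': 6, 'utf-16-le': 4, 'utf-32-le': 8}
```

For plain English text, UTF-8 takes exactly as much space as ASCII, while UTF-16 and UTF-32 take two and four times as much. For the two Chinese characters in the last line, UTF-16 is the most compact of the three.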
