
DEV Community

Oluwanifemi Latunde

Posted on Sep 27, 2022

UTF-8 Validation

It's day 7 of the #I4G10DaysOfCodeChallenge. Today's task was to determine whether a given array of integers, each representing one byte, forms a valid UTF-8 encoding.

You can find more details about the challenge here

A character in UTF-8 can be 1 to 4 bytes long, subject to the following rules:

For a 1-byte character, the first bit is 0, followed by its Unicode code point.
For an n-byte character, the first n bits are all 1s and bit n+1 is 0; it is followed by n-1 bytes whose two most significant bits are 10.

     Number of Bytes   |        UTF-8 Octet Sequence
                       |              (binary)
   --------------------+-----------------------------------------
            1          |   0xxxxxxx
            2          |   110xxxxx 10xxxxxx
            3          |   1110xxxx 10xxxxxx 10xxxxxx
            4          |   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
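To make the table concrete, here is a small Python sketch that reads the length of a character off its first byte using bit masks that match the prefixes above. The function name `char_length` is hypothetical and not from the original post; the last lines use Python's built-in encoder to show the 2-byte pattern for "é" (U+00E9).

```python
from typing import Optional


def char_length(lead: int) -> Optional[int]:
    """Return how many bytes a UTF-8 character occupies, judged only by its
    first byte, or None if that byte cannot start a character."""
    lead &= 0xFF  # assumption: only the low 8 bits of the integer matter
    if (lead & 0b1000_0000) == 0:            # 0xxxxxxx
        return 1
    if (lead & 0b1110_0000) == 0b1100_0000:  # 110xxxxx
        return 2
    if (lead & 0b1111_0000) == 0b1110_0000:  # 1110xxxx
        return 3
    if (lead & 0b1111_1000) == 0b1111_0000:  # 11110xxx
        return 4
    return None  # 10xxxxxx (continuation byte) or 11111xxx


encoded = "é".encode("utf-8")
print([format(b, "08b") for b in encoded])  # ['11000011', '10101001']
print(char_length(encoded[0]))              # 2
```

The first byte carries the `110` prefix of a 2-byte character and the second carries the `10` prefix of a continuation byte, exactly as in the table.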

Approach:

Start with count = 0, the number of continuation bytes still expected.
For each integer x in the data array:
  If count is 0, x must start a new character:
    If x >> 5 equals 0b110 (x // 32 in integer division), set count = 1.
    Else if x >> 4 equals 0b1110 (x // 16), set count = 2.
    Else if x >> 3 equals 0b11110 (x // 8), set count = 3.
    Else if x >> 7 is not 0 (x // 128), return false: the byte starts with 1 but matches no leading pattern.
    Otherwise x is a 1-byte character and count stays 0.
  Otherwise, x must be a continuation byte:
    If x >> 6 is not 0b10 (x // 64), return false.
    Else decrease count by 1.
After the loop, return true if count is 0, and false otherwise (the last character was cut short).
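The steps above translate fairly directly into Python. This is a sketch, not the author's actual submission (which the post does not show): `valid_utf8` is a hypothetical standalone name, whereas a LeetCode submission would wrap the same logic in the class the judge expects. The `& 0xFF` mask is an addition not in the steps above, assuming only the low 8 bits of each integer hold the byte.

```python
from typing import List


def valid_utf8(data: List[int]) -> bool:
    """Check whether `data` (one integer per byte) is a valid UTF-8 sequence
    under the 1- to 4-byte rules described above."""
    count = 0  # continuation bytes still expected for the current character
    for x in data:
        x &= 0xFF  # assumption: only the low 8 bits of each integer matter
        if count == 0:
            # Expecting the first byte of a new character.
            if x >> 5 == 0b110:
                count = 1
            elif x >> 4 == 0b1110:
                count = 2
            elif x >> 3 == 0b11110:
                count = 3
            elif x >> 7 != 0:
                # Starts with 1 but matches no leading-byte pattern
                # (a stray 10xxxxxx byte or an 11111xxx byte).
                return False
            # Otherwise x >> 7 == 0: a complete 1-byte character.
        else:
            # Expecting a continuation byte of the form 10xxxxxx.
            if x >> 6 != 0b10:
                return False
            count -= 1
    # Valid only if the last character was not cut short.
    return count == 0


print(valid_utf8([197, 130, 1]))  # True: 11000101 10000010 00000001
print(valid_utf8([235, 140, 4]))  # False: 3-byte lead, but 00000100 is not 10xxxxxx
```

The loop makes a single pass and stops at the first bad byte, so it runs in O(n) time with O(1) extra space.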

Result:
Runtime: 234 ms, faster than 56.39% of Python3 online submissions for UTF-8 Validation.

Memory Usage: 14.1 MB, less than 97.22% of Python3 online submissions for UTF-8 Validation.
