DEV Community

Ezekiel nwuguru


DAY 7: UTF-8-VALIDATION

Hey! It's day 7 of the 10-day coding challenge with I4G. Today's task was to write code that checks whether some data is a valid UTF-8 encoding.

Thought process:
Understanding the problem: The task is to validate a sequence of bytes as UTF-8. A UTF-8 character has the following properties:

  • A valid UTF-8 character is 1 to 4 bytes long.
  • For a 1-byte character, the first bit is 0, followed by the 7 bits of its Unicode code point.
  • For an n-byte character, the first n bits of the first byte are all 1s and bit n+1 is 0. This byte is followed by n-1 continuation bytes whose two most significant bits are 10.
  • The input is an array of integers, where each integer holds one byte of data (only its least significant 8 bits are used). We have to return whether the array represents a valid UTF-8 encoding. Note that the array does not hold data for just a single character. As the problem's first example shows, it can contain data for several characters. The array is valid only if every one of those characters is a valid UTF-8 character.
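To see these bit patterns concretely, here is a small sketch. I wrote it in Python, which is my own choice because the post doesn't say which language was submitted. It uses the built-in `str.encode` to print the bits of one character of each length. The helper name `bit_patterns` is hypothetical.

```python
def bit_patterns(ch: str) -> list[str]:
    """Return the UTF-8 bytes of one character as 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


# One example character for each encoded length (1, 2, 3 and 4 bytes).
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", bit_patterns(ch))

# U+0041 ['01000001']
# U+00E9 ['11000011', '10101001']
# U+20AC ['11100010', '10000010', '10101100']
# U+1F600 ['11110000', '10011111', '10011000', '10000000']
```

In the output, each first byte starts with 0, 110, 1110 or 11110, and every continuation byte starts with 10.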

Solution: To achieve this, I used bit manipulation. Each byte is right-shifted, and the result is compared against a fixed pattern. This tells us whether the byte starts a 1-, 2-, 3- or 4-byte character or is a continuation byte.
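Below is a sketch of those right-shift checks in Python. It is my reconstruction and not the submitted code, and the helper names `leading_byte_length` and `is_continuation` are hypothetical. Shifting a byte right by k discards its lowest k bits, so only the prefix bits remain to be compared.

```python
def leading_byte_length(byte: int) -> int:
    """Return how many bytes a character starting with `byte` should span.

    Returns 0 if `byte` cannot start a character (for example a
    continuation byte 10xxxxxx, or an invalid prefix like 11111xxx).
    """
    byte &= 0xFF               # only the lowest 8 bits carry data
    if byte >> 7 == 0b0:       # 0xxxxxxx
        return 1
    if byte >> 5 == 0b110:     # 110xxxxx
        return 2
    if byte >> 4 == 0b1110:    # 1110xxxx
        return 3
    if byte >> 3 == 0b11110:   # 11110xxx
        return 4
    return 0


def is_continuation(byte: int) -> bool:
    """True if `byte` has the continuation form 10xxxxxx."""
    return (byte & 0xFF) >> 6 == 0b10
```

For example, 197 is 0b11000101, so `197 >> 5` is 0b110 and `leading_byte_length(197)` returns 2. Likewise, 130 is 0b10000010, so `is_continuation(130)` returns True.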

Algorithm:

  • Declare two integer variables: count, i
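  • Set count = 0 (the number of continuation bytes still expected for the current character).
  • Using a for loop, iterate through the array of integers starting at i = 0 while i < data.length.
  • Perform a bitwise shift on each byte and compare the result against the UTF-8 rules above.
  • Return true if every byte satisfies the rules and no continuation bytes are still pending at the end; otherwise return false.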
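Putting the steps together, a minimal Python sketch might look like the following. It assumes Python, since the post doesn't show its submitted code, and the function name `valid_utf8` is hypothetical. The variable `count` holds the number of continuation bytes still expected. Iterating over the bytes directly takes the place of the index i.

```python
def valid_utf8(data: list[int]) -> bool:
    """Check whether `data` (one byte per integer) is a valid UTF-8 encoding."""
    count = 0  # continuation bytes still expected for the current character
    for value in data:
        byte = value & 0xFF  # only the lowest 8 bits are used
        if count == 0:
            # Expecting the first byte of a new character.
            if byte >> 7 == 0b0:         # 0xxxxxxx: complete 1-byte character
                continue
            elif byte >> 5 == 0b110:     # 110xxxxx: 1 continuation byte follows
                count = 1
            elif byte >> 4 == 0b1110:    # 1110xxxx: 2 continuation bytes follow
                count = 2
            elif byte >> 3 == 0b11110:   # 11110xxx: 3 continuation bytes follow
                count = 3
            else:
                return False             # 10xxxxxx or 11111xxx cannot start a character
        else:
            # Expecting a continuation byte of the form 10xxxxxx.
            if byte >> 6 != 0b10:
                return False
            count -= 1
    # Valid only if the last character was not cut short.
    return count == 0
```

For example, `valid_utf8([197, 130, 1])` returns True because it contains a 2-byte character followed by a 1-byte character. By contrast, `valid_utf8([235, 140, 4])` returns False because 235 announces a 3-byte character but the third byte, 4, is not a continuation byte.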

Check out the code here: https://leetcode.com/problems/utf-8-validation/submissions/
