Displaying all printable* UTF-8 characters using Rust

Stefanos Kouroupis (elasticrash)

Since I just got my one-year badge, I decided to celebrate by listing my achievements and writing one of the most useless applications I have ever written.

achievements

  • > 20,000 views... pretty neat! That's nearly 60 per day
  • > 2,500 followers... again, nice! That's nearly 7 per day
  • 2.5 years in the same job.

application

This amazing application, as the title states, prints every UTF-8 character that can be printed. The asterisk in the title is there because I limited the output to characters encoded in at most three bytes.

ENJOY

Our main function has 3 loops. Each one walks the byte ranges listed below (lead byte first, then the continuation bytes), all in hex:

  • one for the single-byte chars: 0000 - 007F
  • one for the two-byte chars: 00C0 - 00DF | 0080 - 00BF
  • one for the three-byte chars: 00E0 - 00EF | 0080 - 00BF | 0080 - 00BF

use std::num::ParseIntError;
use std::str;

fn main() {
    let mut char_index = 0;
    // Inclusive byte range for single-byte (ASCII) characters: 0x00 - 0x7F.
    let one_byte = vec![0, 127];

    for i in one_byte[0]..=one_byte[1] {
        let mut first = format!("{:X}", i);
        first = make_even(first);
        char_index = output(first, char_index);
    }

    // Lead bytes 0xC0 - 0xDF, continuation bytes 0x80 - 0xBF (both inclusive).
    let two_bytes = vec![192, 223, 128, 191];

    for i in two_bytes[0]..=two_bytes[1] {
        for j in two_bytes[2]..=two_bytes[3] {
            let mut first = format!("{:X}", i);
            let mut second = format!("{:X}", j);

            first = make_even(first);
            second = make_even(second);

            char_index = output(first + &second, char_index);
        }
    }

    // Lead bytes 0xE0 - 0xEF, two continuation bytes 0x80 - 0xBF (all inclusive).
    let three_bytes = vec![224, 239, 128, 191, 128, 191];

    for i in three_bytes[0]..=three_bytes[1] {
        for j in three_bytes[2]..=three_bytes[3] {
            for k in three_bytes[4]..=three_bytes[5] {
                let mut first = format!("{:X}", i);
                let mut second = format!("{:X}", j);
                let mut third = format!("{:X}", k);

                first = make_even(first);
                second = make_even(second);
                third = make_even(third);

                char_index = output(first + &second + &third, char_index);
            }
        }
    }
}
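
To sanity-check the byte ranges the three loops walk, here is a minimal sketch that goes the other way: it takes a known character and prints its UTF-8 bytes as hex. The helper name utf8_hex is made up for this illustration and is not part of the application above; it only relies on the standard library's char::encode_utf8 and char::len_utf8.

```rust
/// Hypothetical helper (not in the original program): returns the UTF-8
/// bytes of a character as an uppercase hex string, two digits per byte.
fn utf8_hex(c: char) -> String {
    let mut buf = [0u8; 4]; // a char never needs more than 4 bytes in UTF-8
    c.encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("{:02X}", b))
        .collect()
}

fn main() {
    // One sample per loop: a one-, a two- and a three-byte character.
    for c in "Aé€".chars() {
        println!("{} -> {} ({} byte(s))", c, utf8_hex(c), c.len_utf8());
    }
}
```

The two-byte sample starts with a byte in C0 - DF and the three-byte sample with one in E0 - EF, and every following byte falls in 80 - BF, which matches the ranges listed for the loops.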

A hex string needs an even number of characters, so single-digit values get a leading zero.

pub fn make_even(mut s: String) -> String {
    // Prepend a zero so that every byte is written as two hex digits.
    if s.len() % 2 == 1 {
        s.insert(0, '0');
    }
    s
}
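
As an aside, the padding can probably be folded into the formatting step itself. The sketch below assumes the values are bytes (u8) and uses the standard {:02X} format specifier, which zero-pads to two uppercase hex digits; hex_byte is a made-up name for illustration and not something the application defines.

```rust
/// Hypothetical alternative to format!("{:X}") followed by make_even:
/// formats a byte as exactly two uppercase hex digits.
fn hex_byte(i: u8) -> String {
    format!("{:02X}", i)
}

fn main() {
    // Values below 0x10 get a leading zero, larger ones are unchanged.
    println!("{} {} {}", hex_byte(0), hex_byte(10), hex_byte(255));
}
```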

I got this function from here. It converts a hex string into a vector of bytes (Vec<u8>), reading two hex digits per byte.

pub fn decode_hex(s: &str) -> Result<Vec<u8>, ParseIntError> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16))
        .collect()
}
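
A quick usage sketch of decode_hex, with the function copied in so the snippet runs on its own. The sample inputs ("E282AC" and "ZZ") are chosen here purely for illustration.

```rust
use std::num::ParseIntError;
use std::str;

// Same function as in the article, repeated so this example is self-contained.
pub fn decode_hex(s: &str) -> Result<Vec<u8>, ParseIntError> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16))
        .collect()
}

fn main() {
    // Three hex pairs become three bytes, which together encode one character.
    let bytes = decode_hex("E282AC").expect("valid hex");
    println!("{:X?} -> {:?}", bytes, str::from_utf8(&bytes));

    // Anything that is not a hex digit yields an Err instead of a panic.
    println!("invalid input is an error: {}", decode_hex("ZZ").is_err());
}
```

Note that the slice &s[i..i + 2] assumes an even-length input, which is why the hex strings are padded first.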

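Nested matches for the win. The function checks:

  • if the hex string is valid
  • if the bytes form a valid UTF-8 character
  • if the character has a printable representation (by looking at the length of its debug output)

pub fn output(hex: String, mut i: i32) -> i32 {
    match &decode_hex(&hex) {
        Ok(dh) => match str::from_utf8(dh) {
            Ok(v) => {
                // Debug formatting escapes non-printable characters (for example "\u{80}"),
                // so a short debug string suggests a printable character.
                if format!("{:?}", v).len() < 7 {
                    // Entries whose index is a multiple of 10 end the current line.
                    if i % 10 == 0 {
                        println!("{:?} {:?} \t", hex, v);
                    } else {
                        print!("{:?} {:?} \t", hex, v);
                    }
                    i += 1;
                }
            }
            // Not valid UTF-8: skip.
            _ => {}
        },
        // Not valid hex: skip.
        _ => {}
    }

    i
}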
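
To make the length check a bit more concrete, here is a small sketch that prints the debug length for a few strings. The helper debug_len and the sample strings are assumptions made for this illustration; the only thing taken from the application is the format!("{:?}", v).len() expression.

```rust
/// Hypothetical helper: length in bytes of the debug representation,
/// including the two surrounding quotes.
fn debug_len(s: &str) -> usize {
    format!("{:?}", s).len()
}

fn main() {
    // A printable character is shown as-is, a control character is escaped.
    for s in &["A", "€", "\u{80}", "\n"] {
        println!("{:?} has debug length {}", s, debug_len(s));
    }
}
```

A printable three-byte character takes 5 bytes (3 plus the quotes), while an escape such as "\u{80}" takes 8, so the < 7 threshold separates the two. Short escapes like "\n" are only 4 bytes long, so they still slip through the check.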
