Roman

Posted on Jun 30, 2023 • Updated on Jul 10, 2023

Optimization of u128 decoding from base62

#rust #math #base62

In the previous article, we looked at u128 to base62 encoding, now we will implement and optimize the inverse operation of decoding u128 from base62, this can be useful, for example, for more compact storage in memory or database.

In order to understand which optimizations can be applied, let's start with a simple, naive implementation:

const BASE62_N: usize = 62;

pub fn base62_to_u128_naive(base62_str: &str) -> Option<u128> {
    let base62_str: [u8; 22] = base62_str.as_bytes().try_into().ok()?;
    let mut n: u128 = 0;
    for digit in &base62_str {
        let number = match digit {
            d if (b'0'..=b'9').contains(d) => d - 48,
            d if (b'A'..=b'Z').contains(d) => d - 55,
            d if (b'a'..=b'z').contains(d) => d - 61,
            _ => return None,
        };
        n = n
            .checked_mul(BASE62_N as u128)
            .and_then(|n| n.checked_add(number as u128))?;
    }
    Some(n)
}

Let's analyze what is happening here, the function signature shows that we accept a string and get an optional u128 number as a result, since in case of incorrect data in the string we will not be able to decode it:

fn base62_to_u128_naive(base62_str: &str) -> Option<u128>

All identifiers must be 22 characters long, confirm this condition:

let base62_str: [u8; 22] = base62_str.as_bytes().try_into().ok()?;

Next, we calculate the value of the number in a loop from the digits of the base62 string:

for digit in &base62_str

We need to convert each digit to decimal, for example "B" would be "11" in decimal, we do this in match, checking each of the possible intervals of allowed characters:

let number = match digit

Now we calculate the number by multiplying by the base 62 and adding the next digit. A 22-character number in base 62 holds more numbers than a 128-bit number, so we need to check for overflow when converting:

n = n.checked_mul(BASE62_N as u128).and_then(|n| n.checked_add(number as u128))?;

We got the result, returning:

Some(n)

Let's look at the benchmark results, what the first version of the function is capable of (CPU i5-4690):

bench_base62_to_u128_naive 1000000: 0.121s 125.75Mib/s, 8240895.48/s

Not bad, the speed is already a little bit faster than the speed of even the optimized encoder from the first article. All due to the fact that there are no expensive division operations during decoding, but we can do better!

Let's start with optimization. We will start by getting rid of match. Instead of checking the intervals and subtracting the offset, we can create an array with all possible variants, the byte representation of the character from the base62 string will be the offset (index), and the value is a digit in decimal:

const BASE62_MAP: [u8; 123] = [
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 255, 255, 255,
    255, 255, 255, 255, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
    29, 30, 31, 32, 33, 34, 35, 255, 255, 255, 255, 255, 255, 36, 37, 38, 39, 40, 41, 42, 43, 44,
    45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
];

Now, for conversion, it is enough to take the value by index, for example, BASE62_MAP[b'A'] will be 10.

The next step is to deal with multiplication and addition in a loop. These overflow-checked operations (checked variants) require more instructions than normal unchecked operations. Since overflow is only possible at the last step, we can move its check outside the loop by converting only the first 21 characters in the loop, and the last one after the loop:

for digit in &base62_str[..21] {
        // ...
        n = n * BASE62_N as u128 + number as u128;
}

Another problem here is that the multiplication itself (as well as addition) of 128-bit numbers is more expensive than, for example, multiplying 64-bit numbers, which is easy to see by looking at the generated assembler for both of these operations. In the previous article, we replaced 128-bit division with 64-bit division, but here we will do the same for multiplication and addition. To do this, we will split the original base62 number into several blocks along the digit boundaries, convert each block separately, then collect the entire number from these parts:

3322222222221111111111 -> 33 2222222222 1111111111
                          n3         n2         n1
n = n3 * 62^20 + n2 * 62^10 + n1

So the final optimized version of the decoder, which we got, looks like this:

const BASE62_N: usize = 62;
const BASE62_POW_10: u128 = (BASE62_N as u128).pow(10);
const BASE62_POW_20: u128 = (BASE62_N as u128).pow(20);
const BASE62_MAP: [u8; 123] = [
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
    255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 255, 255, 255,
    255, 255, 255, 255, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
    29, 30, 31, 32, 33, 34, 35, 255, 255, 255, 255, 255, 255, 36, 37, 38, 39, 40, 41, 42, 43, 44,
    45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
];

pub fn base62_to_u128(base62_str: &str) -> Option<u128> {
    let base62_str: [u8; 22] = base62_str.as_bytes().try_into().ok()?;
    let mut n_section: u64 = 0;
    for digit in &base62_str[12..22] {
        let number = *BASE62_MAP.get(*digit as usize).filter(|it| **it <= 61)?;
        n_section = n_section * BASE62_N as u64 + number as u64;
    }
    let mut n = n_section as u128;
    n_section = 0;
    for digit in &base62_str[2..12] {
        let number = *BASE62_MAP.get(*digit as usize).filter(|it| **it <= 61)?;
        n_section = n_section * BASE62_N as u64 + number as u64;
    }
    n += n_section as u128 * BASE62_POW_10;
    n_section = 0;
    for digit in &base62_str[0..2] {
        let number = *BASE62_MAP.get(*digit as usize).filter(|it| **it <= 61)?;
        n_section = n_section * BASE62_N as u64 + number as u64;
    }
    n.checked_add((n_section as u128).checked_mul(BASE62_POW_20)?)
}

Some clarifications for the code. Here we calculate a number based on blocks of digits (less significant digits at the end of a string):

for digit in &base62_str[12..22]
// ...
for digit in &base62_str[2..12]
// ...
for digit in &base62_str[0..2]
// ..

This line converts a base 62 digit to a decimal number. At the same time, since the data may be incorrect, a check is added that digit is in the allowed range:

let number = *BASE62_MAP.get(*digit as usize).filter(|it| **it <= 61)?;

We make sure that the top 2 digits do not cause overflow:

n.checked_add((n_section as u128).checked_mul(BASE62_POW_20)?)

Let's check the benchmark, was it worth our efforts?

bench_base62_to_u128 10000000: 0.182s 836.24Mib/s, 54803747.40/s

Wow that's impressive! I even had to increase the number of iterations by a factor of 10 to get stable measurements. As a result, the improvement compared to the original version is 6.65 times.

My links

I'm making my perfect todo, note-taking app Heaplist, you can try it here heaplist.app
And I have a twitter @rsk

Top comments (1)

Roman • Jun 30 '23

that was fun to research on this topic

DEV Community

Optimization of u128 decoding from base62

My links

Top comments (1)

Read next

Testable Apps: Why You Should Consider The Composable Architecture

Remotely Access Your Home Assistant Instance Securely

AWS fundamental RoadMap 🎯

Why Hashtags Matter in Social Media?