DEV Community

Cover image for How to Do Away with Floating Points
Sean Williams
Sean Williams

Posted on

How to Do Away with Floating Points

Don't get me wrong, floating points are good for a lot of things, but their inaccuracy is well known. The basic issue is that floating points are based on a m*2^p approach. If p is negative you get fractions, since for example 2^-1 = 0.5, 2^-2 = 0.25, and so on.

The reason you can get funky behavior with floats is because the denominator must always be a power of two, so denominators that aren't powers of two are approximated. This is why you can compute something that should be 0.3, but end up with 0.29999999999999.

I took some accounting classes after I pivoted into business, and got madder and madder about having to round my own fractions when using a REPL as a calculator. Eventually I sat down and wrote a precise fractional calculator. Which is harder than it seems.

As a brief reminder, a rational number is a pair of integers p ÷ q, for example, ½. Integers are also rationals, with q = 1. They're called rationals because they express a ratio, if you were wondering.

Floating-point is technically a rational number format, but it's often used to approximate real numbers. Real numbers include those that can't be represented as a fraction of integers, like π and sqrt(2).

To finalize the intuition about why floating-point "drift" occurs, consider that decimal is also a rational number system. However, decimal only allows a finite representation of rational numbers for which the prime factorization of the denominator only contains 2 and 5.

Explicit Rationals

Another approach to rational numbers is to store them as a pair of integers. I'll quickly lay out the algorithms for doing this:

Prime Factorizing: x

factors = []
limit = floor(sqrt(x))
for d in 2 to limit:
  while x > 0 and x % d = 0:
    factors.append(d)
    x = x / d
  if x == 0:
    break
Enter fullscreen mode Exit fullscreen mode

Putting into Lowest Terms: p / q

Compute the prime factors of p and q. For each factor f present in both, divide both p and q by f. Alternatively, divide p and q by the product of their common factors.

Addition: p1/q1 + p2/q2

Compute the prime factors of q1 and q2, call them f1 and f2. For each factor f in f1 that isn't also in f2, multiply p2 and q2 by f. For each factor f in f2 that isn't also in f1, multiply p1 and q1 by f.

This has the effect of setting q1 = q2. The result is (p1 + p2) / q1, but for housekeeping purposes you should put this into lowest terms.

Subtraction: p1/q1 - p2/q2

Multiply p2 by -1, then apply the addition algorithm.

Multiplication: (p1/q1) * (p2/q2)

This is much easier, it's just (p1 * p2) / (q1 * q2), put into lowest terms.

Division: (p1/q1) / (p2/q2)

This is equal to (p1/q1) * (q2/p2).

Find a library for the above

You can probably find a library that implements all of this, but it's worth spelling it out so you can see what working with rationals is like. For example, Rust has a num_rational crate.

Parsing a rational

If you have a decimal number, say 0.25, this is equivalent to 25/100, which reduces to 1/4. (factors of 25: 5, 5; factors of 100: 2, 2, 5, 5; common factors: 5, 5 = 25; 25 / 25 = 1; 100 / 25 = 4.) So for example, you can get the numerator by removing the dot and parsing it as an integer, and if there are d digits after the dot, the denominator is 10^d.

Displaying a rational

This is the hardest part by far, and the real reason behind this post. The trivial way of displaying a rational is as p.to_str_radix(10) + "/" + q.to_str_radix(10). If q is 1, the number is an integer so really you should only display p.to_str_radix(10).

What about decimal expansion? In theory there are two cases: finite and infinite. Infinite expansions are periodic, for example, 1/3 = 0.(3), and 1/6 = 0.1(6). In practice, there's a third case: periodic expansions can be very, very long, and displaying thousands of digits for a correct expansion isn't very helpful.

Figuring out whether you're in a finite or infinite case is easy: a rational with a finite expansion for decimal output is one where the prime factors of q contains only 2 and 5. In that case, we can work out the expansion using long division (here I'm calling the terms of the rational n and d, for numerator and denominator):

fn unsafe_bigint(x: i32) -> BigInt {
    x.to_bigint().unwrap()
}

fn long_divide_finite(n: BigInt, d: &BigInt) -> String {
    let zero = Zero::zero();
    let ten = unsafe_bigint(10);
    let mut n = n;
    let mut result = String::new();
    loop {
        if n == zero {
            return result;
        }
        let (q, r) = n.div_rem(&d);
        result.push(digit_to_char(&q));
        n = r * &ten;
    }
}
Enter fullscreen mode Exit fullscreen mode

Though it may be strange to see it written out like this, you learned this algorithm in like third or fourth grade.

Now, infinite expansions are trickier. An infinite decimal expansion consists of two parts: a fixed prefix, and an infinitely repeated periodic section. For example, in 1/3, the prefix is empty, and the periodic section is just 3. In 1/7, the prefix is empty, and the periodic section is 142875. In 1/6, the prefix is 1, and the periodic section is 6.

It turns out, there's a formula of sorts for calculating the length of the prefix, which we'll call s, and the length of the periodic section, which we'll call t. The formula is: 10^s is congruent to 10^(s + t) (mod d). Another way of writing this is, we need to find the values of s and t that satisfy: (10^s - 10^(s + t)) mod d = 0.

Because of the third case, we need to figure out the largest s + t we're willing to accept. You don't have to do this, and you can just write a loop of the form,

for s in 0..infinity:
  for t in 1..s:
    if (10^s - 10^(s + t)) mod d == 0:
      return (s, t)
Enter fullscreen mode Exit fullscreen mode

It can just take a long time, and give you an unsatisfying result. My way looks like this:

static MAX_EXPANSION: usize = 50;

fn prefix_and_period(d: &BigInt) -> Option<(usize, usize)> {
    let zero = Zero::zero();
    let ten = unsafe_bigint(10);
    let mut s_pow: BigInt = One::one();
    for s in 0..MAX_EXPANSION {
        let mut t_pow: BigInt = unsafe_bigint(10);
        for t in 1..(MAX_EXPANSION + 1 - s) {
            if (&s_pow - &s_pow * &t_pow).mod_floor(&d) == zero {
                return Some((s, t));
            }
            t_pow = t_pow * &ten;
        }
        s_pow = s_pow * &ten;
    }
    None
}
Enter fullscreen mode Exit fullscreen mode

If this returns Some, then we have a feasible infinite expansion, and we again do long division: after reaching the decimal place, expand out s times, then insert a '(', then expand out t times, then insert a ')', and we're done.

fn long_divide_infinite(n: BigInt, d: &BigInt, s: usize, t: usize) -> String {
    let ten = unsafe_bigint(10);
    let mut n = n;
    let mut s = s;
    let mut t = t;
    let mut result = String::new();
    if s == 0 {
        result.push('(');
    }
    loop {
        if t == 0 {
            result.push(')');
            return result;
        } else {
            let (q, r) = n.div_rem(&d);
            result.push(digit_to_char(&q));
            n = r * &ten;
            if s == 0 {
                t = t - 1;
            } else if s == 1 {
                result.push('(');
                s = 0;
            } else {
                s = s - 1;
            }
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

For a very long expansion, we just expand it out MAX_EXPANSION times and add a "..." to the end, so we know this was a truncated expansion:

fn long_divide_infinite_trunc(n: BigInt, d: &BigInt) -> String {
    let ten = unsafe_bigint(10);
    let mut n = n;
    let mut result = String::new();
    for _ in 0..MAX_EXPANSION {
        let (q, r) = n.div_rem(&d);
        result.push(digit_to_char(&q));
        n = r * &ten;
    }
    result + "..."
}
Enter fullscreen mode Exit fullscreen mode

The functions above only expanded the fractional portion of a number—the part that comes after the decimal point. The case breakdown looks like this:

fn show_rational(x: &BigRational) -> String {
    let one = One::one();
    let ten = unsafe_bigint(10);
    let a = x.abs();
    let n = a.numer();
    let d = a.denom();
    let negative = x < &a;
    let mut result = String::new();
    if negative {
        result.push('-');
    }
    result = result + &n.to_str_radix(10);
    if d == &one {
        return result;
    }
    let (ipart, fpart) = n.div_rem(d);
    result.push('/');
    result = result + &d.to_str_radix(10);
    if is_finite(&d) {
        result = result + " = ";
        if negative {
            result.push('-');
        }
        result = result + &ipart.to_str_radix(10);
        result.push('.');
        return result + &long_divide_finite(&fpart * &ten, &d);
    } else {
        match prefix_and_period(&d) {
            Some((s, t)) => {
                result = result + " = ";
                if negative {
                    result.push('-');
                }
                result = result + &ipart.to_str_radix(10);
                result.push('.');
                return result + &long_divide_infinite(&fpart * &ten, &d, s, t);
            }
            None => {
                result = result + " \u{2248} ";
                if negative {
                    result.push('-');
                }
                result = result + &ipart.to_str_radix(10);
                result.push('.');
                return result + &long_divide_infinite_trunc(&fpart * &ten, &d);
            }
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

The Unicode \u2248 is ≈, again to make sure we know this is only approximate. But this gives us the results we'd hope for...

5
5/2 = 2.5
5/3 = 1.(6)
5/6 = 0.8(3)
5/7 = 0.(714285)
5/1003 ≈ 0.00498504486540378863409770687936191425722831505483...
Enter fullscreen mode Exit fullscreen mode

Now just make a simple REPL, and you can do your accounting homework in peace, knowing your numbers will always be accurate.

Top comments (0)