Eric Burden

Originally published at ericburden.work

Implementing R Functions in Rust with [extendr]

I’ve recently (since the beginning of 2021) been trying my hand at learning and using Rust, and so far it has been a really good experience. Rust has a lot to recommend it, including top-notch tooling, inherent memory safety, and blazing speed. That last part comes from the fact that Rust is a compiled systems programming language, and it was the inspiration for picking up Rust in the first place.

You see, I absolutely love R. In my (admittedly biased) opinion, there is no better set of tools available in any programming language for working directly with data sets. And, if some other language does have a tool you need, R even has support for leveraging those tools as well [1]. The R ecosystem contains some truly excellent packages (especially rmarkdown, knitr, and shiny) for making the results of your analysis available and approachable to others. Where R excels is in ad hoc data manipulation, analysis, and reporting. Where R struggles (like other garbage-collected, interpreted languages) is performance, especially in high-throughput, large-scale settings. A major offset for these performance penalties comes from leveraging functionality written in a lower-level language like C, C++ [2], Java [3], or FORTRAN. Recently, Rust has begun to make headway in this space with the advent of the extendr project.

Rust is a very attractive option, especially for developers and analysts who may not already be fluent in a systems programming language. Rust and the Rust compiler are designed such that memory safety is enforced by default. From a practical perspective, this means you are much less likely to get random, hard-to-diagnose crashes in Rust code (as compared to clumsy C/C++) and are therefore much more likely to successfully leverage Rust, even if you are not an experienced Rust developer. Personally, I also find the tooling around Rust much more approachable than Java's (interminably nested directories and complicated build files, anyone? No, thank you). With extendr, we now have a straightforward path for writing Rust code and accessing it in our R packages. In order to explore this strategy, I’ve written an R package I’m calling rustbind, which is intended to be something of a showcase and sample implementation for calling Rust functions from R code. Let’s walk through just how that works.
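
As a taste of what that path looks like, here is a minimal sketch of an extendr export, based on the standard extendr hello-world template rather than actual code from rustbind:

// Rust Code

use extendr_api::prelude::*;

// The #[extendr] attribute generates the glue code that makes this
// Rust function callable from R as hello_world().
#[extendr]
fn hello_world() -> &'static str {
    "Hello world!"
}

// Each exported function is registered in the module macro, which wires
// the functions into the R package's shared library.
extendr_module! {
    mod rustbind;
    fn hello_world;
}

With the rextendr helper package, regenerating the wrappers (e.g., via rextendr::document()) makes hello_world() available as an ordinary R function.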

Adventures in Parallel Processing

To demonstrate the sorts of benefits (particularly speed gains) that can be had by falling back to Rust code from R, we will tackle a problem that is difficult to handle natively in R: parallelism. To do so, I’ve set up a somewhat artificial benchmark that, while overly simplistic, is representative of real problems that benefit from parallel processing. That problem is calculating square roots using the Babylonian Method. The naive approach involves (for a given number \(N\)):

  1. Guessing an initial value \(x_0\).
  2. Improving the guess by applying the formula \(x_1 = \frac{(x_0 + N/x_0)}{2}\). This new \(x_1\) is closer to \(\sqrt{N}\).
  3. Iterating on step 2 (with \(x_n\) in place of \(x_0\)) until the difference between \(x_{n+1}\) and \(x_n\) is small enough to satisfy you that your estimate is close enough.
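
As a quick worked example, take \(N = 2\) with an initial guess of \(x_0 = 1\): the first iteration gives \(x_1 = \frac{(1 + 2/1)}{2} = 1.5\), the second gives \(x_2 = \frac{(1.5 + 2/1.5)}{2} \approx 1.4167\), and the third gives \(x_3 \approx 1.41422\), already very close to \(\sqrt{2} \approx 1.41421\).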

Calculating square roots is deceptively computationally expensive (not as expensive as cryptographic hashing, of course), especially if your margin of error is small. Now, for a single value, the time taken to calculate a square root using this algorithm is essentially negligible. However, if you need to perform this operation on one million values, it can become time-consuming. In general, in R, we rely on vectorization to speed up these kinds of calculations, but there are cases in which this strategy is difficult or impossible to implement (processing a data set where the calculation for one record depends on a value in a different record, say). Implementing this algorithm in R and Rust, respectively, looks like this:

Naive Square Root Algorithm (R)

# R Code

naive_sqrt <- function(n) {
  # This implementation only handles one value at a time
  if (length(n) > 1) { stop("Vectors longer than one not supported") }
  current_guess <- n
  adjustment <- 1
  error <- 0.000001
  # Iterate until the guess and n / guess converge to within `error`
  while (current_guess - adjustment > error) {
    current_guess <- (current_guess + adjustment) / 2
    adjustment <- n / current_guess
  }
  current_guess
}


Since this algorithm doesn’t support vectorized operations, we include a friendly error message for anyone who passes in a longer vector.

Naive Square Root Algorithm (Rust)

// Rust Code

fn naive_sqrt(n: &f64) -> f64 {
    let mut current_guess = *n;
    let mut adjustment = 1.0;
    let error = 0.000001;
    // Iterate until the guess and n / guess converge to within `error`
    while current_guess - adjustment > error {
        current_guess = (current_guess + adjustment) / 2.0;
        adjustment = n / current_guess;
    }
    current_guess
}


So, as you can see, these two implementations are functionally (and almost lexically) identical.
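
As a quick sanity check (this test is my own addition, not part of the original post or the rustbind package), we can confirm the Rust version agrees with the standard library’s square root:

// Rust Code

#[cfg(test)]
mod tests {
    use super::naive_sqrt;

    #[test]
    fn naive_sqrt_agrees_with_std_sqrt() {
        // The loop stops when guess - n/guess < 0.000001, so the estimate
        // should be comfortably within 1e-3 of the true square root.
        for n in [2.0_f64, 25.0, 1_000_000.0] {
            assert!((naive_sqrt(&n) - n.sqrt()).abs() < 1e-3);
        }
    }
}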

Parallelizing in R

For comparison, we will use two versions of the R function that can operate on a vector of doubles and return their square roots. This first one is a simple wrapper around sapply:

# R Code

sapply_naive_sqrt <- function(n) {
  sapply(n, naive_sqrt)
}


The parallel version is an almost equally simple wrapper around future_sapply from the future.apply package:

# R Code

future_apply_naive_sqrt <- function(n) {
  future.apply::future_sapply(n, naive_sqrt)
}


Astute readers (and experienced users of future.apply) will note that we’re not tweaking any settings here; doing so could probably improve performance. If you’re an R programmer, this is pretty straightforward code. Don’t worry, it gets worse…

Parallelizing in Rust

Since we’re ostensibly interested in Rust for multithreading (for this example, anyway), let’s skip over implementing the sequential operation in Rust and go straight for threads:

// Rust Code

fn multithreaded_naive_sqrt(floats: &[f64]) -> Vec<f64> {
    // Calculate the number of threads we want to use, giving us the size of the
    // chunks to break our input 'vector' into.
    let thread_count = match floats.len() {
        0 => return Vec::new(),
        1..=1_000 => 1,
        1_001..=10_000_000 => floats.len() / 1_000,
        _ => 10_000,
    };
    let mut threads = Vec::with_capacity(thread_count);
    let chunk_size = (floats.len() / thread_count) + 1;

    // For each chunk of numbers (Rust calls them floats, R calls them doubles)...
    for chunk in floats.chunks(chunk_size) {
        // Pass that chunk to a thread, iterate over the items in the chunk,
        // calculate the square root of each, return the result.
        let chunk = chunk.to_vec();
        let thread = std::thread::spawn(move || {
            let sqrts: Vec<_> = chunk.iter().map(naive_sqrt).collect();
            sqrts
        });
        threads.push(thread); // Keep our thread handles in a list
    }

    // Gather up the results from each thread, in order, and append them
    // to the output vector
    let mut output: Vec<f64> = Vec::with_capacity(floats.len());
    for thread in threads {
        match thread.join() {
            Ok(chunk_sqrts) => output.extend(&chunk_sqrts),
            Err(e) => std::panic::resume_unwind(e),
        }
    }

    output
}


That’s quite a bit more code, but we are implementing parallel processing using only the standard library, so we can feel pretty good about it. Other than the thread spawning, if you’re passingly familiar with Rust, you shouldn’t have any trouble parsing this function. Now, as with most hard problems, it’s probably better to let someone else solve it for you. There is a Rust crate, rayon, that provides parallel iterators, letting us process a vector across multiple threads with much less work involved.

// Rust Code

use rayon::prelude::*;

fn rayon_naive_sqrt(floats: &[f64]) -> Vec<f64> {
    // par_iter() splits the work across a thread pool managed by rayon
    floats.par_iter().map(naive_sqrt).collect()
}


That par_iter() function provides a parallelized iterator (note the use rayon::prelude::* import that brings it into scope), and the rest of it should look pretty familiar. Clearly, this is just as easy to write as the equivalent R operation.

Speed Test

Equipped with our algorithms and functions, we can compare performance. I am skipping over the implementation details of actually getting to those Rust functions from the R interface; that’s pretty well covered in the README of my rustbind GitHub repository (extendr makes it relatively easy). Suffice it to say, I am calling all the relevant functions from the R interface and benchmarking using the rbenchmark package. The following code creates a sample data set of one million random numbers, passes that data set to each of the four named functions 100 times, and calculates the average run time for each.
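
For a rough sense of what those bindings might look like (a sketch only; the real wrappers in the rustbind repository may differ), exporting the rayon version follows the same pattern as the earlier hello-world sketch, with extendr handling the conversion between R’s double vectors and Rust’s Vec<f64>:

// Rust Code

use extendr_api::prelude::*;
use rayon::prelude::*;

// extendr converts R's double vector into a Vec<f64> on the way in and
// converts the returned Vec<f64> back into an R double vector.
#[extendr]
fn rayon_naive_sqrt(floats: Vec<f64>) -> Vec<f64> {
    floats.par_iter().map(naive_sqrt).collect()
}

// As before, the exported function is listed in extendr_module! so it is
// registered with the package's shared library.
extendr_module! {
    mod rustbind;
    fn rayon_naive_sqrt;
}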

# R Code

test_data <- runif(1000000) * 1000000
rbenchmark::benchmark(
    r_sapply = rustbind::sapply_naive_sqrt(test_data), 
    r_future_sapply = rustbind::future_apply_naive_sqrt(test_data),
    rust_multithreaded = rustbind::multithreaded_naive_sqrt(test_data),
    rust_rayon = rustbind::rayon_naive_sqrt(test_data)
)


So, what’s the difference? Pretty significant…

                test replications elapsed relative user.self
2    r_future_sapply          100 402.377  366.464   400.041
1           r_sapply          100 326.108  297.002   324.171
3 rust_multithreaded          100   2.975    2.709    10.329
4         rust_rayon          100   1.098    1.000    10.223


There are other metrics we could look at here, but let’s focus on that ‘relative’ column. Our baseline is ‘rust_rayon’ with a value of 1.000. ‘rust_multithreaded’ was ~2.7 times slower, which makes sense; there were definitely improvements that could be made to my hand-coded attempt at multithreading. Comparing to the R functions, though, is where it gets really interesting. The sequential R operation ‘r_sapply’ took almost 300 times as long to run as ‘rust_rayon’, and the parallel operation in R using future.apply took ~366 times as long, making the parallel operation in R slower than the sequential one. A bit of Googling around showed me that this is not an uncommon finding, and while there are certainly settings that can be tweaked to improve future.apply’s performance, doing so seems fiddly and likely to incur unanticipated costs. Speaking of incurring costs, consider how much more responsive a plumber API backed by expensive operations in Rust could be… That translates to the potential for using smaller instances in cloud environments, which can mean substantial cost savings at scale.

Wrap-Up

So, as you can see, Rust runs significantly faster than R. This shouldn’t be at all surprising. What might be surprising is (1) how much more approachable Rust is as a systems language for implementing R functionality than C/C++/Java and (2) how much help extendr gives you in calling Rust functions from R packages. If you’re looking for ways to speed up your R code, and you’ve been considering C/C++, I encourage you to give Rust a second look. It could be well worth your time.


  1. reticulate: https://rstudio.github.io/reticulate/

  2. Rcpp: https://cran.r-project.org/web/packages/Rcpp/index.html

  3. rJava: https://cran.r-project.org/web/packages/rJava/index.html
