DEV Community πŸ‘©β€πŸ’»πŸ‘¨β€πŸ’»

eblocha

Parallel Matrix Multiplication in Rust

Much of linear algebra revolves around matrix multiplication. Since the standard algorithm is an O(n^3) operation, we need other techniques to make it fast in practice. Today, I will show how I parallelized matrix multiplication, improving speed by 4x on my 12-thread machine.

In my examples, I am using the nalgebra crate to handle matrix storage (it stores dynamically sized matrices as a Vec in column-major order). This isn't too important; the only thing that really matters is the ability to index the matrix by row and column.
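For concreteness, here is a tiny sketch of what column-major order means, using a plain Vec as a stand-in (the `at` closure is an illustrative helper, not nalgebra's API, though DMatrix lays out its data the same way):

```rust
fn main() {
    let nrows = 2;
    // a 2x2 matrix stored column-major: column 0 = [1, 2], column 1 = [3, 4]
    let data = vec![1.0, 2.0, 3.0, 4.0];

    // element (i, j) of an nrows-row matrix lives at index j * nrows + i
    let at = |i: usize, j: usize| data[j * nrows + i];

    assert_eq!(at(0, 1), 3.0); // row 0, column 1
    assert_eq!(at(1, 0), 2.0); // row 1, column 0
}
```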

Single-Threaded Implementation

To multiply two matrices, we iterate over the columns of the rhs, and for each column, iterate over the rows of the lhs. We then zip the lhs row with the rhs column and take the dot product of the two.

let l_shape = lhs.shape();
let r_shape = rhs.shape();

// check for shape compatibility here...

// the multiplication
let result: Vec<f64> = (0..r_shape.1).flat_map(move |rj| {
    // `map`, not `flat_map`, here: each (row, column) pair yields a single f64
    (0..l_shape.0).map(move |li| {
        (0..r_shape.0)
            .zip(0..l_shape.1)
            .map(move |(ri, lj)| {
                lhs.index((li, lj)) * rhs.index((ri, rj))
            })
            .sum::<f64>()
    })
})
.collect();

// result is a vec in column-major order
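To see the whole thing end to end, here is a self-contained sketch of the same loop structure with plain Vecs in column-major order (nalgebra swapped out so it runs standalone; `matmul`, `lhs_at`, and `rhs_at` are my illustrative names, not part of the original):

```rust
fn matmul(lhs: &[f64], l_shape: (usize, usize), rhs: &[f64], r_shape: (usize, usize)) -> Vec<f64> {
    // lhs columns must equal rhs rows for the product to be defined
    assert_eq!(l_shape.1, r_shape.0, "incompatible shapes");

    // column-major: element (i, j) lives at index j * nrows + i
    let lhs_at = move |i: usize, j: usize| lhs[j * l_shape.0 + i];
    let rhs_at = move |i: usize, j: usize| rhs[j * r_shape.0 + i];

    (0..r_shape.1)
        .flat_map(|rj| {
            (0..l_shape.0).map(move |li| {
                (0..r_shape.0)
                    .zip(0..l_shape.1)
                    .map(|(ri, lj)| lhs_at(li, lj) * rhs_at(ri, rj))
                    .sum::<f64>()
            })
        })
        .collect()
}

fn main() {
    // a 2x2 identity times a 2x1 vector returns the vector unchanged
    let result = matmul(&[1.0, 0.0, 0.0, 1.0], (2, 2), &[3.0, 4.0], (2, 1));
    assert_eq!(result, vec![3.0, 4.0]);
}
```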

Parallelizing

I will now use rayon to parallelize this operation. It's way simpler than you may think!

let result: Vec<f64> = (0..r_shape.1).into_par_iter().flat_map(move |rj| {
    (0..l_shape.0).into_par_iter().map(move |li| {
        (0..r_shape.0)
            .zip(0..l_shape.1)
            .map(move |(ri, lj)| {
                lhs.index((li, lj)) * rhs.index((ri, rj))
            })
            .sum::<f64>()
    })
})
.collect();
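rayon handles the work splitting for you. For comparison (or if you can't take the dependency), here is a rough std-only sketch of the same idea using std::thread::scope, spawning one thread per output column. This is my illustration, not the post's benchmarked code, and it is only sensible when there are few columns; rayon's work stealing is far better for wide matrices:

```rust
fn par_matmul(lhs: &[f64], l_shape: (usize, usize), rhs: &[f64], r_shape: (usize, usize)) -> Vec<f64> {
    assert_eq!(l_shape.1, r_shape.0, "incompatible shapes");

    // column-major indexing helpers (illustrative stand-ins for nalgebra's `index`)
    let lhs_at = move |i: usize, j: usize| lhs[j * l_shape.0 + i];
    let rhs_at = move |i: usize, j: usize| rhs[j * r_shape.0 + i];

    let mut result = vec![0.0; l_shape.0 * r_shape.1];
    std::thread::scope(|s| {
        // each chunk is one output column: disjoint &mut slices, so no locking needed
        for (rj, col) in result.chunks_mut(l_shape.0).enumerate() {
            s.spawn(move || {
                for li in 0..l_shape.0 {
                    col[li] = (0..r_shape.0)
                        .zip(0..l_shape.1)
                        .map(|(ri, lj)| lhs_at(li, lj) * rhs_at(ri, rj))
                        .sum();
                }
            });
        }
    });
    result
}

fn main() {
    let result = par_matmul(&[1.0, 0.0, 0.0, 1.0], (2, 2), &[3.0, 4.0], (2, 1));
    assert_eq!(result, vec![3.0, 4.0]);
}
```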


That's it! Adding into_par_iter after the two outer ranges is all that is needed. I benchmarked this on an Intel i5-10600K (12 threads) @ 4.8 GHz, multiplying a 1000x1000 matrix by a 1000x1 vector, and the average execution time went from 8 ms down to 2 ms. That is a 4x speedup, or a 75% reduction in execution time.
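If you want to reproduce this kind of measurement, a minimal timing harness with std::time::Instant looks like the sketch below (a single sample with a stand-in workload; a real benchmark would run many iterations, e.g. with the criterion crate):

```rust
use std::time::Instant;

fn main() {
    // hypothetical workload standing in for the matrix multiplication
    let start = Instant::now();
    let sum: f64 = (0..1_000_000).map(|i| i as f64).sum();
    let elapsed = start.elapsed();
    println!("summed to {sum} in {elapsed:?}");
}
```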
