DEV Community

Caleb Winston

Posted on Jun 1, 2019

A language for programming GPUs

#showdev #discuss #rust #machinelearning

Most languages are for programming CPUs (CPU stands for Central Processing Unit). If you write a program in Python or Rust, for example, and run it, it runs on your computer's CPU(s). But the CPU isn't the only part of your computer that can do computing. Your computer might also come with a GPU. GPU stands for Graphics Processing Unit, and while GPUs are often used for processing graphics, they are in general best at processing lots of numbers. If you have a list of a million numbers and want to scale them by a factor of 2, your program will likely run faster on a GPU than on a CPU.

But if languages like Python or Rust don't usually run on GPUs, what languages do? There are several. OpenCL, CUDA, and Halide are all languages for writing programs that can be run on GPUs. But all of these are designed first and foremost for use from C or C++. Also, their syntax can be a bit confusing...

static const char *kernelSource =
"#pragma OPENCL EXTENSION cl_khr_fp64 : enable  \n" \
"__kernel void mult(__global double *v) {       \n" \
"    int id, v_re, v_im;       \n" \
"    id   = get_global_id(0);  \n" \
"    v_re = 2*id;              \n" \
"    v_im = v_re + 1;          \n" \
"                              \n" \
"    v[v_re] = 2*v[v_re];      \n" \
"    v[v_im] = 4*v[v_im];      \n" \
"}                             \n" \
"\n" ;

There has to be a better way of programming GPUs. Well, there is a little something I've been excitedly working on for the past few months. It's a tiny programming language called Emu...

emu! {
    // multiplies 2 matrices
    // n is the dimension of the matrices
    // a and b are the matrices to be multiplied, c is the result
    multiply_matrices(n i32, global_a [f32], global_b [f32], global_c [f32]) {
        // indices of cells to multiply
        let i: i32 = get_global_id(0);
        let j: i32 = get_global_id(1);

        // execute step of multiplication
        for k in 0..n {
            global_c[i * n + j] += global_a[i * n + k] * global_b[k * n + j];
        }
    }
}
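
To make it clearer what each GPU thread is doing, here is a rough CPU-only sketch of the same computation in plain Rust. It is not Emu code and not how Emu runs anything; the function name `multiply_matrices_cpu` is made up for illustration, and it assumes square matrices stored in row-major order, which is what the `i * n + j` indexing in the kernel suggests.

```rust
/// CPU reference for the kernel above: `a`, `b`, and `c` are n x n matrices
/// stored in row-major order, so cell (i, j) lives at index i * n + j.
/// (Hypothetical helper for illustration; not part of Emu.)
fn multiply_matrices_cpu(n: usize, a: &[f32], b: &[f32], c: &mut [f32]) {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    assert_eq!(c.len(), n * n);

    // On the GPU, each (i, j) pair would be one work-item, with i and j
    // coming from get_global_id(0) and get_global_id(1).
    for i in 0..n {
        for j in 0..n {
            // Same inner loop as the kernel: accumulate into cell (i, j).
            for k in 0..n {
                c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
        }
    }
}
```

Because the kernel accumulates with `+=`, `c` should start out filled with zeros. On a GPU the two outer loops disappear: each (i, j) pair is handled by its own work-item.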

There are a few things you might notice right off the bat - (1) it's designed for use in Rust through a procedural macro instead of in C or C++, (2) its syntax is Rust-like with Go-like parameter declarations. But wait - there's more!

You can annotate numbers with units. The prefix of the unit determines how each number is scaled, and everything gets automatically normalized to be in terms of base SI units. Here's an example...

emu! {
    move_particles(global_particles_x [f32], global_particles_y [f32], global_particles_z [f32]) {
        global_particles_x[get_global_id(0)] += 1 as cm;
        global_particles_y[get_global_id(0)] += 1 as cm;
        global_particles_z[get_global_id(0)] += 8.92 as nm;
    }
}
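
As a sketch of the arithmetic this normalization implies, the plain Rust below converts a value with a prefixed length unit to metres. It is only an illustration, not Emu's implementation: `metres_per_unit` and `to_metres` are hypothetical names, and the table covers just a handful of prefixes.

```rust
/// Scale factor of a length unit relative to the metre.
/// Hypothetical helper; only a few SI prefixes are listed.
fn metres_per_unit(unit: &str) -> Option<f64> {
    match unit {
        "km" => Some(1e3),
        "m" => Some(1.0),
        "cm" => Some(1e-2),
        "mm" => Some(1e-3),
        "nm" => Some(1e-9),
        _ => None, // unknown unit: let the caller decide what to do
    }
}

/// Normalize `value`, given in `unit`, to metres.
fn to_metres(value: f64, unit: &str) -> Option<f64> {
    metres_per_unit(unit).map(|scale| value * scale)
}
```

With this, `1 as cm` corresponds to 0.01 and `8.92 as nm` to roughly 8.92e-9, both in metres.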

You can also use certain special numbers that are built in. These special numbers include a number of my favorite mathematical constants as well as physical constants. You can even use PAU!

emu! {
    add_particles(num_particles u32, num_moles u32) u32 {
        return num_particles + num_moles * L;
    }
}
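
Here is a rough plain-Rust equivalent of that function, assuming `L` stands for the Avogadro constant (which the moles-to-particles arithmetic suggests). The constant name `AVOGADRO` is made up for this sketch, and it uses `f64` rather than `u32` because 6.02214076e23 is far larger than `u32::MAX`.

```rust
/// Avogadro constant in particles per mole (exact value in the 2019 SI).
/// Assumption: this is what Emu's built-in `L` stands for.
const AVOGADRO: f64 = 6.02214076e23;

/// Total particle count: loose particles plus those contained in
/// `num_moles` moles. Uses f64 because the result does not fit in a u32.
fn add_particles(num_particles: f64, num_moles: f64) -> f64 {
    num_particles + num_moles * AVOGADRO
}
```

For example, `add_particles(0.0, 1.0)` is one mole's worth of particles, i.e. the constant itself.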

I really hope this small programming language will help you program GPUs more easily from Rust. Please check out the repository - it's at https://github.com/calebwin/emu (there's even a little book in there!) and let me know if you have any comments, ideas, or constructive criticism. 🙌

calebwin / emu

The write-once-run-anywhere GPGPU library for Rust

The old version of Emu (which used macros) is here.

Overview

Emu is a GPGPU library for Rust with a focus on portability, modularity, and performance.

It's a CUDA-esque compute-specific abstraction over WebGPU providing specific functionality to make WebGPU feel more like CUDA. Here's a quick run-down of highlight features...

Emu can run anywhere - Emu uses WebGPU to support DirectX, Metal, Vulkan (and also OpenGL and browser eventually) as compile targets. This allows Emu to run on pretty much any user interface including desktop, mobile, and browser. By moving heavy computations to the user's device, you can reduce system latency and improve privacy.
Emu makes compute easier - Emu makes WebGPU feel like CUDA. It does this by providing...
- DeviceBox<T> as a wrapper for data that lives on the GPU (thereby ensuring type-safe data movement)
- DevicePool as a no-config auto-managed pool of devices (similar to CUDA)
- trait Cache…


You can try it out right now by adding em = "0.1.2" under [dependencies] in your Cargo.toml, or you can star the repository for later.

Top comments (4)

Johannes Vollmer • Jun 2 '19

Great project! Will calling this emu function feel like calling a Rust function? That would be really cool, maybe you can add that to your article.

Just one question: Why not use the Rust parameter syntax, which would also fit the let: X syntax?

Caleb Winston • Jun 4 '19

Yes! Calling an Emu function can feel like calling a Rust function. I explain how this is possible in the "Execution" chapter of the "book". I just implemented it yesterday night.

Well, I do feel like it's a bit neater this way. Also, I'm not particularly keen on conforming to Rust-like syntax. The language should be treated as a unique language because it does do things differently from Rust for certain syntax.

Mikael Klages • Jun 2 '19

Seems interesting, might try to get into Rust again to check it out. Any plans on adding support for it in other languages?

Caleb Winston • Jun 4 '19

There aren't any plans at the moment for adding support for it in other languages.

I want this to be more of something you embed in a systems language to talk to GPUs easily. In the past you would have used OpenCL or Halide in C or C++. But now you can use Emu in Rust.
