Tim McNamara

Posted on Jul 6, 2021

Things I learned about creating a C API for my Rust crate by making every mistake about creating a C API

#rust #c #ffi

tl;dr

Here are my tips:

C programmers want to allocate and free their own memory. They don't want a string provided for them. They want you to write to an array that they control.
You need to "box" local variables if you want them to outlive the function.
Don't model your API after libc. In particular, you should avoid global state.
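The second tip refers to Rust's Box. A value created inside a function is dropped when that function returns, so a pointer to it would dangle on the C side. The sketch below is hypothetical: `IdCounter`, `id_counter_new`, `id_counter_issue` and `id_counter_free` are illustrative names, not part of libulid. It moves the value to the heap with `Box::new`, hands ownership to C with `Box::into_raw`, and takes it back with `Box::from_raw` so that Rust frees it. It assumes a toolchain of Rust 1.82 or newer for the `#[unsafe(no_mangle)]` spelling; older toolchains write `#[no_mangle]`.

```rust
/// Hypothetical state that must outlive the function that creates it.
#[repr(C)]
pub struct IdCounter {
    issued: u64,
}

#[unsafe(no_mangle)]
pub extern "C" fn id_counter_new() -> *mut IdCounter {
    // `IdCounter { issued: 0 }` on its own is a local value that would be
    // dropped when this function returns. Boxing moves it to the heap, and
    // Box::into_raw gives ownership to the caller without freeing it.
    Box::into_raw(Box::new(IdCounter { issued: 0 }))
}

#[unsafe(no_mangle)]
pub extern "C" fn id_counter_issue(counter: *mut IdCounter) -> u64 {
    // Safety: `counter` must come from id_counter_new and not be freed yet.
    let counter = unsafe { &mut *counter };
    counter.issued += 1;
    counter.issued
}

#[unsafe(no_mangle)]
pub extern "C" fn id_counter_free(counter: *mut IdCounter) {
    if !counter.is_null() {
        // Rebuild the Box so Rust runs the destructor and releases the memory.
        unsafe { drop(Box::from_raw(counter)) };
    }
}
```

Every `_new` needs a matching `_free` exported by the same library: C's free() must not be called on memory that Rust's allocator handed out.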

Background

A few weeks ago, I stumbled upon a way of creating unique identifiers that are still sortable: ULIDs. ULIDs are very similar to UUIDs as they are typically used: as random identifiers [footnote: you should ensure that you're using UUIDv4 if you're using them this way, otherwise your IDs are not random].

ULIDs have a neat trick, though: newer IDs are higher than older IDs. The order of IDs generated within the same millisecond is random, but if they are generated more than a millisecond apart, you'll be able to compare them.
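The ordering comes from the ULID layout. Per the ULID spec, the first 48 bits are a Unix timestamp in milliseconds and the remaining 80 bits are random, and the 128-bit value is written as 26 characters of Crockford's Base32. The following is a minimal sketch of that layout, not libulid's code; `make_ulid` and `encode` are hypothetical names. Because the timestamp occupies the high bits and the Base32 alphabet is in ASCII order, comparing the numbers and comparing the strings agree.

```rust
// Crockford's Base32 alphabet (no I, L, O or U), in ascending ASCII order.
const CROCKFORD: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Pack a 48-bit millisecond timestamp above 80 random bits.
fn make_ulid(timestamp_ms: u64, random: u128) -> u128 {
    let ts = u128::from(timestamp_ms) & ((1u128 << 48) - 1); // keep 48 bits
    let rand = random & ((1u128 << 80) - 1); // keep 80 bits
    (ts << 80) | rand
}

/// Encode as 26 Base32 characters, most significant first.
/// 26 * 5 = 130 bits, so the first character only carries 3 bits (0..=7).
fn encode(ulid: u128) -> String {
    (0..26u32)
        .map(|i| {
            let shift = 5 * (25 - i);
            CROCKFORD[((ulid >> shift) & 0x1F) as usize] as char
        })
        .collect()
}
```

An ID made one millisecond later compares higher even when the earlier ID drew the largest possible random part, since the timestamp bits dominate.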

Naturally, I thought that I should try and implement them. And I did. In fact, some benchmarking indicates that I may have created the fastest Rust implementation: it takes my computer roughly 28 nanoseconds to generate an identifier. That's about 35,000 ULIDs per millisecond.

Then I thought about exposing my Rust code as a C library, libulid. And then I discovered that I knew almost nothing about writing a C library.

Lessons

Here are a few things that I needed to fix before getting libulid ready.

Global state

One of the first things that I did was to take inspiration from the rand() function from libc. rand() relies on the srand() function being called first to initialize a global random seed.

My initial API involved calling ulid_seed() and then ulid_new() later on. This created two issues. First, it was horrible to work with. Second, it wasn't thread safe.

It turns out that the thread safety problem is well documented. From the rand(3) man page:

The function rand() is not reentrant, since it uses hidden state that is modified on each call. This might just be the seed to be used by the next call, or it might be something more elaborate. In order to get reproducible behavior in a threaded application, this state must be made explicit; this can be done using the reentrant function rand_r().

Why did I use functionality that is not thread safe? Well, I was working with the Rust libc crate and didn't check the upstream libc documentation.

I was somewhat embarrassed, although very relieved, when a contributor to my code raised the thread safety issue. Thanks, Jonas!

To avoid the issue, I ask users of the library to initialize a ulid_ctx object that stores the random seed and pass a pointer to it when creating a new ULID.
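Making the state explicit is the same fix that rand_r() applies to rand(). Below is a hypothetical sketch of the idea, not libulid's actual API: the `ulid_ctx` struct is `#[repr(C)]` so that a C caller can allocate it (on the stack, for example) and pass a pointer in. `ulid_ctx_init` and `ulid_ctx_next` are illustrative names, and xorshift64 is a toy generator standing in for whatever randomness source a real implementation uses. It assumes Rust 1.82 or newer for `#[unsafe(no_mangle)]`.

```rust
/// PRNG state owned by the caller instead of a hidden global.
#[repr(C)]
#[allow(non_camel_case_types)]
pub struct ulid_ctx {
    state: u64,
}

/// Initialise a caller-allocated context. Returns 0 on success, -1 on NULL.
#[unsafe(no_mangle)]
pub extern "C" fn ulid_ctx_init(ctx: *mut ulid_ctx, seed: u64) -> i32 {
    if ctx.is_null() {
        return -1;
    }
    // xorshift64 gets stuck at zero, so substitute a fixed nonzero state.
    let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
    unsafe { (*ctx).state = state };
    0
}

/// Return the next 64 random bits, touching only this context's state.
#[unsafe(no_mangle)]
pub extern "C" fn ulid_ctx_next(ctx: *mut ulid_ctx) -> u64 {
    // Safety: `ctx` must point to a context set up by ulid_ctx_init.
    let ctx = unsafe { &mut *ctx };
    // xorshift64: illustrative only, not suitable for real identifiers.
    let mut x = ctx.state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    ctx.state = x;
    x
}
```

Since nothing is shared, each thread can own its own context and no locking is needed. Two contexts seeded identically also produce identical sequences, which is the reproducibility the rand(3) man page asks for.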

_t suffix is reserved

At one stage, I added a _t suffix to all of the types that I created. It turns out that POSIX reserves names ending in _t for its own types.

Let callers manage their own memory

One of the things that caused quite a few problems was attempting to create raw pointers in Rust, then hand them across the FFI boundary.

I did this to match the ULID spec, which specifies that the API to generate a ULID should look like this:

ulid()

In Rust, this is a fine thing to do. But in C, you have a decision to make: where will the memory be allocated and freed? Will it be the responsibility of the caller or the library?

I spent several days trying to figure out how to do something very simple: allocate an array in Rust on the heap, pass a pointer to that array to C, then free the memory later in Rust. I eventually gave up on getting all of the pieces to work together, because it seemed like C programmers wanted something else entirely. They want to manage their own memory.
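With caller-managed memory, the library never allocates. The caller supplies a buffer and its length, and the library writes into it. Here is a hypothetical sketch of that shape. `ulid_encode` and `ULID_BUF_LEN` are illustrative names rather than libulid's API, and the 128-bit ULID is passed as two u64 halves so the signature only uses plain C integer types. A C caller would declare `char buf[27];`, pass `buf` and `sizeof buf`, and check the return value. It assumes Rust 1.82 or newer for `#[unsafe(no_mangle)]`.

```rust
use std::os::raw::c_char;

// Crockford's Base32 alphabet used by the ULID spec.
const CROCKFORD: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Bytes the caller must provide: 26 characters plus a trailing NUL.
pub const ULID_BUF_LEN: usize = 27;

/// Write the ULID whose high and low 64 bits are `hi` and `lo` into a
/// caller-owned buffer as a NUL-terminated string.
/// Returns 0 on success, -1 if `buf` is NULL or too small.
#[unsafe(no_mangle)]
pub extern "C" fn ulid_encode(hi: u64, lo: u64, buf: *mut c_char, buf_len: usize) -> i32 {
    if buf.is_null() || buf_len < ULID_BUF_LEN {
        return -1;
    }
    // Safety: the caller promises `buf` points to at least `buf_len` bytes,
    // and we only touch the first ULID_BUF_LEN of them.
    let out = unsafe { std::slice::from_raw_parts_mut(buf, ULID_BUF_LEN) };
    let value = (u128::from(hi) << 64) | u128::from(lo);
    for (i, slot) in out[..26].iter_mut().enumerate() {
        let shift = 5 * (25 - i as u32);
        *slot = CROCKFORD[((value >> shift) & 0x1F) as usize] as c_char;
    }
    out[26] = 0; // NUL terminator; nothing is allocated on the Rust side
    0
}
```

Because the caller owns the buffer, there is no matching free function and no question of which allocator released the memory.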

Acknowledgements

Many thanks to the #include Discord server for offering lots of advice and encouragement during this process.

Thanks also to Jonas for providing lots of patches!
