Today, I learned how to correctly use Cargo build scripts. Or, more precisely, I learned how to do one particular thing correctly, but it was significant enough for me that I decided to write it down. Of course, had I read the Cargo Book more carefully before, I would have saved myself some time, and there would be no dramatic revelation, and no reason to write this post either. I guess what I am trying to say is: thank goodness my reading sucks.
Keywords: cargo, build scripts, code generation
Context
My problem arose while implementing a little library that reads from and writes to a Protocol Buffer stream. As described by the authors themselves:
Protocol buffers are a language-neutral, platform-neutral extensible mechanism for serializing structured data.
The very first step is to define a message format in Protobuf's dedicated grammar. The message definition is parsed by the Protobuf library (available for various programming languages), and the corresponding code is generated for the target language (in our case, Rust). Each message is an object with a bunch of getters and setters (the details differ depending on the language). These messages can then be read from or written to streams available in the Protobuf library. It is all quite straightforward. Take this simple message from the Protobuf web page:
message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
}
In Rust, we would use the following chunk of code to construct a Person:

let mut person = Person::new();
person.set_name("Butler".to_string());
The details are not very important here. What is important is that this code needs to be generated before the project is compiled, or else there would be no Person to speak of. And, if possible, we would prefer that it be neatly integrated into our build system.
Cargo build scripts
Our situation is by no means unique. Probably the most canonical example is a library that provides Rust API bindings to some C library, such as libc, git2, and many others. Before compiling our Rust crate, we first need to compile the C code, and maybe even generate Rust FFI bindings from C headers (see the bindgen crate). Because this is a common pattern and the Rust ecosystem is fantastic, there is a standard solution for this: build scripts.
Simply put, Cargo allows us to define a build script, by default named build.rs and located at the root of the project. This is essentially a regular executable (with a main function and all), except Cargo provides a bunch of environment variables with useful build information. The build script in turn communicates back to Cargo by writing instructions to its standard output. For example, printing cargo:warning=MESSAGE will instruct Cargo to print a warning to the terminal. More on that a little later, but for now here is a simple example that compiles a C source file using the cc crate:
fn main() {
    cc::Build::new()
        .file("src/example.c")
        .compile("example");
}
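Those cargo: instructions are nothing more than specially formatted lines on stdout, which makes them easy to construct. A minimal sketch (cargo_instruction is a hypothetical helper of my own, not a Cargo API):

```rust
// Build scripts talk to Cargo by printing lines to stdout.
// Cargo interprets any line starting with `cargo:` as an instruction.
fn cargo_instruction(key: &str, value: &str) -> String {
    format!("cargo:{}={}", key, value)
}

fn main() {
    // Ask Cargo to re-run this build script only when the C source changes.
    println!("{}", cargo_instruction("rerun-if-changed", "src/example.c"));
    // Surface a warning in the terminal during `cargo build`.
    println!("{}", cargo_instruction("warning", "compiled src/example.c"));
}
```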
Note that there is a special section in Cargo.toml where you can define dependencies that are only used for building: [build-dependencies].
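For the cc example above, the manifest fragment might look like this (the version number is illustrative):

```toml
# Only build.rs needs cc, so it goes under [build-dependencies],
# not [dependencies].
[build-dependencies]
cc = "1.0"
```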
Generating Protobuf code
Now that we have basic information about build scripts, we can take a shot at generating the Rust code for Protobuf messages. Luckily, there is already a crate that will make it very simple:
[build-dependencies]
protobuf-codegen-pure = "2.14" # Might be different by the time you read this

[dependencies]
protobuf = "2.14" # This will be needed to use the generated code as protobuf messages
A quick look at the documentation explains it all:
fn main() {
    protobuf_codegen_pure::Codegen::new()
        .out_dir("src/protos")
        .inputs(&["protos/person.proto"])
        .include("protos")
        .run()
        .expect("Codegen failed.");
}
Let's break it down. We first create a Codegen object, which implements a builder pattern. Then, we define where the generated files should be created. Finally, we point to the input files that contain the message definitions, and to the directory containing these files. That's it. Simple, right?
Well, not so fast. There is a catch. See, Cargo doesn't want us to write to the src directory:
Build scripts may save any output files in the directory specified in the OUT_DIR environment variable. Scripts should not modify any files outside of that directory.
But why would they care? Well, it is a security concern. If a crate is built remotely, we don't want to allow what is effectively a user-defined program to write wherever it wants. A good example is Docs.rs, which hosts API documentation for all crates available on crates.io. It limits the program's write permissions to a single directory and passes that directory via the environment variable OUT_DIR. In fact, if you follow the instructions from the protobuf-codegen-pure crate, your documentation build on Docs.rs will fail (this is precisely how I found out about all of this!).
Correcting the build script
So how do we fix our build script? Let's try this:
use std::env;
use std::path::Path;

fn main() {
    let out_dir_env = env::var_os("OUT_DIR").unwrap();
    let out_dir = Path::new(&out_dir_env);
    protobuf_codegen_pure::Codegen::new()
        .out_dir(out_dir)
        .inputs(&["protos/person.proto"])
        .include("protos")
        .run()
        .expect("Codegen failed.");
}
But where is our file now, and how do we use it? Rust provides the include! macro, which copies the content of a file into the file it is invoked from. For example, here is a little snippet from lib.rs showcasing this:
include!(concat!(env!("OUT_DIR"), "/person.rs"));
fn new_person(name: &str) -> Person {
    let mut person = Person::new();
    person.set_name(name.to_string());
    person
}
Does it work now? Unfortunately, not quite. protobuf-codegen-pure takes the liberty of adding some module-level comments and attributes that suppress certain warnings, and these now fail to compile:
error: an inner attribute is not permitted in this context
--> /home/elshize/dev/ciff/target/debug/build/ciff-e8fd3067377fd4eb/out/common_index_format_v1.rs:5:1
|
5 | #![allow(unknown_lints)]
| ^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: inner attributes, like `#![no_std]`, annotate the item enclosing them, and are usually found at the beginning of source files. Outer attributes, like `#[test]`, annotate the item following them.
This is caused by the indirection of include!. The good news is that it is a known problem, and chances are it has already been resolved by the time you read this. But since I don't have the luxury of travelling through time, I would like to find a workaround. Besides, it is a good opportunity to show how powerful build scripts really are. We are by no means limited to what the codegen library generates for us.
The objective is to get rid of those failing comments and attributes. On the other hand, I would still like to be able to suppress warnings from the generated code. To do that, we can create a person.rs module, which will simply define the attributes and include the generated code, which can later be re-exported by lib.rs. For example:
#![allow(unknown_lints)]
#![allow(clippy::all)]
#![allow(clippy::pedantic)]
#![allow(box_pointers)]
#![allow(dead_code)]
#![allow(missing_docs)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]
#![allow(non_upper_case_globals)]
#![allow(trivial_casts)]
#![allow(unsafe_code)]
#![allow(unused_imports)]
#![allow(unused_results)]
include!(concat!(env!("OUT_DIR"), "/person.rs"));
Great, now the only thing left is to remove these from the generated file. This can easily be done in build.rs. Once the file is successfully generated, we can read it line by line and filter out any line that starts with #! or //!.
use std::env;
use std::fs::{read_to_string, File};
use std::io::{BufWriter, Write};
use std::path::Path;

fn main() {
    let out_dir_env = env::var_os("OUT_DIR").unwrap();
    let out_dir = Path::new(&out_dir_env);
    protobuf_codegen_pure::Codegen::new()
        .out_dir(out_dir)
        .inputs(&["protos/person.proto"])
        .include("protos")
        .run()
        .expect("Codegen failed.");
    // Resolve the path to the generated file.
    let path = out_dir.join("person.rs");
    // Read the generated code into a string.
    let code = read_to_string(&path).expect("Failed to read generated file");
    // Write filtered lines back to the same file.
    let mut writer = BufWriter::new(File::create(&path).unwrap());
    for line in code.lines() {
        if !line.starts_with("//!") && !line.starts_with("#!") {
            writer.write_all(line.as_bytes()).unwrap();
            writer.write_all(b"\n").unwrap();
        }
    }
}
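If you want to sanity-check the filtering step on its own, it can be pulled out into a small helper. A sketch (strip_inner_lines is a name I made up, not part of any crate):

```rust
// Standalone version of the filtering step: drop module-level doc
// comments (`//!`) and inner attributes (`#![...]`) that `include!`
// cannot handle, keeping everything else intact.
fn strip_inner_lines(code: &str) -> String {
    code.lines()
        .filter(|line| !line.starts_with("//!") && !line.starts_with("#!"))
        .map(|line| format!("{}\n", line))
        .collect()
}

fn main() {
    let generated = "//! Generated file.\n#![allow(dead_code)]\npub struct Person;\n";
    // Only the actual item survives the filter.
    print!("{}", strip_inner_lines(generated));
}
```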
Result? See for yourself.
Conclusions
I have a few takeaways from my little experiment. First, the Rust ecosystem, although rich and powerful, is not yet fully mature. Certain details are still being ironed out, such as those in protobuf-codegen-pure. But this is to be expected. I think what is more important is that these libraries are out there, and that many people are actively working on making them better each day. But most of all, I am often blown away by how well thought out many features of Rust and Cargo are, especially compared to those available, say, in C++. Build scripts are one of those gems that elevate Rust to the great piece of technology it is.
Questions? Comments? I am @elshize on Twitter and @siedlaczek at Mastodon.Social. Feel free to say hi.
Top comments (6)

What's the advantage of protobuf-codegen-pure over prost-build?

An alternative to this would be to copy the generated Rust module from $OUT_DIR to $CARGO_MANIFEST_DIR/src. In the past, I've used github.com/danburkert/prost for protobuf things, and used something along these lines. My experience has generally been that the actual protobuf definitions don't change very often, and the generated code isn't build-dependent, so committing the generated file isn't a huge deal (or you could set it to be ignored). You can then shave a little bit of time off the compilation by only regenerating the code if the definitions are newer than the generated module.

Thanks for sharing. I didn't know about prost and would love to know how it compares with protobuf-codegen-pure, if you know. For one, it seems to write to OUT_DIR by default. Do I understand correctly that src_dir will point to your project/src directory? If so, we will experience the same problem of not being able to write to it on Docs.rs, right? I guess your suggestion would be to not run the generation as a build script but rather do it manually and commit the generated files, which certainly seems like a reasonable option for protobuf, which shouldn't change between builds.

As I recall, I used prost because at the time it offered me a way to customize the derived traits for the generated types (I needed the serde traits), while the other protobuf crate I looked at did not. That might not be the case now. CARGO_MANIFEST_DIR is the directory where Cargo.toml lives, so $CARGO_MANIFEST_DIR/src is the project's src directory. In my case, the definitions were in another repository, and might not always be available if that repo had not been checked out in the right place on the machine doing the build, but I still wanted to be able to build in that case. Thus, I do run the generator as a build script, but I also commit the output to source control, so rustdoc ought to see it. There's some logic in the build script (that I didn't post) that only runs the protobuf generator if the module is older than the definitions. If I need to force it to regenerate, I can always locally delete the generated module and build, and then commit any differences.

Ok, that explains a lot, thanks for clarifying. Sounds like a sensible approach.
Very nice article, thanks!