
Ian Muchina

Originally published at ianmuchina.com

Reproducible Builds

A reproducible build is one that produces byte-for-byte identical output when given the same input. These builds aren’t common, mostly because of compiler defaults.

The things that can make a build non-reproducible are:

  • Timestamps
  • Unique IDs
  • Build paths
  • etc
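
To make the timestamp problem concrete, here is a minimal Python sketch. It is an illustration and not how any particular compiler works: `build_artifact` and `resolve_build_time` are hypothetical helpers, and gzip stands in for a build tool that stamps its output with the build time. The `SOURCE_DATE_EPOCH` variable is the convention documented on reproducible-builds.org for pinning that time.

```python
import gzip
import time


def resolve_build_time(env: dict) -> int:
    """Return SOURCE_DATE_EPOCH from env if set, else the current time."""
    value = env.get("SOURCE_DATE_EPOCH")
    return int(value) if value is not None else int(time.time())


def build_artifact(source: bytes, build_time: int) -> bytes:
    """Hypothetical build step: gzip stores build_time in its header."""
    return gzip.compress(source, mtime=build_time)


source = b"print('hello')\n"

# Same input, built one minute apart: the outputs differ only because
# of the embedded timestamp.
first = build_artifact(source, build_time=1_700_000_000)
second = build_artifact(source, build_time=1_700_000_060)
timestamps_break_build = first != second

# Pinning the timestamp makes the two builds byte-for-byte identical.
pinned = resolve_build_time({"SOURCE_DATE_EPOCH": "1700000000"})
repro_a = build_artifact(source, build_time=pinned)
repro_b = build_artifact(source, build_time=pinned)
pinned_builds_match = repro_a == repro_b
```

Both unpinned archives decompress to the same source, yet their checksums would differ. Build paths and unique IDs cause the same kind of drift and need the same kind of pinning.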

Why they matter

Reproducible builds have inherent security. They allow us to verify which source code a binary was built from. This makes detecting changes or tampering straightforward:

  1. Compile the program on at least two different systems.
  2. Compare the checksums.
  3. If they match, that’s good. If they don’t match, something is wrong.
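
The steps above can be sketched in a few lines of Python. This is only an outline under some assumptions: the file names are made up, the temporary files stand in for binaries built on separate machines, and SHA-256 is one reasonable choice of checksum among several.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_match(artifacts: list[Path]) -> bool:
    """True only if every artifact has the same checksum (False if none given)."""
    checksums = {sha256_of(path) for path in artifacts}
    return len(checksums) == 1


# Demo: the files below stand in for the output of independent builders.
with tempfile.TemporaryDirectory() as tmp:
    build_a = Path(tmp) / "build-a.bin"
    build_b = Path(tmp) / "build-b.bin"
    tampered = Path(tmp) / "build-c.bin"
    build_a.write_bytes(b"\x7fELF example payload")
    build_b.write_bytes(b"\x7fELF example payload")
    tampered.write_bytes(b"\x7fELF example payload plus backdoor")

    honest_builds_agree = builds_match([build_a, build_b])
    tampering_detected = not builds_match([build_a, build_b, tampered])
```

A mismatch doesn't say which builder is wrong, only that at least one of them produced something different, which is the signal to investigate.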

The builds have more integrity, which benefits everyone. For security-minded people, it means a straightforward way of detecting backdoors in the build process. For open source enthusiasts, it means a clear way of detecting GPL violations. For everyone else, we get safer software.

You can find out how to achieve reproducible builds at reproducible-builds.org/docs.

Attacks on build systems

Backdoors introduced in the build process are not easy to detect. Most detection happens too late, when the damage is already done. These attacks can have a high impact in a short period, so early detection is important.

There have been many attacks like this in the past. Some of them are:

Defending against them

Reproducible builds are the best way to defend against these kinds of attacks. Attackers lose their incentive because they are detected quickly and would need to compromise more systems to stay hidden.

The disadvantages

Like everything, there are downsides. Some of them are:

What they don’t protect us from

Reproducible builds don’t protect us from malicious developers. A developer could knowingly write vulnerable code that looks like a mistake when discovered. This is called underhanded code.

In the paper Reflections on Trusting Trust, Ken Thompson asks us:

To what extent should one trust a statement that a program is free of Trojan horses? Perhaps it is more important to trust the people who wrote the software.

Who does reproducible builds?

Many open-source projects have reproducible builds to assure users of their integrity. Some of them include:

  1. Bitcoin
  2. Tor Browser
  3. F-Droid
  4. Signal
  5. Telegram
  6. more

Linux distros

Around 80%-90% of the packages in Linux distributions such as Arch, Debian, openSUSE, NixOS, and Guix are already reproducible. You can find the exact numbers here.

Digital signatures

Digital signatures still have their place. They are useful for verifying who a document or message comes from. However, they aren’t useful for verifying which source code a binary was built from. Some forms of digital signatures can even get in the way, as explained by the Telegram developers in this article.

Conclusion

You can’t trust non-reproducible software. It’s a single point of failure. Mike Perry of the Tor Project described it in 2013 as follows:

I don’t believe that software development models based on trusting a single party can be secure against serious adversaries anymore, given the current trends in computer security.

This statement is true to this day.


Further reading

Many people have written about reproducible builds and have gone into more detail than I have in this post. Here are some of them.

reproducible-builds.org

A website with technical information on reproducible builds. It also has status updates on Linux distributions.

Reflections on Trusting Trust

A paper by Ken Thompson. He asks what would happen if compilers had backdoors: would it even be possible to detect and prevent such an attack?

Countering Compiler backdoors

David A. Wheeler answers the above question. He proposes a method called Diverse Double-Compiling.

The Octopus Scanner Malware

A writeup on the discovery of a supply chain attack that targeted developers’ machines.

Verifying the source code for binaries

An LWN article on reproducible builds.
