Recently, the term "software supply chain attack" has been all over the news headlines. One infamous example is the SolarWinds attack, also known as the 2020 United States federal government data breach. In fact, according to a 2021 report from Gartner:
By 2025, 45% of organizations worldwide will have experienced attacks on their software supply chains, a three-fold increase from 2021.
What is a software supply chain, then?
It's everything needed to create and deliver your software artifacts, at every step along the way.
It can be helpful to think about the supply chain of a physical good, like a car. A supply chain for a car might include:
- In-house produced components, like engines and gearboxes.
- Third-party components, like seat belts and headlights.
- The factory, including the facilities, machines, and tooling used to build the components and assemble the car.
- The workers working in the production facility.
- The processes, like how to access different tools and systems, how to do quality control, etc.
A software supply chain, by analogy, contains:
- All the code, including its dependencies, and the internal and external software you use to develop, build, package, install and run your software.
- Processes and policies that contribute to the development and delivery of your software, both inside and outside your organization, like processes and policies for accessing systems, software testing, review (both code and non-code), monitoring, communication, approval, etc.
Now that we've clarified the definition of the supply chain, let's look at some supply chain attacks: how and in what ways they happen.
Given the complexity of software products and systems and the broad reach of the software supply chain, there are numerous ways to introduce unauthorized modifications to software packages in different steps of the software development lifecycle (SDLC).
A typical software development workflow looks like this:
[Figure: supply chain threats at points A–H of a typical software development workflow]
- At point A of the development lifecycle, bad code could be submitted to the repository. One of the most famous examples would be the Linux hypocrite commits, where researchers attempted to introduce vulnerabilities intentionally into the Linux kernel. Due to the scale of large open-source projects, it can be hard to discriminate between community members with good or malicious intent.
- At point D, a compromised build platform can also cause issues. For example, in the SolarWinds attack mentioned at the beginning of this article, attackers compromised the build platform and installed an implant that injected malicious behavior during each build.
- At point E, a harmful dependency might be used. For example, in the event-stream vulnerability incident, attackers added a dependency and then updated the dependency to add malicious behavior. The harmful dependency itself can be attacked at points A-H, and the dependencies of the dependency can also be attacked at points A-H, recursively.
- At point F, an artifact not built by the CI/CD system could be uploaded. For example, in the CodeCov incident, the attacker used leaked credentials to upload a malicious artifact, and users downloaded it directly.
I could go on, but I've already made my point: software and software development have become increasingly complicated, the supply chain has many steps and moving parts, and every single one of them can go wrong.
In the past, legacy software supply chain attacks were exploits: attackers preyed on publicly disclosed but unpatched open-source vulnerabilities (CVEs, Common Vulnerabilities and Exposures). The infamous Struts incident at Equifax is a prime example.
However, the new generation of software supply chain attacks is far more dangerous. Attackers are no longer waiting for publicly disclosed CVEs. Instead, they are taking the initiative, injecting malicious code into open-source projects that feed the global supply chain.
We could say that the bad guys are also "shifting left," just as DevSecOps does!
By shifting their focus "upstream," attackers can infect a single component, which will then be distributed "downstream" using legitimate software workflows and update mechanisms.
According to a 2019 study of security threats in the npm ecosystem by Darmstadt University researchers:
"391 highly influential maintainers affect more than 10,000 packages, making them prime targets for attacks."
If an attacker successfully compromised projects supported by one of these 391 maintainers, they could dramatically amplify the impact of the attack. For example, the Darmstadt team found that 20 maintainers can reach more than half of the npm ecosystem. The risks increase even further because the Linux Foundation's Core Infrastructure Initiative found that seven of the top 10 most-used software packages were hosted under individual developer accounts. The researchers then asked, "what happens if one of these accounts is hacked? Would you, farther down the software supply chain, even know?"
The next-gen attacks are made possible for a few reasons:
- Open-source projects typically rely on hundreds — if not thousands — of dependencies from other open-source projects. They are also used a lot. That makes it difficult to evaluate the security of every new version of a dependency. Few read the source code of the applications they rely on; there are too many of them, their codebases are too large, and chances are that most people reading the source code couldn't do a proper security analysis anyway.
- Open-source relies on thousands of contributors, and discriminating between community members with good or malicious intent is difficult, if possible at all.
- Open-source is built on a "web of trust," which may be more secure than closed-source projects but still creates an environment in which attackers can prey.
The next-gen attacks are already bad enough, but the bad news is they are still on the rise. Attacks targeting open-source increased by 430% after the Darmstadt report was published (216 such attacks recorded from Feb 2015 to Jun 2019, compared to 929 recorded from Jul 2019 to May 2020).
How to improve supply chain security, then?
Although "point" solutions exist that can solve a specific threat or vulnerability in a particular step of the supply chain (or the SDLC), they aren't enough. We lack a comprehensive, end-to-end framework that governs the whole supply chain in every aspect, defines how to mitigate threats across the software supply chain, and provides appropriate security guarantees.
Enter the SLSA framework.
Supply chain Levels for Software Artifacts, or SLSA (read: salsa), is inspired by Google's internal "Binary Authorization for Borg," which has been in use for the past 8+ years and is mandatory for all of Google's production workloads.
The goal of SLSA is to improve the state of the industry, particularly open-source, to defend against the most pressing integrity threats.
Who's SLSA for, then? Whether you're a developer, a business, or an enterprise, SLSA provides an industry standard, a recognizable and agreed-upon level of protection and compliance.
In its current state, SLSA is a security framework, a set of incrementally adoptable security guidelines being established by industry consensus. Consider it a checklist of standards and controls to prevent tampering, improve the integrity, and secure packages and infrastructure in your projects, businesses, or enterprises. It's how you get from "safe enough" to "being as resilient as possible" at any link in the software supply chain.
The standards set by SLSA are guiding principles for software producers and consumers: producers can follow the guidelines to make their software more secure, and consumers can make decisions based on a software package's security posture.
SLSA defines four security levels, which are incremental and actionable, to protect against specific integrity attacks. SLSA levels are a common language for discussing how secure software, supply chains, and their parts are. SLSA 4 represents the ideal end state, and the lower levels represent milestones with corresponding integrity guarantees.
Here's a quick introduction:
- SLSA 1: documentation of the build process. The build process must be fully scripted/automated and generate provenance (metadata about how an artifact was built, including the build process, top-level source, and dependencies). Knowing the provenance allows software consumers to make risk-based security decisions. Provenance at SLSA 1 does not protect against tampering, but it offers a basic level of code source identification and can aid in vulnerability management. SLSA 1 is easy to adopt, giving you supply chain visibility and the ability to generate provenance.
- SLSA 2: tamper resistance of the build service. It requires version control and a hosted build service that generates authenticated provenance. These additional requirements give the software consumer greater confidence in the origin of the software. At this level, the provenance prevents tampering to the extent that the build service is trusted. SLSA 2 starts to protect against software tampering and adds minimal build integrity guarantees.
- SLSA 3: extra resistance to specific threats. The source and build platforms meet specific standards to guarantee the source's auditability and the provenance's integrity, respectively. We envision an accreditation process whereby auditors certify that platforms meet the requirements, which consumers can rely on. SLSA 3 provides much more robust protections against tampering than earlier levels by preventing specific classes of threats, such as cross-build contamination. SLSA 3 hardens the infrastructure against attacks, and more trust is integrated into complex systems.
- SLSA 4: the highest level of confidence and trust. It requires a two-person review of all changes and a hermetic, reproducible build process. Two-person review is an industry best practice for catching mistakes and deterring harmful behavior. Hermetic builds guarantee that the provenance's list of dependencies is complete. Though not strictly required, reproducible builds provide many auditability and reliability benefits. Overall, SLSA 4 gives the consumer confidence that the software has not been tampered with: it provides the highest assurances of build integrity and ensures measures for dependency management are in place.
Note: sometimes we also talk about SLSA 0, meaning there are no guarantees at all; it represents the lack of any SLSA level.
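To make the "provenance" at the heart of these levels more concrete, here's a minimal sketch of the kind of metadata document a build service could generate, loosely modeled on the in-toto attestation format that SLSA builds on (the builder ID, build type, and repository URI are hypothetical placeholders):

```python
import hashlib
import json

# The artifact the provenance describes (a stand-in for a real package).
artifact = b"example package contents"

# A minimal, illustrative SLSA-style provenance statement.
provenance = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "predicateType": "https://slsa.dev/provenance/v0.2",
    # "subject" identifies the artifact(s) this provenance is about,
    # pinned by cryptographic digest.
    "subject": [
        {
            "name": "example-package-1.0.0.tar.gz",
            "digest": {"sha256": hashlib.sha256(artifact).hexdigest()},
        }
    ],
    # "predicate" describes how the artifact was built.
    "predicate": {
        "builder": {"id": "https://ci.example.com/builders/v1"},  # hypothetical
        "buildType": "https://example.com/build-types/script/v1",  # hypothetical
        # Top-level source and dependencies that fed the build.
        "materials": [
            {
                "uri": "git+https://github.com/example/repo",  # hypothetical
                "digest": {"sha1": "0" * 40},
            }
        ],
    },
}

print(json.dumps(provenance, indent=2))
```

At SLSA 1 this document merely has to exist; at SLSA 2 and above it must be generated by a hosted build service and authenticated (e.g., cryptographically signed), so consumers can trust it wasn't forged.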
It's also worth noting that the SLSA level, by design, is not transitive. A level describes the integrity protections of an artifact's build process and top-level source but nothing about the artifact's dependencies. It means that each artifact's SLSA rating is independent of its dependencies (a level 4 artifact can be built from level 0 dependencies).
This is to make the problem tractable: if SLSA 4 required all dependencies to be SLSA 4 as well, we would have to spend equal effort on high-risk and low-risk dependencies. Keeping levels non-transitive allows parallel progress and prioritization based on risk.
Incremental improvements and the intermediate milestones recognized by lower SLSA levels are the most important part of the framework; they already go a long way toward improving the security of the ecosystem. Achieving SLSA's highest security level, on the other hand, is a multi-year objective.
SLSA helps to protect against common supply chain attacks. Let's review some of the examples mentioned earlier and see how SLSA can help:
In a Linux "hypocrite commits" scenario, researchers attempted to introduce vulnerabilities into the Linux kernel via patches on the mailing list. SLSA 4's two-person review requirement could have mitigated this. Two-person review can catch most (but not all) of the vulnerabilities.
The SolarWinds attack happened because of a build platform compromise. This could have been mitigated by SLSA 3's ephemeral environment rule, which ensures that each build environment is ephemeral, with no way to persist changes between subsequent builds. Also, SLSA 3 requires stronger security controls for the build platform, making it more difficult to compromise and gain persistence.
In the CodeCov incident, the attacker uploaded a malicious artifact. This could have been mitigated by both SLSA 1 and 2:
- SLSA 1 requires provenance showing that the package came from the expected CI/CD pipeline, with a `subject` matching the hash of the package;
- SLSA 2 accepts only provenance that was cryptographically signed by the public key corresponding to an acceptable builder. If SLSA were enforced in the CodeCov case, the artifact's provenance would have shown that the artifact was not built as expected from the expected source repo.
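The SLSA 1-style check above can be sketched as a consumer-side verification: hash the downloaded artifact and compare it against the `subject` digest in its provenance. The function and artifact names below are hypothetical; a real verifier would also check the provenance's signature against a trusted builder key, as SLSA 2 requires:

```python
import hashlib

def verify_subject(artifact_bytes: bytes, provenance: dict, name: str) -> bool:
    """Return True if the artifact matches a subject listed in its provenance.

    Sketch only: this checks the digest binding (SLSA 1-style), not the
    signature on the provenance itself (which SLSA 2 would also require).
    """
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    for subject in provenance.get("subject", []):
        if (
            subject.get("name") == name
            and subject.get("digest", {}).get("sha256") == digest
        ):
            return True
    return False

# Hypothetical example: the legitimately built artifact passes,
# while a tampered replacement (a la CodeCov) fails.
good = b"legitimate build output"
provenance = {
    "subject": [
        {"name": "uploader.sh", "digest": {"sha256": hashlib.sha256(good).hexdigest()}}
    ]
}
print(verify_subject(good, provenance, "uploader.sh"))                   # True
print(verify_subject(b"malicious artifact", provenance, "uploader.sh"))  # False
```

With a check like this in the download path, a swapped-out artifact is rejected even if the attacker controls the hosting location, because they cannot also forge matching provenance.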
SLSA is a practical framework for end-to-end software supply chain integrity based on a model proven to work at Google. It guides you through gradually improving the security of your software. Artifacts used in critical infrastructure or vital business operations may want to attain a higher level of security, whereas low-risk software can stop at whatever level its maintainers are comfortable with.
SLSA is designed to be incremental and actionable and to provide security benefits at every step. Once an artifact qualifies at the highest level, consumers can have confidence that it has not been tampered with and can be securely traced back to the source—something that is difficult, if not impossible, to do with most software today.
In its current incarnation, SLSA is no more than a set of guidelines, but it's worth mentioning that the plan is to transform this framework into a reference certification for supply chain security. In the future, SLSA will support the automatic creation of auditable metadata used as input for policy engines to give "SLSA certification" to a particular package or build platform.
In the following article of the series, we will look at sigstore/cosign, a tool that supports container signing, verification, and storage in an OCI registry. Cosign aims to make signatures invisible infrastructure. Stay tuned, and see you in the next piece!
PS: In the meantime, you can already get started with more supply chain security resources here!