In the introduction of the Master Directed Graphs by example with JavaScript, we saw how useful the directed graphs could be. Before reading this blog post, I highly recommend you to read it before going further in this series.
The goal of this series is to demonstrate the usefulness and power of directed graphs by implementing concrete examples with JavaScript.
Let's start by introducing our today's topic
Our today's objective is to explain in a simplified way how the Affected/Incremental pattern works. The whole idea of detecting affected projects/packages is to process tasks (build, lint, test) only on parts of project that change hence optimizing ressources and saving time when working on heavy projects.
Take for instance a simple JavaScript frontend application (app.jsx) with two libs:
- design system library to export your custom UI components (e.g: by wrapping Material UI or Angular Material with custom themes)
- date formatting library to configure an universal set of rules to work with dates
Our project would look like this:
project-folder/
β
ββββdesign-system/
| | dist/
β β component1.jsx
β
ββββtemporal/
β | dist/
β β dateFormat.js
|
| dist/
| app.jsx
| package.json
Our app.jsx
represents our entry file and uses features from both design-system
and temporal
libs.
app.jsx
import Component1 from "./design-system/component1.jsx";
import { hoursToMilliseconds } from "./temporal/dateFormat.js";
function myApp() {
return <Component1> {hoursToMilliseconds(2)} </Component1>;
}
To build the entire application, imagine we have an npm script which bundles all libs and the entry point into a single JavaScript file:
# Bundle the whole app including `design-system` and `temporal` libs
$ npm run build
You might wonder what is the problem with this approach. The answer is that there is no problem yet and this type of bundling is probably fine most of the time for small to medium sized projects.
However if your design-system
and temporal
end up growing and you even add more libraries (more components, but also including assets such as images, fonts etc) you might encounter performance issues when building the whole app.
Let's demonstrate that very easily:
project-folder/
β
ββββdesign-system/ # 100+ components
β
ββββtemporal/ # 50+ files
|
ββββlib3/ # library using node-sass (nearly 5 MB)
|
ββββlib4/ # library using heavy npm modules
Let's say that this project takes X time in seconds to be entirely bundled. Now imagine you only change the color of your design-system/component1.jsx
:
const component1 = styled.div`
color: red;
`;
Nice! Your component now looks great... but you have to rebuild your whole app which also includes the rebuild of the heavy ones lib3
and lib4
.
We have to build everything each time we want to ship the main application while not everything has changed and therefore does not necessarily have to be rebuilt.
This type of problem is often encountered in monorepos where many apps and libraries share the same build/test/lint tools.
Introducing the killer feature: the Incremental/Affected pattern
If you're using monorepo tools such as Nx, Turborepo or Rush, you have probably heard of Incremental and/or Affected patterns.
If you are not comfortable what is a monorepo and what are their objectives, I recommend you to take a look at a little reminder about it here.
In this blog post, we will use the Affected word but Incremental means literally the same thing here.
Ok, but what is the point with directed graphs?
The tool in charge of that (e.g: Turborepo, Nx, Rush) can introspect your project, and internally emit a Directed Acyclic Graph responsible for establishing dependencies between pieces of your project. By using the emitted Graph and a persisted cache (also handle by the tool), this enables smart builds and many other tasks that depends on the state of the project (linting, testing, etc).
In the image above, we can see a project using an affected strategy to build only what really needed to be rebuilt. When building the Main App after Library 2 changed, only Library 1 and Library 2 must be rebuilt. The other part of the graph (Library 3, Library 4, Library 5) remains unaffected hence the cached version of these libraries can be used in the final build (no need to rebuild them).
Let's start by implementing a minimalist version of the Affected pattern for a project containing with three distinct libraries.
/**
* lib1 depends on lib3 (via the use of lib3.MyLib3Component)
* while library 2 is independent.
*/
const lib1Metadata = {
id: "lib1",
adjacentTo: [],
payload: {
component: `<lib3.MyLib3Component/ >`,
},
};
const lib2Metadata = {
id: "lib2",
adjacentTo: [],
payload: { component: `<div>hello from lib2</div>` },
};
const lib3Metadata = {
id: "lib3",
adjacentTo: [],
payload: { component: `<MyLib3Component>hello lib3</MyLib3Component>` },
};
We now have to express our different components in our Graph context that we will use afterwards.
Let's start by adding the core items of a graph: the vertices. A vertex can represent any type of item which in our case is a library of a given project.
import { DiGraph } from "digraph-js";
const projectGraph = new DiGraph();
// Each project is represented with a vertex (node)
projectGraph.addVertices(lib1Metadata, lib2Metadata, lib3Metadata);
Now that we added our vertices, we must represent relationships between them by adding appropriate vertices.
In our example project, the lib1
depends on lib3
(by using the lib3 component). Consequently this import creates an implicit relationship between these nodes hence needs to be represented with an edge.
/**
* lib1 depends on lib3, we must represent this relationship
* by adding an edge from lib1 to lib3.
*/
projectGraph.addEdge({ from: lib1Metadata, to: lib3Metadata });
We just finished expressing our relationships between libraries of our project so we are now ready to implement our caching system in order to enable affected builds!
/**
* Simulating a simple cache, persisting an hashed
* value of the component.
* In a real world project, you would use something
* like the "folder-hash" library to generate hashes
* for a given folder containing sub directories and files.
*/
const cache = {
lib1: {},
lib2: {},
lib3: {},
};
The first function we need is a function to build a library.
During that build process, its content is hashed and stored in the cache
object.
import crypto from "node:crypto";
function buildLibrary(library) {
// Create a SHA1 using the library content
const libraryHashedContent = crypto
.createHash("sha1")
.update(library.payload.component)
.digest("hex");
console.log(`Building library: '${library.id}'`);
// Webpack, Rollup or any other bundler can be processed here
// Store the hashed content in cache
cache[library.id].component = libraryHashedContent;
}
Using that cache, it's pretty straightforward to compare if a library changed. If we are able to detect if a library changed, it means that we are able to detect if it needs to be rebuilt:
function isLibraryAffected(library) {
const libraryHashedContent = crypto
.createHash("sha1")
.update(library.payload.component)
.digest("hex");
return libraryHashedContent !== cache[library.id].component;
}
From now on once we change something in a library, we will be able to detect changes and invalidate the cache.
Hashed build + Cache = Affected build
Now that we're able to build a library, generate a hash from it and store it in a cache where build versions can be compared over time, we can put together the core function of the Affected pattern:
function buildAffected(library) {
/**
* If the component is still the same (i.e: hash data hasnt changed), we
* don't want to rebuild it. If its "Affected", we must invalidate it
*/
if (isLibraryAffected(library)) {
// Component's hash changed, meaning we must build the library
buildLibrary(library);
return { hasLibraryBeenRebuilt: true };
}
// The lib has not changed so does not require a new build
console.log(`Using cached version of '${library.id}'`);
return { hasLibraryBeenRebuilt: false };
}
Let's try out to run an Affected build on a given library:
// build lib1 using Affected detection
buildAffected(lib1Metadata);
// => prints out: "Building library: 'lib1'";
// re-run a build without any changes multiple times
buildAffected(lib1Metadata);
buildAffected(lib1Metadata);
buildAffected(lib1Metadata);
// => prints out each time: "Using cached version of 'lib1'";
// change lib1's content
lib1Metadata.payload.component = "<div> Hello lib1 </div>";
// re-run a build
buildAffected(lib1Metadata);
// => prints out: "Building library: 'lib1'";
Ok cool, we demonstrated how this pattern works for a given library. You may wonder why the Affected pattern is so useful in big projects including hundreds of libraries that are relying on each other. The answer is the Affected pattern proves to be extremely powerful and useful in a project where unnecessary computations are saved. If you're using a project with three libraries, it's probably not worth it to implement the Affected pattern which will cost you more CPU than building from scratch every time.
Enough talk, let's continue our project example with a use case allowing us to feel its rising power.
Implementation of the Affected pattern using digraph-js
digraph-js is a library which helps building directed graphs and allows us to traverse graphs effortlessly
Following the example, we will now implement a more complete version of the pattern which includes multiple libraries detection using the digraph-js library.
As a quick reminder, we still have three libs in our project example:
const lib1Metadata = {
id: "lib1",
adjacentTo: [],
payload: {
component: `<lib3.MyLib3Component />`,
},
};
const lib2Metadata = {
id: "lib2",
adjacentTo: [],
payload: { component: `<div>hello lib2</div>` },
};
const lib3Metadata = {
id: "lib3",
adjacentTo: [],
payload: { component: `<MyLib3Component>hello lib3</MyLib3Component>` },
};
- lib1 depends on lib3 (lib1 uses "MyLib3Component")
- lib2 depends on nothing (no dependency)
- lib3 depends on nothing (no dependency)
What directed graphs allow us is that given a root library (can be any node in the graph), we can traverse all dependencies of that library using edges pointing towards other nodes of the graph. Using that feature, we are able to determine on which other libraries the root library depends on.
Let's build a function whose goal is to build all dependencies of a given root node. For now, the root node does not know the status of all its children (either they have been rebuilt or not).
// Given a root node, we traverse the graph looking for dependencies
function* buildAllRootLibraryDependencies(rootLibrary) {
// Traverse all rootLibrary's dependencies
for (const rootLibraryDependency of projectGraph.getAdjacentVerticesTo(
rootLibrary
)) {
/**
* Recursively build affected libraries starting from the deepest dependencies
* of the root library.
*/
yield* buildAllLibraryDependencies(rootLibraryDependency);
}
/**
* When we reach a dependency with no other dependencies, we
* now that we're done going deep into the dependency tree.
* If the library has been rebuilt, we must inform the
* parent in order to recursively invalidate nodes in the
* traversed path.
*/
const { hasLibraryBeenRebuilt } = buildAffected(rootLibrary);
yield hasLibraryBeenRebuilt;
}
We can traverse each dependency of the root node and rebuilt it if affected.
Now, we must include the root node in that affected process by using the buildAllRootLibraryDependencies function above.
There are 2 conditions requiring the root library to be rebuilt:
- At least one of the dependencies of the library changed (can be determined by keeping track of each children state)
- The root library itself changed (same simple hash comparison mentioned in the first part of the blog post)
Let's implement that last function:
/**
* Build everything with affected strategy including root
* library itself
*/
function buildEverythingAffectedIncludingRootLibrary(rootLibrary) {
const rootLibraryDependencies =
projectGraph.getAdjacentVerticesTo(rootLibrary);
const allRebuiltLibraries = [];
for (const dependencyLibrary of rootLibraryDependencies) {
/**
* Keep track of all the children builds to know if some
* were rebuilt. If there is no affected dependency, the
* array would remain empty
*/
allRebuiltLibraries.push([
...buildAllRootLibraryDependencies(dependencyLibrary),
]);
}
/**
* All root library's dependencies were rebuilt if necessary (i.e: affected).
* However, we now need to determine if the root library has to also be
* rebuilt. There are 2 conditions requiring the root library to be rebuilt:
* - The root library itself changed
* - Atleast one of the dependencies of the library changed
*/
const HAS_LIBRARY_BEEN_REBUILT = true;
const atleastOneLibraryChanged = allRebuiltLibraries
.flat()
.includes(HAS_LIBRARY_BEEN_REBUILT);
if (atleastOneLibraryChanged) {
buildLibrary(rootLibrary);
} else {
// Check if library itself changed by running affected detection on the root library
buildAffected(rootLibrary);
}
}
That's it! The Affected pattern is now fully demonstrated using directed graphs to traverse all dependencies of a given root node and using a cache comparison to determine if a child node must be invalidated then rebuilt.
You're not convinced? Let's confirm that by testing with a function:
function buildProjectUsingAffectedStrategy() {
console.log("\n----STEP 1-----");
// Building for the first time
buildEverythingAffectedIncludingRootLibrary(lib1Metadata);
console.log("\n----STEP 2-----");
/**
* Building for the second time but no dependencies of lib1 changed (neither
* lib3 or lib4) so it remains UNAFFECTED (i.e: using cache)
*/
buildEverythingAffectedIncludingRootLibrary(lib1Metadata);
console.log("\n----STEP 3-----");
/**
* Let's now change the content of lib3's component.
* Remember, lib1 depends on lib3 via the use of lib3.MyLib3Component so
* this change should trigger an affected build.
*/
console.log("Changing lib3's content...");
// addMutation is a function which update the value of a given vertex in the graph
projectGraph.addMutation(lib3Metadata, {
// new lib3 component
component: `<MyLib3Component>Hello affected lib3!</MyLib3Component>`,
});
console.log("\n----STEP 4-----");
/**
* Now that lib3 (dependency of lib1) changed, both lib3 and lib1 are considered
* affected. It means that we must rebuild both, starting with lib3 (lib1 must be built
* with the latest version of lib3).
*/
buildEverythingAffectedIncludingRootLibrary(lib1Metadata);
}
Here is the output:
----STEP 1-----
Building library: 'lib3' // building for the first time
Building library: 'lib1' // building for the first time
----STEP 2-----
Using cached version of 'lib3' // building for the second time but nothing changed so we can use a cached version of lib3
Using cached version of 'lib1' // building for the second time but nothing changed neither from lib3 nor from lib1 itself so we can use a cached version of lib1
----STEP 3-----
Changing lib3's content... // updating lib3 content
----STEP 4-----
Building library: 'lib3' // lib3 changed so must be rebuilt
Building library: 'lib1' // lib1 did not change but its direct dependency "lib3" changed so it must be rebuilt
That's finally it! We demonstrated the big picture of how the Affected pattern works under the hood.
GitHub repository
Feel free to check the full example here used in the final part of the blog post.
Coming next
In the next part of this series, we will talk about circular dependency detection
(which is for instance implemented by
an ESLint plugin eslint-plugin-import/no-cycle alike) that are made possible with directed graphs !
Thanks for reading!
Top comments (0)