DEV Community

Leonardo Teteo

The node_modules problem

I doubt I'm the first one to talk about this problem, even here on dev.to. A quick search for a solution ended with the image at the top of this post. The node_modules folder is where your project's dependencies are stored; that is common knowledge. So is its weight.

Why I decided to vent my frustration now

Black Friday is here! It means discounts and the opportunity to upgrade your computer. So I decided to buy an SSD to boost the performance of my laptop, going from a 1 TB HDD to a 500 GB SSD. All my files currently add up to 299 GB, so everything will fit, but I decided to do some housekeeping anyway, which includes backing up my projects. I don't put every project on GitHub; sometimes I'm just experimenting and it isn't worth the trouble, but I keep them anyway.

When I started the copy & paste process I remembered how heavy node_modules are...

Some comparisons

One example that clearly shows the problem is the node_modules folder of my ToRead CLI project, as you can see in the image below.

node_modules property window

The size of the folder is not really the problem, although I will get to that later, but 15,000 files and more than 1,800 folders!? Are you kidding me?! It is a simple CLI project with 5 files! Just for comparison, let's see how many files and folders there are in the Windows folder:

Windows folder property window

While the system was counting I really thought node_modules would win this, but no. In any case, the folder has almost half as many files as an entire operating system!

As I've said, the problem when copying a node_modules folder from one place to another is not the size; it is the number of files and folders, the complexity of the tree. It is a nightmare for an HDD. It takes many minutes just to discover all the files, let alone copy them. In the end, it also impacts npm performance, and there are memes for that too.

Waiting for npm install

Other comparisons come from my passion for serverless. It is not rare for me to implement the same function in both Java and JavaScript, and since you have to bundle the function along with its dependencies, it is a good way to compare which one is more efficient at dependency management. In one of my projects I wrote the function in both languages with virtually the same features: the Java bundle was 11.1 MB and the NodeJS bundle was 29.0 MB. So NodeJS has room to improve on dependency size as well.

What other languages do

Besides NodeJS I have experience dealing with dependencies in two more languages: Java and C#. They have, in my opinion, a very similar way of handling dependencies and a much more efficient way than NodeJS.

Java has Maven, Gradle and other dependency management tools that work basically the same way. There is a remote repository of dependencies, generally Maven Central, and a local repository. Maven always looks for a dependency in the local repository first and, if it is not found, downloads it from the remote repository. The dependencies do not live inside the project the way the node_modules folder does; the local repository is shared, so a dependency is downloaded once and can be used by many projects. You just add it to your pom.xml.

C# follows the same idea: you list your dependencies in a .csproj file and NuGet takes care of them, also with a remote and a local repository. Handling dependencies this way is much more efficient: download once, use in any project locally.

I think there is also a difference in culture, in the way the languages were structured, and in what people see as libraries. Java has a very mature core of libraries that can deal with almost anything, common scenario or not. So libraries in Java are generally meant to be an abstraction over what Java already has, making it easier to use. As a result, they have a shallower dependency tree that reaches the Java core libraries much sooner.

What I see in NodeJS, on the other hand, is the opposite: everything can become a library, even a library that sums two numbers (hypothetical example, I hope), and libraries are heavily dependent on one another, generating deep dependency trees with many files and folders.

Conclusion & Discussion

I am certainly not qualified to criticize NodeJS's structure and engineering, but as a user I clearly see a problem, and some lessons from other languages that could be used to improve dependency management, which is paramount nowadays for almost every application. How do you think this problem came to be, and what has been done to solve it? It would be very interesting to hear from more experienced developers about what you do to remedy this.

Top comments (89)

Christopher Kruse

At the risk of being too snarky this morning:

a library that sums two numbers (hypothetical example, I hope)

npmjs.com/package/addition2

Chad Windham

Yup. My first thought when I read that line was, "I don't even have to check. I'm sure it is there..."

Aivan Monceller

and it has 4 downloads 😂

ytjchan

Lucky it doesn't have its own dependencies...

matt kocaj

But really.. how surprised would have you been? :p

Leonardo Teteo • Edited

Thanks for making me lose my hope in humanity today. hahahaha

Yoandy Rodriguez Martinez • Edited

That's the reason why some developers can't have nice things.

Simon Somlai

Aah damn, I just needed something that could add four numbers. I'll keep looking!

Christopher Kruse

Lol - looks like a good time to submit a pull request!

LKHO

Anyone want to try out addition3? I have added support for TypeScript/ES6 and some more syntactic sugar for the goodness' sake. I have tried very hard not to make it the same functionality of lodash.add so that it can be added as a dependency reasonably :)

Asday

github.com/sentialx/add-2-numbers too. Less features, but a better algorithm.

Josep Alacid

LOL

Josep Alacid • Edited

You just uploaded it!
Didn't you?
;)

Asday

Well probably not, because this thread is coming up on a year old, now.

Esteban Rocha • Edited

I'm using Yarn these days. I have no issues with disk space, as I have something like 2 TB of SSDs in RAID, and it's so freaking fast and cool that it works just right for me. However, if you or anybody else is willing to try PNPM, it seems to be the way to go for solving the node_modules size issues.

Benny Powers 🇮🇱🇨🇦

Yeah give pnpm a try, I think you'll like it.

Leonardo Teteo

I'm sure going to give it a try. Thanks!

Carly Ho 🌈

Ooh, I hadn’t heard about PNPM and I’m excited to try it and get whole gigabytes back, hahaha

Zoltan Kochan

By the way, there is a pnpm tag on dev, so you can follow it dev.to/t/pnpm

Mihail Malo

Apparently there's also @zkochan on dev.to
A welcome addition 👌😫

Leonardo Teteo

I tested it on one of my projects and it does look a bit faster, but I don't know if that is just because things are cached in memory right now, so I will keep testing over the coming days to see its real performance.

Chris Capaci

Why copy it at all? The power of NPM is that you can recreate the exact environment described in your package.json file with one command. Copying the node_modules directory is pointless.

Leonardo Teteo • Edited

I ended up deleting all node_modules folders before copying it because it would be impossible to copy otherwise, but my first thought was convenience, I see my projects folder with dozens of folders of projects, the first thought is to select everything, zip it and copy to my backup location.
My main point in the text is that there are other languages that deal with it better, providing what you said, recreation of environment without adding a dependency folder in every project with thousands of files and folders.

Olivier “Ölbaum” Scherler

Backing up your computer or replacing your hard drive is not working against the framework/community/tools. Those are common tasks, and if a particular tool makes them awful for users because of some design decisions, then the users are not to blame and are allowed to complain.

Oskar Okuno

If your code is under source control what is the need for copy them for backup?

Leonardo Teteo

Not all of the projects are.

Oskar Okuno

I think that's where your problem lies and then your article is about one of the symptoms when not using source control.

Node_modules is not designed for moving around. With all due respect, but I think you're just using it in a way it was never intended.

Leonardo Teteo

I never intended to move them around, it was a natural action to copy & paste all folders in the project folder, after I remembered that I canceled the copy, deleted all of them and started the copy process again. However, the problem still remains, it has too many files and it is heavy to download and install. NPM tends to take a long time here to finish. I think my point remains.

Oskar Okuno

I understand that you come from a different environment, Java/C#. Those were also my first languages before I learned JavaScript through jQuery/React/Nodejs. I had to learn that there are different philosophies in play here.

Most of the modules are made by hobbyists who do it in their free time. There are great ways to minimize libraries, e.g. avoid duplication by using peerDependencies in package.json, or make sure you strip away unnecessary files during publish. But you will have to get used to there being a lot of small modules (unix style: doing one thing and doing it great).

Instead of working against the framework/community/tools, please try to understand it first and then try to come up with improvements in the form of PRs, issues and encouraging posts about how to do it right. I don't think your current attitude will help you succeed in becoming a better Nodejs developer, and the community needs more good developers. Thank you for understanding.

davertron

I put a bunch of my projects in Dropbox, and it spends TONS of time syncing node_modules which is completely pointless. If anyone knows if there's a way to tell Dropbox to ignore node_modules folders I'm all ears...

Oskar Okuno

Why are u using Dropbox instead of git?

davertron

I do use git in certain cases but I don't always need a git repo for every JavaScript project.

Oskar Okuno

I understand. But, I would not recommend you using Dropbox as source control, because you run into the problems as you just mentioned previously.

Kevin Cocquyt

When using Azure DevOps, like in my case, adding a git repo can be as fast as creating a project and saving it to Dropbox.

PS: Dropbox gets full fast (not paying for it)!
PPS: Use a .gitignore file but I guess you knew that.

Frederik Held

@mroggy85 :

"If your code is under source control what is the need for copy them for backup?"

"I think that's where your problem lies and then your article is about one of the symptoms when not using source control."

Not every project is meant to be on GitHub and a local repository won't fix the problem.

The problem is that you have to delete all the node_modules folders manually in a potentially unlimited number of project folders and then re-install them after moving.

Maybe you can do that with a little cli magic, but that's not really user friendly and will take its time...

Oskar Okuno

I understand that it can be a little inconvenient the one time you move your project. But I would not call moving project files around a common use case. If it is, I would suggest that you review your development process instead.

TQ Hirsch

There are lots of steps between github and a local repo. For example, for small one-off projects I'll start with a local repo and then clone it on my other dev machine (each repo is a remote for the other, so I can easily sync changes back and forth). As it gets bigger, it can be pushed to the gitolite install I have on my media PC (which was just a random machine I had lying around that was always on; a Raspberry Pi would do just as well). Only when I want other people to look at it do I push to Github.

Even if you're just using local repos, git still helps you out; you only need to back up the .git directory; to restore, you put it back in place and run git checkout HEAD. No backing up of node_modules necessary!

Aki Rodić

I do it when I archive my projects because there is no guarantee that npm will be there in say 20 years from now.

Avalander • Edited

Great rant, I wholeheartedly agree with everything you said.

I think that the major problem is the carelessness with which library developers pull in dependencies.

I have a pet project with one production dependency and three development dependencies and npm has pulled in a total of, no kidding, 2352 packages.

One of those packages is called path-is-inside and is a simple script with fifteen lines of code that checks whether a path is inside another path.

Now, I'm not saying that everybody should be reinventing the wheel all the time everywhere, but if developers are pulling in packages instead of writing fifteen-line utility functions (it's not even hard to figure out if a path is inside another path), no wonder that there are so many jokes about node_modules out there.

Personally, I'm always wary of pulling in dependencies and I always try to avoid packages that seem to have an unreasonable amount of dependencies for the job they do, although I'm more liberal with development dependencies (after all, there's only so much you can avoid when you're stuck with nodemon and webpack).

wtgtybhertgeghgtwtg

it's not even hard to figure out if a path is inside another path

It's not that it's hard, it's that, between normalization and platform (Windows) specific issues, it's easy to get wrong. Reimplementing it is not worth the 7kb you may save.

Avalander

I would normally concede that it's a grey zone, but path-is-inside doesn't even do path normalisation (which, for the record, you can easily do with node's path module). I'd say that 15 lines of code to save 7kb and a broken implementation is worth every bit of it.
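
To make the discussion concrete, here is a minimal sketch of what such a utility could look like when it leans on Node's path module for normalisation. This is not the code of the path-is-inside package; the name `isPathInside` and the decision to treat a path as being inside itself are assumptions made for this example.

```javascript
const path = require("path");

// Returns true when `child` is located inside `parent` (or is the same path).
// Both paths are resolved first, so "." and ".." segments are normalised.
function isPathInside(child, parent) {
  const relative = path.relative(path.resolve(parent), path.resolve(child));
  // The child is outside when the relative path climbs upward or,
  // on Windows, ends up on a different drive (an absolute result).
  const climbsOut = relative === ".." || relative.startsWith(".." + path.sep);
  return !climbsOut && !path.isAbsolute(relative);
}

console.log(isPathInside("/a/b/c", "/a/b"));
console.log(isPathInside("/a/b/../c", "/a/b"));
```

Even a short version like this has to think about sibling folders with a shared prefix (such as /a/b and /a/bc) and about Windows drives, which is the kind of detail both sides of this thread are arguing about.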

Anyway, this was a particular example, you could argue about every single small package that there is a reason you may want to add it as a dependency instead of implementing it yourself and you would be right. However, if we don't draw the line anywhere, well... we end up with 1875 subfolders in our node_modules folder. I just wish library developers were more careful when adding dependencies and only add them when they are truly required and have decent quality.

wtgtybhertgeghgtwtg

I'd say that 15 lines of code to save 7kb and a broken implementation is worth every bit of it.

In all likelihood, you're not saving 7kb; another library you're using probably has the same requirement and probably uses that same package. And if you get all libraries to do as you do, you now have n identical implementations of a 1kb function as opposed to a constant 7kb.

if we don't draw the line anywhere, well... we end up with 1875 subfolders in our node_modules folder.

node_modules aren't huge because somebody pulls in a small library. They're huge because people'll publish unneeded files, or don't split up their library into more focused libraries, or don't make an effort to deduplicate their dependency trees, or keep support for old node versions, or pin to specific versions so other dependencies can't deduplicate.

I just wish library developers were more careful when adding dependencies

It doesn't have to be a wish; if it is important to you, it is totally within your power to attempt to rectify that which you find a mistake (at least as far as open source goes). Many repositories will be responsive or even appreciative if you make an issue.

a

My folder reached a 380mb size & i hadn't even started coding

Maël Nison

Hey, just a heads up since you don't reference it in your post! 😃

Yarn has worked on a feature aiming to remove node_modules folders from the equation. It works, has already shipped, and is used in production in various places. As for npm, they've started working on an experimental project sharing some aspects, called Tink.

For more information, the RFC that introduced Plug'n'Play (the name of this "no node_modules" install strategy) is github.com/yarnpkg/rfcs/pull/101 - there's a lot of great discussion there.

Leonardo Teteo

I knew of Yarn, but I have never actually used it. I will take a look at the PR you linked and at Yarn in general; it may be better than NPM at managing this. Do you think it is better?

Maël Nison

I'm part of the Yarn core team, so my opinion is biased ;)

Overall both Yarn and npm are pretty good tools. We tend to ship more features, in part thanks to our community which frequently contributes, but npm isn't that far behind.

Plug'n'Play is currently exclusive to Yarn, though (Tink is quite different, I wrote an article about it not too long ago).

Leonardo Teteo

I found your text here, I will read it. :)
Since I wrote this article, I need to be open to everything, so I will test Yarn thoroughly. It looks nice, starting with the mascot. :D

Chad Windham

Words cannot describe how much I love that picture...

Daniel Escoz

So, without reading other comments, let me add my two cents...

I've done a quick check on a typical microservice at work. I got 200MB and 19k files for node_modules. If I do the same with a virtualenv site-packages directory of another project at work, I get 10k files and 550MB.

I have no idea if these are typical sizes, but it's what I have handy. In summary: two projects, both in production, with similar sizes, node wins on files with a x2 factor and python wins on size with a x2.25 factor. That isn't so bad for node, is it?

And that's the point, node_modules isn't that bad when compared to other systems. Most memes were created when npm made no efforts to deduplicate dependencies, and at that time node_modules WAS a black hole. You could exceed 10 or 20 directory levels very easily, and 100k files wasn't at all uncommon. But npm got better, first we got npm dedup, and finally it started deduping by default.

Yet, node has a key advantage over other languages' dependency systems, that you don't mention. You compare node with Java or C#, but it's not at all fair, because Java, C#, Python, Ruby and other languages use global or per-project dependencies. Node, instead, has package-local dependencies. This means that, in a node project, each package can declare a dependency against a different version of the same package. You cannot do that on most other languages, and I've had unsolvable situations where two libraries I needed wanted different versions of some third party lib. Node doesn't do that.
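
The package-local behaviour described here follows from the way Node searches for a bare import: it walks up from the importing file's directory and checks a node_modules folder at each level. The sketch below reproduces that walk in a simplified form; the function name `lookupPaths` is made up, it assumes POSIX-style paths, and it ignores global folders and the NODE_PATH variable.

```javascript
const path = require("path");

// List the node_modules directories Node would search, nearest first,
// for a bare import such as require("some-lib") made from `fromDir`.
// Simplified: POSIX paths only, no global folders, no NODE_PATH.
function lookupPaths(fromDir) {
  const dirs = [];
  let current = path.posix.resolve("/", fromDir);
  while (true) {
    // Node does not look for node_modules/node_modules.
    if (path.posix.basename(current) !== "node_modules") {
      dirs.push(path.posix.join(current, "node_modules"));
    }
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return dirs;
}

console.log(lookupPaths("/app/node_modules/foo/lib"));
```

In this example the folder /app/node_modules/foo/node_modules is checked before /app/node_modules, so the package foo can carry its own version of a dependency while the rest of the project uses another one.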

So, yeah, node could possibly do better, but right now, in 2018 almost 2019, it is NOT what the memes will make you believe it is.

Also, if you're syncing node_modules between hard drives, you're doing it wrong. It's the whole point of dependency declarations.

Leonardo Teteo

The memes were just to lighten the mood. I wrote the text based on personal experiences and struggles with node_modules in 2018, so it really does need to improve. I've never come across a situation where libraries really needed different versions in Java, the language I have the most experience with, since developers generally take care to keep new versions backward compatible. With well-constructed libraries you can use the latest version and not worry about other libraries that depend on it and may eventually need an old feature. If you experienced a conflict like that in Java, the library you were using was probably not well maintained. Java is fully object-oriented and takes full advantage of that to build a healthy dependency management system.

Daniel Escoz

The dependency problem has zero to do with OO and all to do with module isolation. In Java, your dependencies are simply "I have this package accessible and can import it". If foo.bar.Baz is on your path, you can import it, that's it. But where you import it from makes no difference, and so you can't have foo.bar v1 and v2 in the same project unless they took care themselves of using different package names.

Rivan

Alt Node

Saurabh Sharma

😂😂😂

Mihail Malo • Edited

PNPM does exactly this, with a global repository for the whole machine.
But regardless of whether you install with PNPM, Yarn, or even NPM, you shouldn't carry node_modules around with you, just exclude them when copying or just keep your stuff in git so that you can clone without node_modules and compiled/derived artefacts.

rhymes

JavaScript needs a standard library

Oskar Okuno

Why? What is wrong with the new ES6/7 standard?
I am interested in what you are missing.

rhymes

ES6 is a language, a standard library is a little bit more than that.

The absurdity of the amount of third party packages, utilities, snippets, slightly different implementations and so on that you find on npm might be a result of the lack of a common standard library.

That's what I meant.

Dave Sanders

This was talked about in an article on Medium I read recently: hackernoon.com/whats-really-wrong-...

rhymes

Thanks, I've read it. Well, I think everyone would agree on the practicality of a common and official standard library. The hard thing would be to find who's going to take lead on that. You need at least the support of the TC39 committee, the big browser vendors and so on. The issue is that most successful open-source projects have either a BDFL or a cohesive core team, usually there since the beginning, not 20 years after the creation of the language. I hope time and possibly some fed up big players in the community will result in a real conversation being started.

Nobody forbids you to write your own package "is true", it just probably shouldn't be included automatically by hundreds of packages 😅

Victor Vincent • Edited

It's coming. Chrome already has the first stdLib built-in. More are about to come.

Ervin Peters

We have known for decades that hiding complexity from the programming user makes systems more complex. Especially in node, where everyone tells you to depend on explicit versions. npm handles version dependencies, nuget doesn't. Remember the dll hell in Windows? Same symptom.
Maybe we should ask ourselves if it wouldn't be better to focus on API versions and limit them to 2. That means that as long as an old version is still needed, a new one can't be established. Additionally there might be a notification: "a" upgrades the API of module "x". Your project depends on "x", we recommend upgrading this dependency. As long as it isn't upgraded, the API of module "x" can't be developed further.

Remember the developer terms 'responsibility' and 'KISS'?

gautam4537

The biggest data I ever copied. I thought it would take a couple of seconds to copy it to a pen drive at the end of the day, and it made me late for a party... Uuuufffffff... And I could not explain why I was late..😀😀😀

Yashu Mittal

You are right, managing the node_modules folder is one of the biggest challenges for JavaScript developers.

And it doesn't matter which tool we use to install the dependencies (Yarn or NPM).

Ultimately, the problem is the storage space.

Zoltan Kochan • Edited

pnpm was created to solve this issue and has been actively maintained since 2016.

This is how pnpm solves the issue:

  • one version of a package is only ever saved once on a disk in a global store
  • dependencies are imported to node_modules using hard links. So physically files of the package are the same in the global store and in every node_modules
  • symlinks are created inside node_modules to create a nested structure (more about it here)
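
The hard-link idea in the list above can be tried out with a few lines of plain Node.js. This is only a sketch of the underlying filesystem feature, not pnpm's actual code or store layout; the "store" and "project" folders are stand-ins created in a temporary directory.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create a stand-in "global store" and a stand-in project directory.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "hardlink-demo-"));
const storeFile = path.join(root, "store", "index.js");
const projectFile = path.join(root, "project", "node_modules", "pkg", "index.js");
fs.mkdirSync(path.dirname(storeFile), { recursive: true });
fs.mkdirSync(path.dirname(projectFile), { recursive: true });

fs.writeFileSync(storeFile, "module.exports = 42;\n");
// A hard link is a second directory entry for the same data on disk.
fs.linkSync(storeFile, projectFile);

const storeStat = fs.statSync(storeFile);
const projectStat = fs.statSync(projectFile);
const linkedContent = fs.readFileSync(projectFile, "utf8");
console.log(storeStat.ino === projectStat.ino); // same inode on POSIX filesystems
console.log(storeStat.nlink); // number of names pointing at the data

// Remove the temporary demo directory again.
fs.rmSync(root, { recursive: true, force: true });
```

Both paths refer to the same bytes on disk, so a second project linking to the same file would add a directory entry but no extra copy of the content. Hard links only work within a single filesystem, which is why the sketch keeps everything under one temporary folder.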

pnpm's solution is battle tested and does not hook into Node's module resolution algorithm, so it is backward compatible.

Indeed, there are new concepts like Yarn Plug'n'Play and Tink. However, they hook into Node's resolution algorithm. They might change the way we use JavaScript but that will be a long process. pnpm works now.