I doubt I'm the first one to talk about this problem, even here on dev.to. A quick search for a solution ended with the image at the head of this text. The node_modules folder is where your project dependencies are stored; that is common knowledge. So is its weight.
Why I decided to vent my frustration now
Black Friday is here! It means discounts and the opportunity to upgrade your computer. So I decided to buy an SSD to boost the performance of my laptop, going from a 1 TB HDD to a 500 GB SSD. All my files currently add up to 299 GB, so I will not be short on space, but I decided to do some housekeeping anyway, which includes making backups of my projects. I don't put every project on GitHub; sometimes I'm just experimenting and it is not worth the trouble, but I keep them anyway.
When I started the copy & paste process I remembered how heavy node_modules are...
Some comparisons
One example that clearly shows the problem is the node_modules folder of my ToRead CLI project, as you can see in the image below.
The size of the folder is not really the problem, although I will get to that later, but 15,000 files and more than 1,800 folders!? Are you kidding me?! It is a simple CLI project with 5 files! Just for comparison, let's see how many files and folders there are in the Windows folder:
While the system was counting I really thought node_modules would win this, but no. Still, the folder has almost half as many files as an entire operating system!
As I've said, the problem when copying a node_modules folder from one place to another is not the size; it is the number of files and folders, the complexity of the tree. It is a nightmare for an HDD. It takes many minutes just to discover all the files, let alone copy them. In the end, it also hurts npm performance, and there are memes for that too.
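If you want to check these numbers on your own projects, counting the entries in a tree takes only a few lines of Node. This is a minimal sketch, not a benchmark tool: `countEntries` is a name made up for this example, and it counts symbolic links as files instead of following them.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: counts files and folders below `root` (root itself excluded).
function countEntries(root) {
  let files = 0;
  let folders = 0;
  const pending = [root]; // directories still to be listed

  while (pending.length > 0) {
    const dir = pending.pop();
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (entry.isDirectory()) {
        folders += 1;
        pending.push(path.join(dir, entry.name));
      } else {
        // Regular files, symlinks and anything else that is not a directory
        files += 1;
      }
    }
  }
  return { files, folders };
}
```

Calling `countEntries("node_modules")` from a project root returns an object with the two totals, which you can compare against the counts your file manager reports.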
Other comparisons come from my passion for serverless. It is not rare for me to implement the same function in both Java and JavaScript, and since you have to bundle the function along with its dependencies, it is a good way to compare which one is more efficient at dependency management. In one of my projects I wrote the function in both languages with virtually the same features: the Java bundle was 11.1 MB and the NodeJS bundle was 29.0 MB. So NodeJS has room to improve on dependency size as well.
What other languages do
Besides NodeJS I have experience dealing with dependencies in two more languages: Java and C#. They have, in my opinion, a very similar way of handling dependencies and a much more efficient way than NodeJS.
Java has Maven, Gradle and other dependency management tools that work basically the same way. There is a remote repository of dependencies, generally Maven Central, and a local repository. Maven always looks for a dependency in the local repository first and, if it is not found, downloads it from the remote repository. The dependencies do not live inside the project, as they do with the node_modules folder; the store is more global. A dependency is downloaded once and can be used by many projects: just add it to your pom.xml.
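To make the local-first idea concrete, here is a toy model of it. This is only a sketch of the lookup order described above, not how Maven is actually implemented: the `LocalFirstResolver` class, the in-memory maps and the `com.example:lib:1.0` coordinate are all made up for illustration.

```javascript
// Hypothetical in-memory model of "check the local repository, then the remote one".
class LocalFirstResolver {
  constructor(remoteRepository) {
    this.remote = remoteRepository; // Map: coordinate -> artifact
    this.local = new Map();         // one cache shared by every project on the machine
    this.downloads = 0;             // how many times we had to go to the remote
  }

  resolve(coordinate) {
    // 1. Local repository first
    if (this.local.has(coordinate)) {
      return this.local.get(coordinate);
    }
    // 2. Otherwise fetch from the remote and remember it locally
    if (!this.remote.has(coordinate)) {
      throw new Error(`Dependency not found: ${coordinate}`);
    }
    this.downloads += 1;
    const artifact = this.remote.get(coordinate);
    this.local.set(coordinate, artifact);
    return artifact;
  }
}

// Two "projects" asking for the same dependency trigger a single download.
const exampleRemote = new Map([["com.example:lib:1.0", "lib-1.0.jar"]]);
const exampleResolver = new LocalFirstResolver(exampleRemote);
exampleResolver.resolve("com.example:lib:1.0"); // project A
exampleResolver.resolve("com.example:lib:1.0"); // project B
console.log(exampleResolver.downloads); // 1
```

The point of the sketch is the single shared `local` map: the second project pays no download and no extra disk space for a dependency the machine already has.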
C# follows the same idea: you list your dependencies in a .csproj file and NuGet takes care of them, also with a remote and a local repository. It is much more efficient to handle dependencies this way: download once, use in any project locally.
I think there is also a difference in culture, in the way the languages were structured, and in what people see as libraries. Java has a very mature core of libraries that can deal with almost anything, common scenario or not. Libraries in Java are therefore generally meant to be an abstraction over what Java already has, making it easier to use. As a result, they have shallower dependency trees and reach the Java core libraries much more quickly.
What I see in NodeJS, on the other hand, is the opposite: everything can become a library, even a library that sums two numbers (hypothetical example, I hope), and libraries are heavily dependent on one another, generating deep dependency trees with many files and folders.
Conclusion & Discussion
I am certainly not qualified to criticize NodeJS's structure and engineering, but as a user I clearly see a problem, and some lessons from other languages that could be used to improve dependency management, which is paramount nowadays for almost every application. How do you think this problem came to be, and what has been done to solve it? It would be very interesting to hear from more experienced developers what you do to remedy this.
Top comments (89)
At the risk of being too snarky this morning:
npmjs.com/package/addition2
Yup. My first thought when I read that line was, "I don't even have to check. I'm sure it is there..."
and it has 4 downloads 😂
Lucky it doesn't have its own dependencies...
But really.. how surprised would you have been? :p
Thanks for making me lose my hope in humanity today. hahahaha
That's the reason why some developers can't have nice things.
Aah damn, I just needed something that could add four numbers. I'll keep looking!
Lol - looks like a good time to submit a pull request!
Anyone want to try out addition3? I have added support for TypeScript/ES6 and some more syntactic sugar for the goodness' sake. I have tried very hard not to make it the same functionality of lodash.add so that it can be added as a dependency reasonably :)
github.com/sentialx/add-2-numbers too. Less features, but a better algorithm.
LOL
You just uploaded it!
Didn't you?
;)
Well probably not, because this thread is coming up on a year old, now.
I'm using Yarn these days. I have no issues with disk size, as I have something like 2 TB of SSD RAID, and it's so freaking fast and cool that it works just right for me. However, if you or anybody is willing to try PNPM, it seems the way to go for solving the node_modules size issues.
Ooh, I hadn’t heard about PNPM and I’m excited to try it and get whole gigabytes back, hahaha
Yeah give pnpm a try, I think you'll like it.
I'm sure going to give it a try. Thanks!
By the way, there is a pnpm tag on dev, so you can follow it dev.to/t/pnpm
Apparently there's also @zkochan on dev.to
A welcome addition 👌😫
I tested it on one of my projects and it does look a bit faster, but I don't know if that's just because everything is in memory right now, so I will keep testing over the coming days to see its real performance.
Why copy it at all? The power of NPM is that you can recreate the exact environment described in your package.json file with one command. Copying the node_modules directory is pointless.
I ended up deleting all node_modules folders before copying, because it would have been impossible to copy otherwise, but my first thought was convenience: I see my projects folder with dozens of project folders, and the first thought is to select everything, zip it and copy it to my backup location.
My main point in the text is that there are other languages that deal with it better, providing what you said, recreation of environment without adding a dependency folder in every project with thousands of files and folders.
Backing up your computer or replacing your hard drive is not working against the framework/community/tools. Those are common tasks, and if a particular tool makes them awful for users because of some design decisions, then the users are not to blame and are allowed to complain.
If your code is under source control, what is the need to copy it for backup?
Not all of the projects are.
I think that's where your problem lies and then your article is about one of the symptoms when not using source control.
Node_modules is not designed for moving around. With all due respect, but I think you're just using it in a way it was never intended.
I never intended to move them around, it was a natural action to copy & paste all folders in the project folder, after I remembered that I canceled the copy, deleted all of them and started the copy process again. However, the problem still remains, it has too many files and it is heavy to download and install. NPM tends to take a long time here to finish. I think my point remains.
I understand that you come from a different environment, Java/C#. These were also my first languages, before I learned Javascript through jQuery/React/Nodejs. I had to learn that there are different philosophies in play here.
Most of the modules are made by hobbyists who do it in their free time. There are great ways to minimize libraries, e.g. avoid duplication by using peerDependencies in package.json, or make sure you strip away unnecessary files during publish. But you will have to get used to the fact that there will be a lot of small modules (unix style: doing one thing and doing it great).
Instead of working against the framework/community/tools, please try to understand it first and then try to come with improvements in the form of PRs, issues and encouraging posts about how to do it right. I don't think your current attitude will help you succeed in becoming a better Nodejs developer. Because the community needs more good developers. Thank you for understanding.
I put a bunch of my projects in Dropbox, and it spends TONS of time syncing node_modules which is completely pointless. If anyone knows if there's a way to tell Dropbox to ignore node_modules folders I'm all ears...
Why are u using Dropbox instead of git?
I do use git in certain cases but I don't always need a git repo for every JavaScript project.
I understand. But, I would not recommend you using Dropbox as source control, because you run into the problems as you just mentioned previously.
When using Azure DevOps, like in my case, adding a git repo can be as fast as creating a project and saving it to Dropbox.
PS: Dropbox gets full fast (not paying for it)!
PPS: Use a .gitignore file but I guess you knew that.
@mroggy85 :
Not every project is meant to be on GitHub and a local repository won't fix the problem.
The problem is that you have to delete all the node_modules folders manually in a potentially unlimited number of project folders and then re-install them after moving.
Maybe you can do that with a little cli magic, but that's not really user friendly and will take its time...
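For what it's worth, the "cli magic" mentioned above can be sketched in Node itself. This is only an illustration under stated assumptions: `findNodeModules` and `removeNodeModules` are hypothetical helper names, the deletion relies on `fs.rmSync` (available in Node 14.14 and later), and the script defaults to a dry run so nothing is removed until you opt in.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: returns every top-level node_modules folder below `root`.
function findNodeModules(root) {
  const found = [];
  const pending = [root];

  while (pending.length > 0) {
    const dir = pending.pop();
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (!entry.isDirectory()) continue;
      const fullPath = path.join(dir, entry.name);
      if (entry.name === "node_modules") {
        // Do not descend: nested node_modules disappear with their parent
        found.push(fullPath);
      } else {
        pending.push(fullPath);
      }
    }
  }
  return found;
}

// Hypothetical helper: lists the folders, and deletes them only when dryRun is false.
function removeNodeModules(root, { dryRun = true } = {}) {
  const targets = findNodeModules(root);
  if (!dryRun) {
    for (const target of targets) {
      fs.rmSync(target, { recursive: true, force: true });
    }
  }
  return targets;
}
```

Running `removeNodeModules("/path/to/projects")` first and reading the returned list before passing `{ dryRun: false }` is the safer order; each project can be restored afterwards with a normal install from its package.json.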
I understand that it can be a little inconvenience that one time you move your project. But I would not call moving project files around a common use case. I would suggest that you review your development process instead.
There are lots of steps between github and a local repo. For example, for small one-off projects I'll start with a local repo and then clone it on my other dev machine (each repo is a remote for the other, so I can easily sync changes back and forth). As it gets bigger, it can be pushed to the gitolite install I have on my media PC (which was just a random machine I had lying around that was always on; a Raspberry Pi would do just as well). Only when I want other people to look at it do I push to Github.
Even if you're just using local repos, git still helps you out; you only need to back up the .git directory; to restore, you put it back in place and run git checkout HEAD. No backing up of node_modules necessary!
I do it when I archive my projects because there is no guarantee that npm will be there in say 20 years from now.
Great rant, I wholeheartedly agree with everything you said.
I think that the major problem is the carelessness with which library developers pull in dependencies.
I have a pet project with one production dependency and three development dependencies and npm has pulled in a total of, no kidding, 2352 packages.
One of those packages is called path-is-inside and is a simple script with fifteen lines of code that checks whether a path is inside another path.
Now, I'm not saying that everybody should be reinventing the wheel all the time everywhere, but if developers are pulling in packages instead of writing fifteen-line utility functions (it's not even hard to figure out if a path is inside another path), no wonder that there are so many jokes about node_modules out there.
Personally, I'm always wary of pulling in dependencies and I always try to avoid packages that seem to have an unreasonable amount of dependencies for the job they do, although I'm more liberal with development dependencies (after all, there's only so much you can avoid when you're stuck with nodemon and webpack).
It's not that it's hard, it's that, between normalization and platform (Windows) specific issues, it's easy to get wrong. Reimplementing it is not worth the 7kb you may save.
I would normally concede that it's a grey zone, but path-is-inside doesn't even do path normalisation (which, for the record, you can easily do with node's path module). I'd say that 15 lines of code to save 7kb and a broken implementation is worth every bit of it.
Anyway, this was a particular example, you could argue about every single small package that there is a reason you may want to add it as a dependency instead of implementing it yourself and you would be right. However, if we don't draw the line anywhere, well... we end up with 1875 subfolders in our node_modules folder. I just wish library developers were more careful when adding dependencies and only add them when they are truly required and have decent quality.
In all likelihood, you're not saving 7kb; another library you're using probably has the same requirement and probably uses that same package. And if you get all libraries to do as you do, you now have n identical implementations of a 1kb function as opposed to a constant 7kb.
node_modules aren't huge because somebody pulls in a small library. They're huge because people'll publish unneeded files, or don't split up their library into more focused libraries, or don't make an effort to deduplicate their dependency trees, or keep support for old node versions, or pin to specific versions so other dependencies can't deduplicate.
It doesn't have to be a wish; if it is important to you, it is totally within your power to attempt to rectify that which you find a mistake (at least as far as open source goes). Many repositories will be responsive or even appreciative if you make an issue.
My folder reached a 380mb size & i hadn't even started coding
Words cannot describe how much I love that picture...
Hey, just a heads up since you don't reference it in your post! 😃
Yarn has worked on a feature aiming to remove node_modules folders from the equation. It works, has already shipped, and is used in production in various places. As for npm, they've started working on an experimental project sharing some aspects, called Tink.
For more information, the RFC that introduced Plug'n'Play (the name of this "no node_modules" install strategy) is github.com/yarnpkg/rfcs/pull/101 - there's a lot of great discussion there.
I knew of yarn, but I have never actually used it. I will take a look at the PR you linked and at yarn in general; it may handle this better than NPM. Do you think it is better?
I'm part of the Yarn core team, so my opinion is biased ;)
Overall both Yarn and npm are pretty good tools. We tend to ship more features, in part thanks to our community which frequently contributes, but npm isn't that far behind.
Plug'n'Play is currently exclusive to Yarn, though (Tink is quite different, I wrote an article about it not too long ago).
I found your text here, I will read it. :)
Since I wrote this article, I need to be open to everything, so I will test yarn thoroughly. It looks nice, starting with the mascot. :D
So, without reading other comments, let me add my two cents...
I've done a quick check on a typical microservice at work. I got 200MB and 19k files for node_modules. If I do the same with a virtualenv site-packages directory of another project at work, I get 10k files and 550MB.
I have no idea if these are typical sizes, but it's what I have handy. In summary: two projects, both in production, with similar sizes, node wins on files with a x2 factor and python wins on size with a x2.75 factor. That isn't so bad for node, is it?
And that's the point, node_modules isn't that bad when compared to other systems. Most memes were created when npm made no efforts to deduplicate dependencies, and at that time node_modules WAS a black hole. You could exceed 10 or 20 directory levels very easily, and 100k files wasn't at all uncommon. But npm got better, first we got npm dedupe, and finally it started deduping by default.
Yet, node has a key advantage over other languages' dependency systems, that you don't mention. You compare node with Java or C#, but it's not at all fair, because Java, C#, Python, Ruby and other languages use global or per-project dependencies. Node, instead, has package-local dependencies. This means that, in a node project, each package can declare a dependency against a different version of the same package. You cannot do that on most other languages, and I've had unsolvable situations where two libraries I needed wanted different versions of some third party lib. Node doesn't have that problem.
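The package-local behaviour described above comes from how Node looks for modules: it starts in the node_modules folder next to the requiring file and walks up towards the filesystem root, so a package's own nested node_modules is found before the project's. The function below is a simplified sketch of that lookup order; `lookupPaths` is a hypothetical name, and the real resolver does more (global folders, package.json fields, file extensions).

```javascript
const path = require("path");

// Simplified sketch: the node_modules folders tried for a file living in `fromDir`,
// nearest first.
function lookupPaths(fromDir) {
  const candidates = [];
  let current = path.resolve(fromDir);

  while (true) {
    // A node_modules folder is never searched for its own nested node_modules
    if (path.basename(current) !== "node_modules") {
      candidates.push(path.join(current, "node_modules"));
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return candidates;
}
```

For a file in `/home/me/app/node_modules/foo/lib`, the first candidates are foo's own folders, and only then `/home/me/app/node_modules`. That ordering is what lets foo keep a private copy of a library in a different version from the one the application uses.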
So, yeah, node could possibly do better, but right now, in 2018 almost 2019, it is NOT what the memes will make you believe it is.
Also, if you're syncing node_modules between hard drives, you're doing it wrong. It's the whole point of dependency declarations.
The memes were just to lighten the mood. I wrote the text based on personal experiences and struggles with node_modules in 2018, so it really does need to improve. I've never come across a situation where libraries really needed different versions in Java, the language I have the most experience with, since developers generally take care to release new versions with backward compatibility. With well-constructed libraries you can use the latest version and not worry about other libraries that depend on it and may eventually need an old feature. If you experienced that kind of conflict in Java, the library you were using was probably not well maintained. Java is fully object-oriented and takes full advantage of its features to build a healthy dependency management system.
The dependency problem has zero to do with OO and all to do with module isolation. In Java, your dependencies are simply "I have this package accessible and can import it". If foo.bar.Baz is on your path, you can import it, that's it. But where you import it from makes no difference, and so you can't have foo.bar v1 and v2 in the same project unless they took care themselves of using different package names.
😂😂😂
PNPM does exactly this, with a global repository for the whole machine.
But regardless of whether you install with PNPM, Yarn, or even NPM, you shouldn't carry node_modules around with you, just exclude them when copying or just keep your stuff in git so that you can clone without node_modules and compiled/derived artefacts.
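As an illustration of "exclude them when copying", a backup copy that skips every node_modules folder can be written with plain fs calls. This is a minimal sketch under stated assumptions: `copyWithoutNodeModules` is a hypothetical helper, and it copies regular files and folders only, silently skipping symbolic links and other special entries.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: recursively copies `source` into `destination`,
// leaving out anything named node_modules.
function copyWithoutNodeModules(source, destination) {
  fs.mkdirSync(destination, { recursive: true });

  for (const entry of fs.readdirSync(source, { withFileTypes: true })) {
    if (entry.name === "node_modules") continue; // restored later from package.json

    const from = path.join(source, entry.name);
    const to = path.join(destination, entry.name);

    if (entry.isDirectory()) {
      copyWithoutNodeModules(from, to);
    } else if (entry.isFile()) {
      fs.copyFileSync(from, to);
    }
    // Symlinks and other special entries are skipped in this sketch
  }
}
```

The copy then contains only sources, package.json and lock files, and each project gets its dependencies back with a regular install on the new disk.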