Any Node.js project carries a bulky folder - node_modules - that holds all the modules and dependencies the application needs. Peep into that folder and you will see a huge chunk of folders and files. That often makes me wonder - are these really required? Does my application really use so much?
Not just that, each of these modules comes with several versions of the code - the dist build, the prod build, and the elaborate, bulky src folder. Along with those come a ton of readme files and license agreements. A few even include photographs of the developers!
With due regard to all of this, I feel none of it is required in a production deployment. It is a big waste of disk space.
People who deploy on a bare server or an EC2 instance may not mind all of this. Not because the compute is free, but because they have already resigned themselves to overprovisioning - so such problems are a low priority.
But for someone conscious enough to go for Lambda functions, it is a big concern - every millisecond of compute time is valuable, and so is every megabyte of memory.
One may get generous about provisioning RAM, but the deployment package has to stay within the 250 MB (unzipped) limit. An ugly node_modules folder can easily grow well beyond that - and put us in trouble. A larger deployment also means longer warmup times. So we should do everything to keep node_modules compact and get cleaner deployments.
Here are some of the techniques that helped me.
Check the Dependencies
First of all, we have to overcome the shock - why is my node_modules so huge?
{
  "name": "layerjs",
  "version": "1.0.0",
  "description": "Lambda function triggered by event, to generate daily reports",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "aws-sdk": "^2.805.0",
    "jsonwebtoken": "^8.5.1",
    "pdfkit": "^0.11.0",
    "uuid4": "^2.0.2",
    "xlsx": "^0.16.9"
  }
}
Consider, for example, this simple and small package.json. It pulls in a node_modules folder of 117 MB!
$ sudo du -sh node_modules
117M node_modules
I need to know what is going on here. What does it pull in?
I found a very good tool for this: NPM Graph. Very simple to use, it provides a graphical view of everything that goes into node_modules. Just drop the package.json in there, and it will map out the entire dependency tree.
That's HUGE! Let's try to reduce it now.
AWS SDK modules
This is a very common mistake. A lot of developers who want to test stuff locally include the AWS SDK in the package.json. That is great for development. But the problem starts when it gets pushed into the deployment package.
The Lambda runtime environment carries its own AWS SDK. Unless you have to make a lot of tweaks in there and need a highly customized version, it is really not required in your deployment package. This is achieved simply by making it a dev dependency:
$ npm install PACKAGE --save-dev
This makes the package a dev dependency. We can use it for development and testing, but it is pruned off when we make a production deployment.
We can do the same for many other modules that we need only in our development environment.
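To be clear, the function code itself does not change - in the Lambda environment, the require call simply resolves to the SDK that the runtime already provides. A minimal sketch (the bucket name is a made-up placeholder):
const AWS = require('aws-sdk'); // resolved from the Lambda runtime, not from our package

const s3 = new AWS.S3();

exports.handler = async (event) => {
  // Works on Lambda without shipping aws-sdk; locally it comes from devDependencies
  const result = await s3.listObjectsV2({ Bucket: 'my-report-bucket' }).promise();
  return result.KeyCount;
};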
Production Flag
This follows from the previous one. It is the simplest technique, yet often ignored: just delete the node_modules folder and install it again using the --production flag.
Any package that we have marked as a dev dependency will not be part of the deployment. (npm never installs the dev dependencies of our production dependencies anyway.)
With this, the package.json becomes:
{
  "name": "layerjs",
  "version": "1.0.0",
  "description": "This is the lambda layer generated for the service",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "jsonwebtoken": "^8.5.1",
    "pdfkit": "^0.11.0",
    "uuid4": "^2.0.2",
    "xlsx": "^0.16.9"
  },
  "devDependencies": {
    "aws-sdk": "^2.805.0"
  }
}
Now, we install it with the production flag
$ rm -rf node_modules
$ npm install --production
$ sudo du -sh node_modules
59M node_modules
Now, the node_modules folder is down to 59 MB - almost 60 MB lighter, mainly because the AWS SDK dropped off. If every module author had marked their dev dependencies properly, this step would make a huge difference. But... So we may not see miracles here, but it can reduce the deployment size to some extent.
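As a side note, if the node_modules folder is already in place, we do not even have to delete it and reinstall - npm can prune the dev dependencies from it in place (on newer npm versions, the same is spelled --omit=dev):
$ npm prune --production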
Remove Unnecessary Files
Now that we have dropped the unnecessary packages, we can start cleaning up the packages themselves.
For that, we have some good utilities.
Node Prune
$ npm install -g node-prune
When we run this in the root folder of the project, it tears off whatever is not needed at runtime - tests, documentation, and assorted junk.
$ node-prune
Before: 59M .
Files: 5696
After: 47M .
Files: 4115
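One caveat: the very next npm install will bring all of that back. To make the cleanup stick, one option (assuming node-prune is installed on the build machine) is to hook it into the install via an npm script in package.json:
"scripts": {
  "postinstall": "node-prune"
}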
That was good. But it could be better. Let's top it up with other utilities.
ModClean
$ npm install modclean -g
Then, use it to clean up the node_modules:
$ modclean -n default:safe,default:caution -r
MODCLEAN Version 3.0.0-beta.1
✔ Found 689 files to remove
[==============================] 100% (689/689) 0.0s
✔ Found 546 empty directories to remove
[==============================] 100% (546/546) 0.0s
FILES/FOLDERS DELETED
Total: 1235
Skipped: 0
Empty: 546
$
It did some work. Now, the size is 43MB
$ sudo du -sh node_modules
43M node_modules
Uglify Code
We have come down from 117 MB to 43 MB. That is good, but not as good as one would want. Considering the amount of junk in the node_modules folder, we need something better. Whitespace accounts for a good share of the remaining bytes, so that is what we work on next: uglifying the code certainly reduces file size.
There are several node modules that can help uglify code, but a lot of them are not compatible with ES2015 and above. Uglify-ES is a good one. Let's start by installing it:
$ npm install uglify-es -g
With this in place, let's uglify each JavaScript file in the node_modules folder:
$ find node_modules -name '*.js' | while read a
> do
>   echo "$a"
>   uglifyjs "$a" -o "$a"
> done
This takes a long time, as it has to access and analyze each JS file in there.
At times, this generates a heap overflow error: uglifyjs works asynchronously, and running it in a tight loop spawns too many instances at once - causing trouble. Adding a sleep 1 in the loop can solve the problem, but it will increase the runtime further. In any case, it is worth all the effort.
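A variant I would suggest, under the assumption that the parallelism is the culprit: drive the minification through xargs, which lets us cap the number of concurrent uglifyjs processes explicitly (four at a time here):
$ find node_modules -name '*.js' -print0 | xargs -0 -I{} -P4 uglifyjs {} -o {}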
$ sudo du -sh node_modules
37M node_modules
There - now we have 37 MB. That is good! It reduces my warmup time and keeps the deployment package well within limits.
Top comments
You raise some good points here.
But surely you would be 100% better off simply using Webpack or some other bundler/packager.
Everything you are doing in this post gets undone on the next npm update/install, and is not very efficient. Modern tools will do a much better job, and also handle things like dead code removal and tree shaking.
Webpack is a good start because it has so many plugins and makes it easy to add new functionality (though it does have a bit of a learning curve).
For all the people out there, this post gives good things to think about, but there are much cleaner, more robust, and more efficient ways.
And yes, Webpack will handle pretty much any workflow and file/dependency type. But even if you're doing something very unusual, you can still use other tools alongside Webpack.
Webpack is a great solution until you need to debug async code in a running Lambda. No line numbers on stack traces. No idea what failed or where. Now you have to redeploy just to debug, or play the commenting/console.log dart-throwing game. We found in arc.codes that sub-5MB Lambdas cold start in under a second. The solution is single-responsibility Lambdas with minimal deps, as the author suggests. (Using a framework like arc.codes also helps, because it encourages small Lambdas and bakes in everything you need.)
It's possible to add source map support to your lambdas. I've started a write-up that may be helpful (or may not be -- it's pretty bare-bones at the moment): dev.to/danmactough_66/add-source-m...
But we're using that setup in production and getting stack traces with correct file names and line/column numbers.
Not quite. Source maps exist for a reason.
You can debug exact source even when using a complicated package/transpile chain. You just use source maps. That's pretty standard.
Anybody who tries to debug built/compiled/packaged/transpiled code is in for a world of hurt. But source maps let you debug THE EXACT source.
EDITED: also, you may need to set a Node env variable to enable source maps on Lambda. But there are plenty of instructions that a search will turn up.
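For reference, the variable hinted at here is presumably NODE_OPTIONS, since Node.js 12+ can apply source maps to stack traces natively. Setting it as a Lambda environment variable would look like:
NODE_OPTIONS=--enable-source-maps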
Lambda does not support source maps.
Are you debugging a production setup? Because you can definitely set up an environment to debug using source maps. I program in TypeScript, I debug in TypeScript (using source maps). My typical build includes packaging, tree shaking, and minifying. There is no way to debug that without maps.
And in any case, you can get source-mapped stack traces regardless of the environment by using the source-map-support package. It's very handy, but less relevant these days (as source maps are supported pretty much everywhere now).
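For completeness, a minimal sketch of that approach - the package is published as source-map-support, and a single call at the top of the entry file is enough:
// Rewrites stack traces using the .map files shipped next to the bundle
require('source-map-support').install();

exports.handler = async (event) => {
  // Any error thrown here now reports original file names and line numbers
  throw new Error('boom');
};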
Wait until you do npm update.
Haha. Better deploy before that.
I've had success recently using esbuild to bundle my Lambdas. Previously I tried Rollup and webpack without much luck.
esbuild is both very fast and needs virtually zero config, and it supports TypeScript out of the box.
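For anyone curious, a typical esbuild invocation along these lines (the entry file, Node target, and output path are assumptions) is a single command:
$ esbuild index.js --bundle --minify --sourcemap --platform=node --target=node14 --external:aws-sdk --outfile=dist/index.js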
Lambda runtimes come with more than aws-sdk included, so there may be additional packages you can move to dev dependencies and enjoy even more gains without the need to minify, etc. See the following project that you can run in your AWS account to get a list of the included packages.
github.com/alestic/lambdash
Better instructions here:
alestic.com/2015/06/aws-lambda-she...
For example, in the Node.js 10.x runtime, the following packages were included:
assert
async_hooks
awslambda
aws-sdk
base64-js
buffer
child_process
cluster
console
constants
crypto
dgram
dns
domain
dynamodb-doc
events
fs
http
http2
https
ieee754
imagemagick
inspector
isarray
jmespath
lodash
module
net
os
path
perf_hooks
process
punycode
querystring
readline
repl
sax
stream
string_decoder
timers
tls
trace_events
tty
url
util
uuid
v8
vm
xml2js
xmlbuilder
zlib
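If you want to spot-check any of these from inside your own function, a trivial handler will do (the module name here is just an example):
exports.handler = async () => {
  try {
    // Resolves against the runtime's module paths without bundling anything
    return require.resolve('xml2js');
  } catch (e) {
    return 'not available in this runtime';
  }
};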
I agree with some commenters here that bundling is the better choice. I have a project where I deploy all libs as a bundle but keep my own source as separate files, which helps with easier debugging on the fly. Though this is probably not necessary/useful when deploying a container or Lambda, where a simple bundle plus a source map (stored somewhere else) is the easier choice.
Wouldn't it be better to use Webpack additionally? It minifies and does tree shaking, and will bring the size down much further.
Thanks for your input. I found webpack messed things up when there was a non-JS dependency. Did you see this problem? How did you overcome it?
It is true that this can be a bit tricky, since there isn't a universal way to handle every kind of non-JS dependency. But there is one way or another for each of them. In most cases there is a generic or a specific webpack loader. I personally have rarely run into such issues - mostly with native modules, for which there is a webpack loader, though they might require special handling for Lambda one way or another.
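To make this concrete, a minimal webpack configuration for a Lambda target might look like the sketch below (the entry path and the aws-sdk external are assumptions, matching the article's setup):
// webpack.config.js - a minimal sketch, not a drop-in for every project
const path = require('path');

module.exports = {
  mode: 'production',            // enables minification and tree shaking
  target: 'node',                // build for the Node.js runtime, not the browser
  entry: './index.js',
  externals: { 'aws-sdk': 'aws-sdk' }, // leave the SDK to the Lambda runtime
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'index.js',
    libraryTarget: 'commonjs2',  // export the handler the way Lambda expects
  },
  devtool: 'source-map',         // keep stack traces debuggable
};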
Thanks for this useful article.
Is there a benchmark that shows the difference in cold starts between shipping dependencies in a Lambda layer versus in the deployment package itself?