
Cleanup the node_modules for a lighter Lambda Function

By Vikas Solegaonkar. Originally published at blog.thewiz.net.

Any Node.js project carries a bulky folder, node_modules, that holds all the modules and dependencies the application needs. If you peep into that folder, you can see a huge pile of folders and files. That often makes me wonder: are these really required? Does my application use so much?

Not just that, each of these modules comes with several versions of the code: the dist, the prod, and the elaborate, bulky src folder. Along with that, it has a ton of readme files and license agreements. A few of them also have a photograph of the developers!
With due regard to each of these, I feel they are not required on my production deployment. That is a big waste of disk space.

People who deploy on a bare server or an EC2 instance may not mind all of this. Not because the cost and compute are free, but because they have already resigned themselves to overprovisioning. So such problems may be a low priority.

But for someone who is cost-conscious and goes for Lambda functions, it may be a big concern, since each millisecond of compute time is valuable, and so is the memory used.

One may get generous about provisioning RAM, but the unzipped deployment package is limited to 250 MB. An ugly node_modules folder can easily grow well beyond that and put us in trouble. Also, a larger deployment package means longer warmup times. So we should do everything we can to keep the node_modules folder compact and get cleaner deployments.

Here are some of the techniques that helped me.

Check the Dependencies

First of all, we have to overcome the shock - why is my node_modules so huge?

{
  "name": "layerjs",
  "version": "1.0.0",
  "description": "Lambda function triggered by event, to generate daily reports",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "aws-sdk": "^2.805.0",
    "jsonwebtoken": "^8.5.1",
    "pdfkit": "^0.11.0",
    "uuid4": "^2.0.2",
    "xlsx": "^0.16.9"
  }
}

Consider for example, this simple and small package.json. It pulls in a node_modules folder of 117 MB!

$  sudo du -sh node_modules
117M    node_modules

I need to know what is going on here. What does it pull in?

I found a very good tool for this: NPM Graph. It is very simple to use and provides a graphical view of everything that goes into the node_modules folder. Just drop the package.json in there and it will show everything that gets pulled in.

[Image: NPM Graph view of the layerjs dependencies]

That's HUGE! Let's try to reduce it now.
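To see the same picture in numbers, a small Node.js script can add up the size of each top-level package. This is only a sketch: the function names `dirSize` and `packageSizes` are made up for this example, and the byte counts are file sizes, so they will not match the block-based figure reported by `du` exactly.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively sum the sizes (in bytes) of all regular files under dir.
// Symbolic links are skipped.
function dirSize(dir) {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) total += dirSize(full);
    else if (entry.isFile()) total += fs.statSync(full).size;
  }
  return total;
}

// List the packages at the top level of a node_modules folder, largest first.
// Scoped packages (@scope/name) are reported one level down.
function packageSizes(nodeModules) {
  const result = [];
  for (const entry of fs.readdirSync(nodeModules, { withFileTypes: true })) {
    if (!entry.isDirectory() || entry.name.startsWith('.')) continue;
    const full = path.join(nodeModules, entry.name);
    if (entry.name.startsWith('@')) {
      for (const sub of fs.readdirSync(full, { withFileTypes: true })) {
        if (!sub.isDirectory()) continue;
        result.push({
          name: `${entry.name}/${sub.name}`,
          bytes: dirSize(path.join(full, sub.name)),
        });
      }
    } else {
      result.push({ name: entry.name, bytes: dirSize(full) });
    }
  }
  return result.sort((a, b) => b.bytes - a.bytes);
}

// Print the ten largest packages when a node_modules folder is present.
if (fs.existsSync('node_modules')) {
  for (const { name, bytes } of packageSizes('node_modules').slice(0, 10)) {
    console.log(`${(bytes / 1024 / 1024).toFixed(1)} MB  ${name}`);
  }
}
```

Run it from the project root. The packages at the top of the list are the first candidates to question.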

AWS SDK modules

This is a very common mistake. A lot of developers who want to test stuff locally include the AWS SDK in the package.json. This is great. But the problem starts when it gets pushed into our deployment package.

The Lambda runtime environment carries its own AWS SDK. Unless you have to make a lot of tweaks in there and need a highly customized version, it is really not required in your deployment package. This can be achieved simply by making it a dev dependency.

$ npm install PACKAGE --save-dev

This will make the package a dev dependency. We can use it for development and testing, but it is pruned off when we make a production deployment.

We can do the same about many other modules that we need only in our development environment.
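As a rough sketch of this step, the script below looks through a package.json for packages that the runtime already provides and moves them to devDependencies. The `RUNTIME_PROVIDED` list and both function names are assumptions made for illustration. The list holds only `aws-sdk` because that is the one package discussed here, and what a runtime actually bundles depends on its version.

```javascript
// Packages assumed to be provided by the Lambda runtime. This list is an
// assumption for illustration; check it against the runtime you deploy to.
const RUNTIME_PROVIDED = ['aws-sdk'];

// Return the names listed under "dependencies" that the runtime already
// provides, i.e. candidates for moving to "devDependencies".
function runtimeProvidedDeps(pkg, provided = RUNTIME_PROVIDED) {
  const deps = Object.keys(pkg.dependencies || {});
  return deps.filter((name) => provided.includes(name));
}

// Return a copy of the manifest with the given packages moved from
// dependencies to devDependencies. The input object is not modified.
function moveToDevDependencies(pkg, names) {
  const dependencies = { ...(pkg.dependencies || {}) };
  const devDependencies = { ...(pkg.devDependencies || {}) };
  for (const name of names) {
    if (name in dependencies) {
      devDependencies[name] = dependencies[name];
      delete dependencies[name];
    }
  }
  return { ...pkg, dependencies, devDependencies };
}

// Example with a cut-down version of the manifest shown earlier.
const example = {
  dependencies: { 'aws-sdk': '^2.805.0', jsonwebtoken: '^8.5.1', pdfkit: '^0.11.0' },
};
const found = runtimeProvidedDeps(example);
console.log(found); // [ 'aws-sdk' ]
console.log(moveToDevDependencies(example, found));
```

In practice, running `npm install PACKAGE --save-dev` for each reported name does the same job and also keeps package-lock.json in sync.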

Production Flag

This follows from the previous one. It is the simplest and yet the most ignored one. Just delete the node_modules folder and install it again using the --production flag.

Any package that we have marked as a dev dependency will not be a part of the deployment. Not just that, any dev dependency of our production dependencies will also drop off.

With this, the package.json becomes

{
  "name": "layerjs",
  "version": "1.0.0",
  "description": "This is the lambda layer generated for the service",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "jsonwebtoken": "^8.5.1",
    "pdfkit": "^0.11.0",
    "uuid4": "^2.0.2",
    "xlsx": "^0.16.9"
  },
  "devDependencies": {
    "aws-sdk": "^2.805.0"
  }
}

Now, we install it with the production flag

$ rm -rf node_modules
$ npm install --production
$ sudo du -sh node_modules
59M     node_modules

Now, the node_modules folder is 59 MB. Note that this reduction is mainly because of the AWS SDK. If everyone had followed good coding practices, this would have made a huge difference. But... So we may not see miracles here, but it can reduce the deployment size to some extent.
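After a production install, one quick sanity check is to confirm that no dev dependency still has a folder in node_modules. The sketch below does that; `installedDevDeps` is a hypothetical helper name. A hit is only a hint, because the same package can also be pulled in by one of the production dependencies.

```javascript
const fs = require('fs');
const path = require('path');

// Return the devDependencies that still have a folder at the top level of
// node_modules. A hit is a hint rather than proof of a mistake: the same
// package can also be installed as a dependency of a production package.
function installedDevDeps(projectDir) {
  const manifest = JSON.parse(
    fs.readFileSync(path.join(projectDir, 'package.json'), 'utf8')
  );
  const devDeps = Object.keys(manifest.devDependencies || {});
  return devDeps.filter((name) =>
    fs.existsSync(path.join(projectDir, 'node_modules', name))
  );
}

// Report on the current directory when it looks like an installed npm project.
if (fs.existsSync('package.json') && fs.existsSync('node_modules')) {
  const leftovers = installedDevDeps('.');
  console.log(
    leftovers.length === 0
      ? 'No dev dependencies found in node_modules'
      : `Still installed: ${leftovers.join(', ')}`
  );
}
```

Running this right before zipping the deployment package catches the case where a plain `npm install` was run again after the production install.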

Remove Unnecessary Files

Now that we have dropped the unnecessary packages, we have to start with cleaning the packages themselves.
For that, we have some good utilities.

Node Prune

$ npm install -g node-prune

When we run this in the root folder of the project, it will again tear off what is not useful.

$ node-prune
Before: 59M .
Files: 5696
After: 47M .
Files: 4115

That was good. But it could be better. Let's top it up with other utilities.
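To get a feel for what such cleanup tools go after, here is a dry-run sketch that only lists candidates and deletes nothing. The patterns are illustrative guesses and are not the actual rule set of node-prune or any other utility. `JUNK_FILE`, `JUNK_DIR` and `findJunk` are names invented for this example.

```javascript
const fs = require('fs');
const path = require('path');

// Illustrative patterns only; real cleanup tools ship their own, longer lists.
const JUNK_FILE = /^(readme|changelog)(\..*)?$|\.(md|markdown)$/i;
const JUNK_DIR = /^(test|tests|__tests__|docs|examples)$/i;

// Walk root and collect matching paths without deleting anything.
// A matching directory is reported once and not descended into.
function findJunk(root, hits = []) {
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      if (JUNK_DIR.test(entry.name)) hits.push(full);
      else findJunk(full, hits);
    } else if (entry.isFile() && JUNK_FILE.test(entry.name)) {
      hits.push(full);
    }
  }
  return hits;
}

// Print the number of candidates when a node_modules folder is present.
if (fs.existsSync('node_modules')) {
  console.log(`${findJunk('node_modules').length} candidate paths`);
}
```

Reviewing the list before deleting anything is worthwhile, since a package may legitimately load code from a folder with one of these names.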

ModClean

npm install modclean -g

Then, use it to clean up the node_modules folder.


$ modclean -n default:safe,default:caution -r


MODCLEAN  Version 3.0.0-beta.1

✔ Found 689 files to remove
[==============================] 100% (689/689) 0.0s

✔ Found 546 empty directories to remove
[==============================] 100% (546/546) 0.0s


FILES/FOLDERS DELETED
    Total:    1235
    Skipped:  0
    Empty:    546


$

It did some work. Now, the size is 43MB

$ sudo du -sh node_modules
43M     node_modules

Uglify Code

We have come down from 117MB to 43MB. That is good, but not as much as one would want. Considering the amount of junk in the node_modules folder, we need something better. White space takes up a lot of what remains, so we work on that next. Uglifying code certainly reduces the file size.

There are several node modules that can help you uglify code. But a lot of them are not compatible with ES2015 and above. Uglify ES is a good one. Let's start with installing that.

npm install uglify-es -g

With this in, let's uglify each JavaScript file in the node_modules folder.

find node_modules -type f -name '*.js' | while read -r a
> do
> echo "$a"
> uglifyjs "$a" -o "$a"
> done

This takes a long time, as it has to access and analyze each JS file in there.

At times, this generates a heap overflow error. Because uglifyjs is asynchronous, running it in a loop can spawn too many instances at once, causing trouble. Adding a sleep 1 in the loop can solve the problem, but it will increase the runtime further. In any case, it is worth all the effort.
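An alternative to adding a sleep is to drive the loop from Node.js and wait for each file to finish before starting the next. The sketch below does that with `spawnSync`, which blocks until the child process exits. It assumes the `uglifyjs` executable installed above is on the PATH of a POSIX shell environment. `findJsFiles`, `runUglify` and `minifyAll` are names made up for this example.

```javascript
const fs = require('fs');
const path = require('path');
const { spawnSync } = require('child_process');

// Collect every .js file under dir.
function findJsFiles(dir, found = []) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) findJsFiles(full, found);
    else if (entry.isFile() && entry.name.endsWith('.js')) found.push(full);
  }
  return found;
}

// Default runner: the same command as the shell loop, `uglifyjs FILE -o FILE`.
// Assumes the uglifyjs executable is on the PATH. Returns true on exit code 0.
function runUglify(file) {
  const result = spawnSync('uglifyjs', [file, '-o', file], { stdio: 'inherit' });
  return result.status === 0;
}

// Minify the files one at a time. spawnSync blocks until each process
// exits, so only one uglifyjs process is alive at any moment.
// Returns the list of files for which the runner reported a failure.
function minifyAll(dir, run = runUglify) {
  const failed = [];
  for (const file of findJsFiles(dir)) {
    if (!run(file)) failed.push(file);
  }
  return failed;
}
```

Calling `minifyAll('node_modules')` from the project root processes the whole folder and returns the files that could not be minified, which are worth checking by hand before deploying.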

$ sudo du -sh node_modules
37M     node_modules

There, now we have 37MB. That is good! Reduces my warmup time and

Discussion (20)

Zach

You raise some good points here.
But surely you would be 100% better off simply using Webpack or some other bundler/packager.
Everything you are doing in this post gets undone on npm update/install, and is not very efficient. Modern tools will do a much better job and also do things like dead code removal and tree shaking.
Webpack is a good start because it has so many plugins and is easy to add new functionality to (though it does have a bit of a learning curve).

For all the people out there, this post gives good things to think about, but there are much cleaner, more robust, and more efficient ways.
And yes, Webpack will handle pretty much any workflow and file/dependency type. But even if you're doing something very unusual you can still use other tools alongside Webpack.

xnoɹǝʃ uɐıɹq

Webpack is a great solution until you need to debug async code in a running Lambda. No line numbers on stack traces. No idea what failed or where it is. Now you have to redeploy just to debug or play the commenting/console.log dart throwing game. We found in arc.codes that sub-5 MB Lambdas cold start in under a second. The solution is single-responsibility Lambdas with minimal deps, as the author suggests. (Using a framework like arc.codes also helps because it encourages small Lambdas and bakes in everything you need.)

Dan MacTough

It's possible to add source map support to your lambdas. I've started a write-up that may be helpful (or may not be -- it's pretty bare-bones at the moment): dev.to/danmactough_66/add-source-m...

But we're using that setup in production and getting stack traces with correct file names and line/column numbers.

Zach • Edited

Not quite. Source maps exist for a reason.
You can debug exact source even when using a complicated package/transpile chain. You just use source maps. That's pretty standard.
Anybody who tries to debug built/compiled/packages/transpiled code is in for a world of hurt. But source maps let you debug THE EXACT source.

EDITED: also, you may need to set a Node env variable to enable source maps on Lambda. But there are plenty of instructions that a search will turn up.

xnoɹǝʃ uɐıɹq

Lambda does not support source maps.

Zach

Are you debugging a production setup? Because you can definitely set up an environment to debug using source maps. I program in TypeScript, I debug in TypeScript (using source maps). My typical build includes packaging, tree shaking, and minifying. There is no way to debug that without maps.
And in any case, you can get source mapped stack traces regardless of the environment by using the source-map-support package. It's very handy but less relevant these days (as source maps are supported pretty much everywhere now).

Forest Hoffman • Edited

I love to see posts on optimization and organizing dependencies, however the cover image isn't really relevant. It's a bit insensitive to the model from whichever photo this has been photoshopped (it's a good idea to provide attribution for the original photo, even if it's under a Creative Commons License).

Please review the Community Code of Conduct, and adjust the cover image accordingly.

Cheers.

Vikas Solegaonkar Author • Edited

Thanks for pointing out. I have changed the image.

Taufik Nurrohman

Wait until you do npm update.

Vikas Solegaonkar Author

Haha. Better deploy before that

njitman • Edited

Lambda runtimes come with more than aws-sdk included, so there may be additional packages you can move to dev dependencies and enjoy even more gains without the need to minify, etc. See the following project that you can run in your AWS account to get a list of the included packages.

github.com/alestic/lambdash

Better instructions here:
alestic.com/2015/06/aws-lambda-she...

For example, in the node.js 10.x runtime, the following packages were included:
assert
async_hooks
awslambda
aws-sdk
base64-js
buffer
child_process
cluster
console
constants
crypto
dgram
dns
domain
dynamodb-doc
events
fs
http
http2
https
ieee754
imagemagick
inspector
isarray
jmespath
lodash
module
net
os
path
perf_hooks
process
punycode
querystring
readline
repl
sax
stream
string_decoder
timers
tls
trace_events
tty
url
util
uuid
v8
vm
xml2js
xmlbuilder
zlib

Vikas Solegaonkar Author

Thanks for sharing

Mark Gibson

I've had success recently using esbuild to bundle my Lambdas. Previously I've tried rollup and webpack without much luck.
esbuild is both very fast and has virtually zero config, and supports TS out of the box.

Vikas Solegaonkar Author

Thanks for sharing

Luca Adrian L

I agree with some commenters here that bundling is the better choice. I have a project where I deploy all libs as a bundle but keep my own source as separate files. This helps for easier debugging on the fly. Though this probably is not necessary or useful when deploying a container or Lambda, where a simple bundle and a source map (stored somewhere else) is the better, easier choice.

abhishekshetty

Won't it be better to use Webpack additionally? It minifies and does tree shaking, and will bring the size down to a much smaller one.

Vikas Solegaonkar Author

Thanks for your input. I found webpack messed it up when there was non js dependency. Did you see this problem? How did you overcome that?

Luca Adrian L

It is true that this can be a bit tricky, since there isn't a universal way for every kind and type of non js dependency. But there is one way or another for every one. For most cases e.g. there is a generic or specific webpack loader. And I personally rarely experienced such thing. Mostly regarding native modules for which there is a webpack loader, but might require special handling for Lambda one way or another.

Rolf Streefkerk

Is there a benchmark that shows the difference between layer package size and Lambda deployment size with dependencies on cold starts?

Josias Aurel

Thanks for this useful article.