Natalia Vayngolts for Otomato

Posted on Jul 5, 2022

How to optimize production Docker images running Node.js with Yarn

#node #yarn #docker

Node.js projects usually have many dependencies, and building one leaves a large number of redundant files behind. This becomes a problem when the application is shipped as a Docker image.

Most of these files are not needed to run the application; they only take up space. Typical examples are the package manager cache and dev dependencies, that is, modules required only during development.

This unneeded data can reach hundreds of megabytes, which makes Docker images cumbersome to work with. The bigger the image, the more storage it uses, and the slower it is to build and deploy.



"@nestjs/cli": "^8.2.4",
"@nestjs/common": "^8.4.4",
"@nestjs/core": "^8.4.4",
"@nestjs/jwt": "^8.0.0",
"@nestjs/passport": "^8.2.1",
"@nestjs/platform-express": "^8.4.4",
"@nestjs/serve-static": "^2.2.2",
"@nestjs/swagger": "^5.2.0",
"@nestjs/typeorm": "^8.0.3",
"@sentry/node": "^7.0.0",
"@types/cookie-parser": "^1.4.3",
"bcryptjs": "^2.4.3",
"body-parser": "^1.19.2",
"bull": "^4.7.0",
"class-transformer": "^0.5.1",
"class-validator": "^0.13.2",
"cookie-parser": "^1.4.6",
"cross-env": "^7.0.3",
"dayjs": "^1.11.3",
"dotenv": "^16.0.0",
"express-basic-auth": "^1.2.1",
"flagsmith-nodejs": "^1.1.1",
"jsonwebtoken": "^8.5.1",
"passport": "^0.5.2",
"passport-apple": "^2.0.1",
"passport-facebook": "^3.0.0",
"passport-google-oauth20": "^2.0.0",
"passport-http": "^0.3.0",
"passport-jwt": "^4.0.0",
"passport-local": "^1.0.0",
"pg": "^8.7.3",
"pg-connection-string": "^2.5.0",
"redis": "^4.0.4",
"reflect-metadata": "^0.1.13",
"rimraf": "^3.0.2",
"rxjs": "^7.2.0",
"swagger-ui-express": "^4.3.0",
"typeorm": "0.2",
"uuid": "^8.3.2"

The example_1 image is unoptimized. Its size on disk is about 1 GB.

Pushing it to a registry uploads about 900 MB.

Contents of the Dockerfile:



FROM node:16.15-alpine
USER node
RUN mkdir -p /home/node/app
WORKDIR /home/node/app
COPY --chown=node . .
RUN yarn install
CMD ["yarn", "start"]

Let’s run the image and check what’s inside the container:



docker run -it --rm example_1 sh

Once the shell starts, go to the home directory and check the actual size of each subdirectory:



~ $ du -d 1 -h
8.0K    ./.yarn
594.3M  ./app
560.9M  ./.cache
1.1G    .

According to the Yarn website,

Yarn stores every package in a global cache in your user directory on the file system.

As the output shows, the .cache directory holds copies of packages for offline access and takes up about 560 MB. On closer inspection, its folders contain the sources of npm dependencies.

Counting the ls -la output shows roughly 970 entries in total:



~/.cache/yarn/v6 $ ls -la | wc -l
970

A dependency directory may contain something like this:

The yarn cache clean command empties the cache folder.

Slight changes to the RUN instruction in the Dockerfile



FROM node:16.15-alpine
USER node
RUN mkdir -p /home/node/app
WORKDIR /home/node/app
COPY --chown=node . .
RUN yarn install && yarn cache clean
CMD ["yarn", "start"]

lead to a significant change in the size of the image (example_2).

The .cache folder is now almost empty:



~ $ du -d 1 -h
8.0K    ./.yarn
594.3M  ./app
12.0K   ./.cache
594.3M  .
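
The cleanup only shrinks the image because it runs in the same RUN instruction as the install. Each instruction produces its own layer, and files deleted by a later instruction still occupy space in the earlier layer. A minimal sketch of the contrast, reusing the Dockerfile above (the commented-out variant is shown only for illustration):

```dockerfile
FROM node:16.15-alpine
USER node
RUN mkdir -p /home/node/app
WORKDIR /home/node/app
COPY --chown=node . .

# Not effective: the cache is already stored in the layer created by the
# first RUN, so removing it in a second layer does not shrink the image.
# RUN yarn install
# RUN yarn cache clean

# Effective: install and cleanup happen in one layer, so the cache
# never becomes part of the image.
RUN yarn install && yarn cache clean
CMD ["yarn", "start"]
```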

The image can be made even smaller by installing only production dependencies, which leaves out the dev modules used solely for development and testing. To do that, add the --production flag to the yarn install command:



FROM node:16.15-alpine
USER node
RUN mkdir -p /home/node/app
WORKDIR /home/node/app
COPY --chown=node . .
RUN yarn install --production && yarn cache clean
CMD ["yarn", "start"]

As a result, the example_3 image is less than half the size of the original example_1.

With only production dependencies installed, the app folder now takes 469 MB instead of 594 MB.



~ $ du -d 1 -h
8.0K    ./.yarn
469.0M  ./app
12.0K   ./.cache
469.1M  .
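
With Yarn 1.x (classic), which the official Node.js images ship, the same effect can also be reached through the NODE_ENV variable: when it is set to production, a plain yarn install skips devDependencies. A minimal sketch, assuming the application tolerates NODE_ENV=production at runtime as well, since a value set with ENV persists in the final image:

```dockerfile
FROM node:16.15-alpine
# Assumption: the application is fine with NODE_ENV=production at runtime too,
# because this variable stays set in the running container.
ENV NODE_ENV=production
USER node
RUN mkdir -p /home/node/app
WORKDIR /home/node/app
COPY --chown=node . .
# With NODE_ENV=production, Yarn 1.x does not install devDependencies
RUN yarn install && yarn cache clean
CMD ["yarn", "start"]
```

An explicit --production flag tells Yarn to ignore NODE_ENV, so it is worth choosing one of the two approaches and using it consistently.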

Another option is a multi-stage build that copies only the required artifacts from the image in which the build was made.



FROM node:16.15-alpine AS builder

USER node

RUN mkdir -p /home/node/app

WORKDIR /home/node/app

COPY --chown=node . .
# Building the production-ready application code - alias to 'nest build'
RUN yarn install --production && yarn build

FROM node:16.15-alpine

USER node

WORKDIR /home/node/app

COPY --from=builder --chown=node /home/node/app/node_modules ./node_modules
# Copying the production-ready application code, so it's one of few required artifacts
COPY --from=builder --chown=node /home/node/app/dist ./dist
COPY --from=builder --chown=node /home/node/app/public ./public
COPY --from=builder --chown=node /home/node/app/package.json .

CMD [ "yarn", "start" ]

The application is built with NestJS, a framework for building efficient and scalable Node.js applications with TypeScript.
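
The Dockerfile above can run yarn build after a production-only install because @nestjs/cli appears among the regular dependencies of this project. When build tooling lives in devDependencies instead, a common variation is to build with the full dependency set and install production dependencies in a separate stage. A sketch under that assumption; the stage name deps is hypothetical, and a yarn.lock file is assumed to sit next to package.json:

```dockerfile
# Stage 1: full install (dev dependencies included) to compile the application
FROM node:16.15-alpine AS builder
USER node
RUN mkdir -p /home/node/app
WORKDIR /home/node/app
COPY --chown=node . .
RUN yarn install && yarn build

# Stage 2 (hypothetical name "deps"): production dependencies only
FROM node:16.15-alpine AS deps
USER node
RUN mkdir -p /home/node/app
WORKDIR /home/node/app
# Assumption: yarn.lock exists in the project root
COPY --chown=node package.json yarn.lock ./
RUN yarn install --production

# Stage 3: final image with only the required artifacts
FROM node:16.15-alpine
USER node
WORKDIR /home/node/app
COPY --from=deps --chown=node /home/node/app/node_modules ./node_modules
COPY --from=builder --chown=node /home/node/app/dist ./dist
COPY --from=builder --chown=node /home/node/app/public ./public
COPY --from=builder --chown=node /home/node/app/package.json .
CMD [ "yarn", "start" ]
```

No yarn cache clean is needed in this layout: the Yarn cache stays in the builder and deps stages, which are not part of the final image.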

The example_4 image is almost the same size as example_3.

Finally, pushing it to a registry uploads only about 350 MB.

Thus, the image size is reduced by more than half, from 1 GB to 460 MB. The image takes less storage and less time to deploy.

Top comments (3)

Mark Davydov • Jul 6 '22 • Edited

Great article for those who are trying to make their image slimmer :)
I will add a couple of things:

RUN mkdir -p /home/node/app: you don't really need it, WORKDIR creates the directory if it doesn't exist (that's an extra layer).
You don't want a plain yarn install: when yarn.lock is out of sync with package.json, it can resolve packages to the latest versions your semantic version ranges permit and rewrite yarn.lock, so you don't really know what gets installed, and you don't maintain the lockfile for nothing. It is usually better to use yarn install --frozen-lockfile.
In newer versions of Yarn, use yarn install --immutable --immutable-cache --check-cache, as explained here: yarnpkg.com/cli/install
You probably don't want to use yarn start in your production containers: it can interfere with the SIGTERM signal that Kubernetes, Docker Swarm (or any other orchestration tool) sends to your container.
For more info read here: snyk.io/blog/10-best-practices-to-... , number 5.

I would also suggest the github.com/wagoodman/dive tool to dive into your layers and understand where the big MBs come from.
slim.ai/ can help you with that as well.
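
The lockfile and signal suggestions from the comment above could be applied to the multi-stage Dockerfile roughly as follows. This is a sketch, not the author's setup: it assumes Yarn 1.x and assumes the build emits dist/main.js (the usual NestJS entry point), which is not confirmed by the article.

```dockerfile
FROM node:16.15-alpine AS builder
USER node
RUN mkdir -p /home/node/app
WORKDIR /home/node/app
COPY --chown=node . .
# --frozen-lockfile (Yarn 1.x) makes the install fail instead of rewriting
# yarn.lock when the lockfile is out of sync with package.json
RUN yarn install --production --frozen-lockfile && yarn build

FROM node:16.15-alpine
USER node
WORKDIR /home/node/app
COPY --from=builder --chown=node /home/node/app/node_modules ./node_modules
COPY --from=builder --chown=node /home/node/app/dist ./dist
COPY --from=builder --chown=node /home/node/app/public ./public
COPY --from=builder --chown=node /home/node/app/package.json .
# Assumption: dist/main.js is the compiled entry point.
# Starting node directly, without the yarn wrapper, makes the Node.js
# process the target of signals sent to the container.
CMD ["node", "dist/main.js"]
```

Graceful shutdown still depends on the application handling SIGTERM itself.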