Dockerfile Optimization Using Multistage Builds, Caching, and Lightweight Images

In modern software deployment, Docker holds a premier position thanks to its ability to build, ship, and run applications in isolated environments called containers. A Dockerfile defines these environments, making its optimization crucial for efficient application development and deployment. In this blog post, we'll delve into Dockerfile optimization, focusing on Docker's caching mechanism, multistage builds, and lightweight base images. We'll illustrate these concepts using a Laravel PHP application with Nginx and Yarn.

Initial Dockerfile Setup

A sample Dockerfile for a Laravel PHP application might look something like this:

FROM php:7.4-fpm

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    libpng-dev \
    libjpeg62-turbo-dev \
    libfreetype6-dev \
    locales \
    zip \
    unzip \
    git \
    curl \
    libzip-dev \
    libonig-dev \
    libxml2-dev

# Clear cache
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Install PHP extensions
RUN docker-php-ext-install pdo_mysql mbstring exif pcntl gd zip xml

# Install Composer
COPY --from=composer:latest /usr/bin/composer /usr/bin/composer

# Install Node.js and Yarn
RUN curl -sL https://deb.nodesource.com/setup_14.x | bash -
RUN apt-get install -y nodejs
RUN npm install --global yarn

WORKDIR /var/www

# Copy existing application directory contents
COPY . /var/www

# Install PHP and JS dependencies
RUN composer install
RUN yarn install

EXPOSE 9000
CMD ["php-fpm"]

While this Dockerfile gets the job done, it's far from being optimized. Notably, it doesn't make effective use of Docker's caching features, and the final image size is larger than necessary.
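
Before we start optimizing, it's worth seeing where the size actually goes. Assuming you've saved the Dockerfile above and picked a placeholder tag like laravel-app, you can inspect the size each layer adds:

# Build the unoptimized image (laravel-app is a placeholder tag)
docker build -t laravel-app .

# List every layer in the image along with the size it contributes
docker history laravel-app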

Switching to Alpine: Size and Security Matters

One notable change we will make in the Dockerfile is switching our base image from php:7.4-fpm to php:7.4-fpm-alpine. This is an excellent example of how the choice of base image can have a significant impact on the size and security of your Docker images.

Alpine Linux is a security-oriented, lightweight Linux distribution that is based on musl libc and BusyBox. The base Docker image of Alpine is much smaller than most distribution base images (~5MB), making it a top choice for teams keen on reducing the size of their images for security, speed, and efficiency reasons.

For many programming languages, official Docker images include both a full version, based on Debian or Ubuntu, and a version based on Alpine. Here's why the Alpine image is often better:

  1. Image size: Docker images based on Alpine are typically much smaller than those based on other distributions. This means they take up less disk space, use less network bandwidth, and start more quickly.

  2. Security: Alpine uses musl libc and BusyBox to reduce its size, but these tools also have a side benefit of reducing the attack surface of the image. Additionally, Alpine includes proactive security features like PIE and SSP to prevent exploits.

  3. Resource efficiency: Smaller Docker images are faster to deploy, use less RAM, and require fewer CPU resources. This makes them a more cost-effective choice, particularly for scalable, high-availability applications.

By switching to an Alpine base image, we get a smaller, faster, and more secure Docker image before we've changed anything else in the Dockerfile. The remaining optimizations below build on top of this foundation.
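
If you want to verify the size difference yourself, pulling both variants and comparing them only takes a moment. Exact numbers vary by tag and architecture, so treat this as a rough sanity check:

# Pull both variants of the PHP 7.4 FPM image
docker pull php:7.4-fpm
docker pull php:7.4-fpm-alpine

# Compare their sizes side by side
docker images php --format "{{.Repository}}:{{.Tag}}  {{.Size}}"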

Docker's Caching Mechanism: The Backbone of Optimization

Each Dockerfile instruction creates an image layer, making Docker images a stack of these layers. Docker stores these intermediate images in its cache to accelerate future builds. When building an image, Docker checks if there's a cached layer corresponding to each instruction. If an identical layer exists and the context hasn't changed, Docker uses the cached layer instead of executing the instruction anew. This caching mechanism significantly speeds up image builds.
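
You can observe the cache at work directly. This assumes a Dockerfile in the current directory and a placeholder image tag of laravel-app:

# First build: every instruction runs, and each resulting layer is cached
docker build -t laravel-app .

# Second build with no changes: Docker reuses the cached layers,
# so this finishes almost instantly
docker build -t laravel-app .

# To deliberately bypass the cache and rebuild every layer from scratch:
docker build --no-cache -t laravel-app .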

Harnessing Docker's Caching Mechanism: An Advanced Approach

While Docker's caching mechanism is designed to improve build efficiency, a misunderstanding of its nuances can lead to ineffective caching and slower build times. Docker evaluates each instruction in the Dockerfile in sequence; once the cache is invalidated for one instruction, it is invalidated for every instruction that follows.

This characteristic means the order of instructions in your Dockerfile can have a significant impact on build performance. The most frequently changing layers, usually those involving your application code, should be at the bottom of your Dockerfile. Conversely, layers that change infrequently, such as those installing dependencies, should be at the top.

Consider our Laravel application. If we modify any file within our application code, Docker invalidates the cache for the COPY . /var/www line and every subsequent line in our Dockerfile. To avoid unnecessary composer install and yarn install operations, we can restructure our Dockerfile:

FROM php:7.4-fpm-alpine

# oniguruma-dev and libxml2-dev are build dependencies for the mbstring and
# xml extensions below; nodejs and yarn are needed for the yarn install step
RUN apk --no-cache add \
    build-base \
    libpng-dev \
    libjpeg-turbo-dev \
    libzip-dev \
    oniguruma-dev \
    libxml2-dev \
    nodejs \
    yarn \
    unzip \
    git \
    curl

RUN docker-php-ext-install pdo_mysql mbstring exif pcntl gd zip xml

COPY --from=composer:latest /usr/bin/composer /usr/bin/composer

WORKDIR /var/www

COPY package.json yarn.lock ./
RUN yarn install

COPY . /var/www

RUN composer install

EXPOSE 9000
CMD ["php-fpm"]

A quick aside: you can further optimize the Composer step itself:

# Copy only the composer files first, so this layer stays cached until they change
COPY composer.lock composer.lock
COPY composer.json composer.json
# --no-autoloader skips autoloader generation, so Composer doesn't need the
# application files yet and can focus on installing packages
RUN composer install --no-dev --no-autoloader
# ... then run dump-autoload as one of the last steps, after you've copied
# your application code
RUN composer dump-autoload --optimize


Kaniko Caching: A New Age of Docker Caching

Kaniko is a tool to build container images from a Dockerfile, inside a container or Kubernetes cluster. One of its greatest strengths is advanced layer caching. Kaniko caching allows the reuse of layers in situations where Docker's caching falls short.

Kaniko can cache both the final image layers and intermediate build artifacts. With this flexibility, you can use Kaniko in CI/CD pipelines where the base image layers don't change frequently, but the application code does.

To use Kaniko's cache, you need to push a cache to a Docker registry. The cache consists of intermediate layers that can be reused in subsequent builds. The following command is an example of how to use the cache:

/kaniko/executor --context dir:///path/to/build/context --dockerfile Dockerfile --destination your_registry/your_repo:your_tag --cache=true --cache-repo=your_registry/your_repo/cache

In the command above, Kaniko uses --cache=true to enable caching and --cache-repo to specify where to push/pull the cached layers. In a subsequent build, Kaniko pulls the layers from the cache repository and uses them if the layers in the Dockerfile haven’t changed.
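
If you want to try Kaniko without a Kubernetes cluster, you can run the executor image locally through Docker. This is a minimal sketch: the registry names are placeholders, and it assumes your registry credentials live in the standard Docker config file:

# Run the Kaniko executor in a container, mounting the build context
# and the Docker credentials it needs to push to the registry
docker run --rm \
  -v "$PWD":/workspace \
  -v "$HOME/.docker/config.json":/kaniko/.docker/config.json:ro \
  gcr.io/kaniko-project/executor:latest \
  --context dir:///workspace \
  --destination your_registry/your_repo:your_tag \
  --cache=true \
  --cache-repo=your_registry/your_repo/cache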

GitHub Actions and CI/CD

Docker's caching mechanism can be highly beneficial when integrated into your Continuous Integration/Continuous Delivery (CI/CD) pipelines. It allows your pipelines to reuse previously built layers from the cache, reducing build times significantly. GitHub Actions provides an efficient way to implement such CI/CD pipelines for your Docker builds.

Here's a simple GitHub Actions workflow file that builds a Docker image using Docker layer caching:

name: Docker Build, Push, and Deploy

on:
  push:
    branches:
      - master

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - name: Check out the repo
      uses: actions/checkout@v2

    - name: Login to DockerHub
      uses: docker/login-action@v1 
      with:
        username: ${{ secrets.DOCKERHUB_USERNAME }}
        password: ${{ secrets.DOCKERHUB_TOKEN }}

    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v1

    - name: Cache Docker layers
      uses: actions/cache@v2
      with:
        path: /tmp/.buildx-cache
        key: ${{ runner.os }}-buildx-${{ github.sha }}
        restore-keys: |
          ${{ runner.os }}-buildx-

    - name: Build and push Docker image
      uses: docker/build-push-action@v2
      with:
        context: .
        push: true
        tags: your_dockerhub_username/your_repository:your_tag
        cache-from: type=local,src=/tmp/.buildx-cache
        cache-to: type=local,dest=/tmp/.buildx-cache

In the above workflow:

  • The actions/checkout@v2 step checks out your repository.
  • The docker/login-action@v1 step logs in to DockerHub using your credentials.
  • The docker/setup-buildx-action@v1 step sets up Docker Buildx, which is required for layer caching.
  • The actions/cache@v2 step retrieves the cache, or creates one if it doesn't exist. The cache is stored in /tmp/.buildx-cache.
  • The docker/build-push-action@v2 step builds the Docker image and pushes it to DockerHub. It also manages the Docker layer cache using cache-from and cache-to options.
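
One caveat worth knowing about this setup: with type=local caching, the cache directory grows over time because old layers are never pruned. A workaround documented in the docker/build-push-action repository is to write the new cache to a separate path, for example cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max, and then swap it in with an extra shell step after the build:

# Extra workflow step after the build: replace the old cache with the
# freshly written one so the cache directory doesn't grow unbounded
rm -rf /tmp/.buildx-cache
mv /tmp/.buildx-cache-new /tmp/.buildx-cache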

Mastering Multistage Builds

A Dockerfile's "multistage" build is a potent tool for reducing final image size. This process involves using multiple FROM statements, each starting a new stage of the build that can use a different base image. The artifacts needed in the final image can be selectively copied from one stage to another, discarding everything unnecessary.

Here's our optimized Dockerfile with multistage builds:

# --- BUILD STAGE ---
FROM php:7.4-fpm-alpine AS build

# oniguruma-dev and libxml2-dev are build dependencies for the mbstring and
# xml extensions below; nodejs and yarn are needed for the yarn install step
RUN apk --no-cache add \
    build-base \
    libpng-dev \
    libjpeg-turbo-dev \
    libzip-dev \
    oniguruma-dev \
    libxml2-dev \
    nodejs \
    yarn \
    unzip \
    git \
    curl

RUN docker-php-ext-install pdo_mysql mbstring exif pcntl gd zip xml

COPY --from=composer:latest /usr/bin/composer /usr/bin/composer

WORKDIR /var/www

COPY package.json yarn.lock ./
RUN yarn install

COPY . /var/www

RUN composer install
RUN php artisan optimize

# --- PRODUCTION STAGE ---
FROM nginx:stable-alpine AS production

COPY --from=build /var/www/public /var/www/html

EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
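
A handy side effect of naming your stages is that you can build any one of them directly with --target, which is useful for debugging the build stage or checking how small the production image turned out (image tags below are placeholders):

# Build only the first stage, e.g. to debug dependency installation
docker build --target build -t laravel-app:build .

# Build the final production image and check its size
docker build --target production -t laravel-app:prod .
docker images laravel-app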

Conclusion

Leveraging Docker's caching mechanism and multistage builds can significantly improve Dockerfile efficiency for a Laravel PHP application using Yarn and Nginx. With a better understanding of these mechanisms, developers can craft Dockerfiles that build faster and produce smaller images, reducing resource usage and leading to more scalable, efficient applications. Happy Dockerizing!
