To understand Docker, we have to go back in time and study the evolution of containers and how we got to where we are!
From the Docker site:
“Containers are a way to package software in a format that can run isolated on a shared operating system. Unlike VMs, containers do not bundle a full operating system - only libraries and settings required to make the software work are needed. This makes for efficient, lightweight, self-contained systems and guarantees that software will always run the same, regardless of where it’s deployed.”
Let's unpack that a bit:
- Back in the late nineties, VMware introduced the concept of running multiple operating systems on the same hardware.
- In the late 2000s, kernel-level namespacing was introduced, allowing shared global resources like network and disk to be isolated by namespace.
- In the early 2010s, containerization was born. It took virtualization to the OS level and added shared libs/bins as well. Because containers share the host kernel, we cannot run two containers that depend on different operating systems on the same host unless we use a VM.
- Namespaces are the true magic behind containers. The principles come from Linux containers, and Docker implemented its own OCI runtime called runC.
Virtual machines are virtualization at the hardware level; containers are virtualization at the OS/software level.
- Execution speed - Because containers use the underlying host OS, we get speeds close to those of a process running natively on the host.
- Startup speed - Containers can start in less than a second. They are very modular and can share the underlying libs/bins, as well as the host OS, when needed.
- Operational speed - Containers enable faster iteration on an application. There is less overhead in creating a container with new code changes and moving it through the pipeline to production.
- Build an image once and use it anywhere. The same image that runs the tests runs in production, which avoids "works on my machine" problems.
- Not just in production: containers help run tests consistently. Ever had a scenario where all tests passed on your machine but CI failed them?
- We can specify exactly how much CPU and memory a single container may consume. By understanding the available resources, containers can be packed densely to minimize wasted CPU and memory: scale containers within one host before scaling out to more instances.
- Containers are portable - well, to an extent (as long as the host is running some form of Linux or a Linux VM).
- You can move a container from one machine to another very quickly. Imagine something going wrong while patching a security hole in the host OS: we simply move the container to a different host and resume service.
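As a sketch of what those limits look like in practice: with plain `docker run` the relevant flags are `--cpus` and `--memory`, and in a v3 Compose file the same caps can be declared under `deploy.resources` (honored by swarm deployments). Service name and values below are illustrative:

```yaml
services:
  web:
    image: python:3.7.3-stretch
    deploy:
      resources:
        limits:
          cpus: '0.50'   # at most half a CPU core
          memory: 256M   # hard memory cap for this container
```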
- In 2013, Docker launched the first widely adopted container platform.
- In 2015, Docker created the Open Container Initiative (OCI), a governance structure around the container image and runtime specifications, and donated the first runtime to the OCI. The runtime used today by Docker and many other platforms is runC, written in Go.
- Docker Engine is built for Linux.
- Docker for Mac uses HyperKit to run a lightweight Alpine Linux virtual machine.
- Docker teamed up with Microsoft to create a Windows OCI runtime, available in Windows 10 and Windows Server 2016.
- Docker CLI commands look very similar to git commands, and many of them share the same semantics as well: `git pull` gets source from origin to local, and `docker pull <image>` gets a Docker image from a remote registry to local.
- Docker follows a client-server model, so the CLI can connect to a local Docker server or a remote one.
An image is a read-only template with instructions for creating a Docker container. Often, an image is based on another image, with some additional customization. For example, we may build an image based on the ubuntu image that installs the Apache web server and our application, along with the configuration details needed to make the application run.
We need a Dockerfile to create an image. Let's look at an example: an image that runs a Python Flask application using gunicorn.
```dockerfile
FROM python:3.7.3-stretch
ADD . /code
WORKDIR /code
COPY Pipfile Pipfile.lock /code/
RUN apt-get update
RUN apt-get install postgresql postgresql-client --yes && \
    apt-get -qy install netcat && \
    pip install --upgrade pip setuptools wheel && \
    pip install --upgrade pipenv && \
    pipenv install --dev --system --ignore-pipfile
CMD ["/usr/local/bin/gunicorn", "--config", "wsgi.config", "coach_desk:create_app('development')"]
```
Each instruction in a Dockerfile above creates a layer in the image. When we change the Dockerfile and rebuild the image, only those layers which have changed are rebuilt. This is part of what makes images so lightweight, small, and fast, when compared to other virtualization technologies.
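The cache behavior can be pictured as a content-addressed chain: each layer's identity depends on its instruction and on the layer before it, so changing one instruction invalidates every layer after it while earlier layers are reused. A toy model of that idea (not Docker's actual implementation):

```python
import hashlib

def layer_ids(instructions):
    """Compute a chained id per instruction, like a simplified build cache."""
    ids, parent = [], ""
    for instruction in instructions:
        # each layer id depends on the instruction AND the parent layer
        digest = hashlib.sha256((parent + instruction).encode()).hexdigest()[:12]
        ids.append(digest)
        parent = digest
    return ids

original = ["FROM python:3.7.3-stretch", "COPY Pipfile /code/", "RUN pipenv install"]
edited   = ["FROM python:3.7.3-stretch", "COPY Pipfile /code/", "RUN pipenv install --dev"]

a, b = layer_ids(original), layer_ids(edited)
print(a[:2] == b[:2])  # True  -> the first two layers are reused from cache
print(a[2] == b[2])    # False -> the changed instruction rebuilds this layer
```

This is also why Dockerfiles typically copy dependency manifests and install packages before copying application code: code changes then invalidate only the final layers.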
The first instruction in the Dockerfile is `FROM`, which specifies the image the current image is built from. Let's look at the Dockerfile used to create the Python image. It is built from `buildpack-deps:stretch`, which provides the basic tools needed to support any language.
```dockerfile
FROM buildpack-deps:stretch

# ensure local python is preferred over distribution python
ENV PATH /usr/local/bin:$PATH

# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG C.UTF-8

# extra dependencies (over what buildpack-deps already includes)
RUN apt-get update && apt-get install -y --no-install-recommends \
        tk-dev \
        uuid-dev \
    && rm -rf /var/lib/apt/lists/*

ENV GPG_KEY 0D96DF4D4110E5C43FBFB17F2D347EA6AA65421D
ENV PYTHON_VERSION 3.7.3
...
```
`buildpack-deps:stretch` is built from `buildpack-deps:stretch-scm`, which is built from `buildpack-deps:stretch-curl`, which is built from `debian:stretch`, which is built from scratch.
```dockerfile
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
```
If I had 1000 Dockerfiles all built from `python:3.7.3-stretch`, the related layers are not downloaded 1000 times but only once. The same goes for containers: if we run a Python container 1000 times, Python is installed only once and reused.
A registry is a place to store images. When we install Docker, we get a local store where all the images we create are kept; `docker images` lists all the images currently on your machine.
Docker Hub is a public registry with over 100,000 images. It is the first place to look for pre-built images that we can use directly, or build on top of as a base image.
We can move images between local and remote registries using the `docker push` and `docker pull` commands. The default remote registry is Docker Hub unless we explicitly specify another. At Peaksware, we use Amazon ECR to store our production Docker images.
Compose was introduced so we do not have to build and start every container manually: it is a tool for defining and running multi-container Docker applications. Compose was initially created for development and testing purposes, but recent Docker releases also allow a Compose YAML file to be used to deploy to a Docker swarm.
A microservice our team works on needs the right Postgres database, with all migrations applied, and an AWS CLI setup in order to run locally on a developer machine. The service was fairly new, the backend kept evolving, and we needed a quick way to spin up everything required to get the service running for the front-end developers who depend on it. Docker Compose came in handy; the compose file below spins up two containers:
- A Python container that installs all dependencies and starts the web server
- A Postgres database
```yaml
version: '3.7'
services:
  web:
    build:
      context: ..
      dockerfile: df.dev.Dockerfile
    environment:
      - DB_URI=postgres://postgres:postgres@db/idea_box
    command: bash -c "flask migrate && flask run -p 5000 -h 0.0.0.0"
    ports:
      - 5000:5000
    links:
      - db
    volumes:
      - ../:/code
  db:
    image: postgres:10.1
    environment:
      POSTGRES_DB: idea_box
```
We can specify dependencies using `links`: the web service container will not start until the db container has started, and then it executes the entry command `bash -c "flask migrate && flask run -p 5000 -h 0.0.0.0"`, which runs the migrations and starts the Flask server.
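One caveat: `links` (like `depends_on`) only controls start order, not readiness, so Postgres may still be initializing when the migration runs. A common workaround is a small wait-for-port loop before migrating. A hedged Python sketch of such a helper (the host name `db` and port 5432 are just what this compose file would use):

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0):
    """Poll until a TCP port accepts connections, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True  # something is listening on host:port
        except OSError:
            time.sleep(0.5)  # not up yet; retry
    return False
```

Inside the web container you could call `wait_for_port("db", 5432)` before `flask migrate`; there are also ready-made scripts (e.g. wait-for-it) that do the same from the shell.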
If anyone wants to run this service, they don't have to install Python, Flask, or Postgres. Instead, they run `docker-compose -f docker-compose.yml up` and wait for the API to become available at localhost:5000.
This works great when your service is that simple. In reality, we had to add a queue, plus a lambda function to process the queue and send messages to a different service. Fortunately, we found localstack, which emulates AWS services.
We can spin up an SQS instance locally using localstack and create a queue with an init shell script that is called via the entrypoint in the localstack container.
This still does not represent the complete service: we would also need a local lambda function that reads from the queue and pushes messages to another service. This is where I found that the effort to set up the entire service within Docker Compose outweighed the benefits.
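As a sketch, such an init script (dropped into the directory mounted at `/docker-entrypoint-initaws.d`, which localstack runs on startup) might look like the following; the queue name is illustrative, and `awslocal` is the AWS CLI wrapper shipped in the localstack image:

```shell
#!/bin/sh
# ./localstack/create-queues.sh - executed inside the localstack container at startup
awslocal sqs create-queue --queue-name outbound-messages
```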
```yaml
version: '3.7'
services:
  web:
    build:
      context: ..
      dockerfile: .df.dev.Dockerfile
    environment:
      - DB_URI=postgres://postgres:postgres@db/coach_desk
      - AWS_ACCESS_KEY_ID=foo
      - AWS_SECRET_ACCESS_KEY=bar
      - AWS_ENDPOINT=http://aws:4576
    command: bash -c "flask migrate && flask run -p 5000 -h 0.0.0.0"
    ports:
      - 5000:5000
    links:
      - db
      - aws
    volumes:
      - ../:/code
  db:
    image: postgres:10.1
    environment:
      POSTGRES_DB: coach_desk
  aws:
    image: localstack/localstack
    ports:
      - 4576:4576
      - 8080:8080
    environment:
      - SERVICES=sqs
      - DEBUG=True
    volumes:
      - ./localstack:/docker-entrypoint-initaws.d
```
Even with some of the complexities involved in using Docker Compose for a real service, I recommend experimenting with it to see if it works for your team. I would love to hear how you and your team use Docker for development and testing.
- No painful developer machine setup - with Compose, anyone can spin up a service quickly without having to install dependencies they will never use.
- Consistent outcomes from dev to prod - builds are reproducible, reliable, and tested to function exactly as expected in production.
- Faster testing - we have tests that need a test database, with tear-down after each test (or group of tests) to clean it up. I am working on ways to run tests in parallel against databases in multiple containers for our project.
- Painless code reviews - each dev can attach an image to their code review that the reviewer can quickly spin up and test a different version of the code without interrupting what they are doing.
- Quick fixes can actually be quick - when developers find bugs, they can fix them in the development environment and redeploy to the test environment for testing and validation. When testing is complete, getting the fix to the customer is as simple as pushing the updated image to the production environment.