To understand Docker, we have to go back in time and study the evolution of containers and how we got to where we are!
From the Docker site:
“Containers are a way to package software in a format that can run isolated on a shared operating system. Unlike VMs, containers do not bundle a full operating system - only libraries and settings required to make the software work are needed. This makes for efficient, lightweight, self-contained systems and guarantees that software will always run the same, regardless of where it’s deployed.”
Let's unpack that a bit:
- Back in the late nineties, VMware introduced the concept of running multiple operating systems on the same hardware.
- In the late 2000s, kernel-level namespacing was introduced, allowing shared global resources like network and disk to be isolated by namespace.
- In the early 2010s, containerization was born. It took virtualization to the OS level and added shared libs/bins as well. Because containers share the host kernel, we cannot run two containers that depend on different operating systems on the same host unless we use a VM.
- Namespaces are the true magic behind containers. The principles come from Linux containers, and Docker implemented its own OCI runtime called runC.
Virtual machines are virtualization at the hardware level; containers are virtualization at the OS/software level.
- Execution speed - Because containers use the underlying host OS, we get speeds close to those of a process running natively on the host.
- Startup speed - Containers can start in less than a second. They are very modular and can share the underlying libs/bins, as well as the host OS, when needed.
- Operational speed - Containers enable faster iteration on an application. There is less overhead in creating a container with new code changes and moving it through the pipeline to production.
- Build an image once and use it anywhere. The same image that runs the tests runs in production, which avoids "works on my machine" problems.
- Not just in production: containers help run tests consistently. Ever had a scenario where all tests passed on your machine but CI failed them?
- We can specify exactly how much CPU and memory a single container may consume. By understanding the available resources, containers can be packed densely to minimize wasted CPU and memory: scale containers within one host before scaling out to more instances.
- Containers are portable - well, to an extent (as long as the host is running some form of Linux or a Linux VM).
- You can move a container from one machine to another very quickly. Imagine something going wrong while patching a security hole in the host OS: we simply move the container to a different host and resume service.
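As a sketch of what those limits look like in practice: with plain `docker run` the relevant flags are `--cpus` and `--memory`, and in a v3 Compose file the same caps can be declared under `deploy.resources` (honored by swarm deployments). Service name and values below are illustrative:

```yaml
services:
  web:
    image: python:3.7.3-stretch
    deploy:
      resources:
        limits:
          cpus: '0.50'   # at most half a CPU core
          memory: 256M   # hard memory cap for this container
```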
- In 2013, Docker launched the first widely adopted container platform.
- In 2015, Docker created the Open Container Initiative (OCI), a governance structure around the container image and runtime specifications, and donated the first runtime to the OCI. The runtime used today by Docker and many other platforms is runC, written in Go.
- Docker Engine is built for Linux.
- Docker for Mac uses HyperKit to run a lightweight Alpine Linux virtual machine.
- Docker teamed up with Microsoft to create a Windows OCI runtime, available in Windows 10 and Windows Server 2016.
- Docker CLI commands look very similar to git commands, and many of them share the same semantics as well: `git pull` gets source from origin to local, and `docker pull <image>` gets a Docker image from a remote registry to local.
- Docker follows a client-server model, so the CLI can connect to a local Docker server or a remote one.
An image is a read-only template with instructions for creating a Docker container. Often, an image is based on another image, with some additional customization. For example, we may build an image based on the ubuntu image that installs the Apache web server and our application, along with the configuration details needed to make the application run.
We need a Dockerfile to create an image. Let's look at an example: an image that runs a Python Flask application using gunicorn.
```dockerfile
FROM python:3.7.3-stretch
ADD . /code
WORKDIR /code
COPY Pipfile Pipfile.lock /code/
RUN apt-get update
RUN apt-get install postgresql postgresql-client --yes && \
    apt-get -qy install netcat && \
    pip install --upgrade pip setuptools wheel && \
    pip install --upgrade pipenv && \
    pipenv install --dev --system --ignore-pipfile
CMD ["/usr/local/bin/gunicorn", "--config", "wsgi.config", "coach_desk:create_app('development')"]
```
Each instruction in a Dockerfile above creates a layer in the image. When we change the Dockerfile and rebuild the image, only those layers which have changed are rebuilt. This is part of what makes images so lightweight, small, and fast, when compared to other virtualization technologies.
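The cache behavior can be pictured as a content-addressed chain: each layer's identity depends on its instruction and on the layer before it, so changing one instruction invalidates every layer after it while earlier layers are reused. A toy model of that idea (not Docker's actual implementation):

```python
import hashlib

def layer_ids(instructions):
    """Compute a chained id per instruction, like a simplified build cache."""
    ids, parent = [], ""
    for instruction in instructions:
        # each layer id depends on the instruction AND the parent layer
        digest = hashlib.sha256((parent + instruction).encode()).hexdigest()[:12]
        ids.append(digest)
        parent = digest
    return ids

original = ["FROM python:3.7.3-stretch", "COPY Pipfile /code/", "RUN pipenv install"]
edited   = ["FROM python:3.7.3-stretch", "COPY Pipfile /code/", "RUN pipenv install --dev"]

a, b = layer_ids(original), layer_ids(edited)
print(a[:2] == b[:2])  # True  -> the first two layers are reused from cache
print(a[2] == b[2])    # False -> the changed instruction rebuilds this layer
```

This is also why Dockerfiles typically copy dependency manifests and install packages before copying application code: code changes then invalidate only the final layers.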
The first instruction in the Dockerfile is `FROM`, which specifies the image the current image is built from. Let's look at the Dockerfile used to create the Python image. It is built from `buildpack-deps:stretch`, which provides the basic tools needed to support any language.
```dockerfile
FROM buildpack-deps:stretch

# ensure local python is preferred over distribution python
ENV PATH /usr/local/bin:$PATH

# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG C.UTF-8

# extra dependencies (over what buildpack-deps already includes)
RUN apt-get update && apt-get install -y --no-install-recommends \
        tk-dev \
        uuid-dev \
    && rm -rf /var/lib/apt/lists/*

ENV GPG_KEY 0D96DF4D4110E5C43FBFB17F2D347EA6AA65421D
ENV PYTHON_VERSION 3.7.3
...
```
`buildpack-deps:stretch` is built from `buildpack-deps:stretch-scm`, which is built from `buildpack-deps:stretch-curl`, which is built from `debian:stretch`, which is built from scratch.
```dockerfile
FROM scratch
ADD rootfs.tar.xz /
CMD ["bash"]
```
If I had 1000 Dockerfiles all built from `python:3.7.3-stretch`, the related layers are not downloaded 1000 times but only once. The same goes for containers: if we run a Python container 1000 times, Python is installed only once and reused.
A registry is a place to store images. When we install Docker, we get a local store where all the images we create are kept; `docker images` lists all the images currently on your machine.
Docker Hub is a public registry with over 100,000 images. It is the first place to look for pre-built images that we can use directly, or build on top of as a base image.
We can move images between local and remote registries using the `docker push` and `docker pull` commands. The default remote registry is Docker Hub unless we explicitly specify another. At Peaksware, we use Amazon ECR to store our production Docker images.
Compose was introduced so we do not have to build and start every container manually: it is a tool for defining and running multi-container Docker applications. Compose was initially created for development and testing purposes, but recent Docker releases also allow a Compose YAML file to be used to deploy to a Docker swarm.
A microservice our team works on needs the right Postgres database, with all migrations applied, and an AWS CLI setup in order to run locally on a developer machine. The service was fairly new, the backend kept evolving, and we needed a quick way to spin up everything required to get the service running for the front-end developers who depend on it. Docker Compose came in handy; the compose file below spins up two containers:
- A Python container that installs all dependencies and starts the web server
- A Postgres database
```yaml
version: '3.7'
services:
  web:
    build:
      context: ..
      dockerfile: df.dev.Dockerfile
    environment:
      - DB_URI=postgres://postgres:postgres@db/idea_box
    command: bash -c "flask migrate && flask run -p 5000 -h 0.0.0.0"
    ports:
      - 5000:5000
    links:
      - db
    volumes:
      - ../:/code
  db:
    image: postgres:10.1
    environment:
      POSTGRES_DB: idea_box
```
We can specify dependencies using `links`: the web service container will not start until the db container has started, and then it executes the entry command `bash -c "flask migrate && flask run -p 5000 -h 0.0.0.0"`, which runs the migrations and starts the Flask server.
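One caveat: `links` (like `depends_on`) only controls start order, not readiness, so Postgres may still be initializing when the migration runs. A common workaround is a small wait-for-port loop before migrating. A hedged Python sketch of such a helper (the host name `db` and port 5432 are just what this compose file would use):

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0):
    """Poll until a TCP port accepts connections, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True  # something is listening on host:port
        except OSError:
            time.sleep(0.5)  # not up yet; retry
    return False
```

Inside the web container you could call `wait_for_port("db", 5432)` before `flask migrate`; there are also ready-made scripts (e.g. wait-for-it) that do the same from the shell.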
If anyone wants to run this service, they don't have to install Python, Flask, or Postgres. Instead, they run `docker-compose -f docker-compose.yml up` and wait for the API to become available at localhost:5000.
This works great when your service is that simple. In reality, we had to add a queue, plus a lambda function to process the queue and send messages to a different service. Fortunately, we found localstack, which emulates AWS services.
We can spin up an SQS instance locally using localstack and create a queue with an init shell script that is called via the entrypoint in the localstack container.
This still does not represent the complete service: we would also need a local lambda function that reads from the queue and pushes messages to another service. This is where I found that the effort to set up the entire service within Docker Compose outweighed the benefits.
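As a sketch, such an init script (dropped into the directory mounted at `/docker-entrypoint-initaws.d`, which localstack runs on startup) might look like the following; the queue name is illustrative, and `awslocal` is the AWS CLI wrapper shipped in the localstack image:

```shell
#!/bin/sh
# ./localstack/create-queues.sh - executed inside the localstack container at startup
awslocal sqs create-queue --queue-name outbound-messages
```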
```yaml
version: '3.7'
services:
  web:
    build:
      context: ..
      dockerfile: .df.dev.Dockerfile
    environment:
      - DB_URI=postgres://postgres:postgres@db/coach_desk
      - AWS_ACCESS_KEY_ID=foo
      - AWS_SECRET_ACCESS_KEY=bar
      - AWS_ENDPOINT=http://aws:4576
    command: bash -c "flask migrate && flask run -p 5000 -h 0.0.0.0"
    ports:
      - 5000:5000
    links:
      - db
      - aws
    volumes:
      - ../:/code
  db:
    image: postgres:10.1
    environment:
      POSTGRES_DB: coach_desk
  aws:
    image: localstack/localstack
    ports:
      - 4576:4576
      - 8080:8080
    environment:
      - SERVICES=sqs
      - DEBUG=True
    volumes:
      - ./localstack:/docker-entrypoint-initaws.d
```
Even with some of the complexities involved in using Docker Compose for a real service, I recommend experimenting with it to see if it works for your team. I would love to hear how you and your team use Docker for development and testing.
- No painful developer machine setup - with Compose, anyone can spin up a service quickly without having to install dependencies they will never use.
- Consistent outcomes from dev to prod - builds are reproducible, reliable, and tested to function exactly as expected in production.
- Faster testing - we have tests that need a test database, with tear-down after each test (or group of tests) to clean it up. I am working on ways to run tests in parallel against databases in multiple containers for our project.
- Painless code reviews - each dev can attach an image to their code review that the reviewer can quickly spin up and test a different version of the code without interrupting what they are doing.
- Quick fixes can actually be quick - when developers find bugs, they can fix them in the development environment and redeploy to the test environment for testing and validation. When testing is complete, getting the fix to the customer is as simple as pushing the updated image to the production environment.