
Alex Ortiz


docker learn #02: Package Managers

Last week, I launched my new docker learn series. I'm learning Docker so I can be more effective in my work at Educative and in my interactions with the global developer community. I'll be sharing my learning journey as a series of posts: snapshots of what I'm learning as well as how I'm learning it. Hopefully this helps other Docker newbies out there like me, both now and in the future. Here's the second post in the series.

A New Learning: Package Management

In Dockerfiles at work, I'd seen commands like RUN apt-get update and RUN apt-get install xyz and had wondered what they encode. I've since learned that Linux distributions based on Debian use the .deb format for package files and the Advanced Package Tool, or apt, for package management. So apt-get update refreshes the list of available packages, and apt-get install installs them; when they appear in a Dockerfile, these commands run during the image build to install or update the software a containerized application needs, provided the image is based on a Debian-derived Linux distribution such as Ubuntu.

This is in contrast to Linux distributions based on Red Hat, such as Fedora, which use the .rpm package format and package managers like rpm or dnf. Alpine Linux, about which I answer a question below, uses apk for package management. It's cool that, just by reading the instructions in a Dockerfile, you can make inferences about the underlying operating system a containerized application is built on.
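To make that concrete, here's what the same install step might look like in two different Dockerfiles. The package (curl) and the image tags are just placeholders I picked for illustration:

```dockerfile
# --- A Debian-derived base image (e.g., Ubuntu): apt manages .deb packages ---
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl

# --- An Alpine base image: apk is the package manager instead ---
FROM alpine:3.19
RUN apk add --no-cache curl
```

Same goal in both cases (get curl into the image), but the package manager command has to match the distribution the base image is built on.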

I feel like I leveled up just writing that.

😜

My Answer Roundup

One of the best ways to learn something is to teach it. By sharing what you know, you achieve two things. First, you get to practice and access your knowledge in a way that maps to a real-world request, which is a subtle form of learning by problem-solving. Second, you pull recently acquired knowledge closer to your center, making it easier for you to assess what you've already internalized and what will extend your knowledge horizon further out.

So I've been answering one question per day on the web. Below is a roundup of those questions. If I got something wrong, go ahead and correct me in the comments (thank you).

On the role of Docker Compose vs that of a Dockerfile

Q: Why do we need Docker Compose when there is Dockerfile (I am a newbie at using Docker)? Am I missing something?

Here's my answer:

  1. Docker lets you run a single instance of one process or task or application inside a container
  2. To run that container, you need a Docker image to base the container off of
  3. To create the Docker image, you need a Dockerfile—a set of instructions Docker will follow to create the necessary Docker image
  4. If you’re only running one process/task/app, you only need one container. But what do you do if you have a situation that requires multiple processes/tasks/apps to work together? Well, you might need multiple containers
  5. Docker Compose provides a way to deploy and coordinate—or orchestrate—multiple containers
  6. In that case, you would need a Docker Compose file, written in YAML (typically docker-compose.yml), with information about each process/task/app, how they work together (the relationships between them), their networking and configuration info, and so on
  7. Note that every different process/task/application will need to be built off of a different Docker image, and therefore you will need a different Dockerfile for each one. For example, if you’re running an app that requires Node for the frontend and a database for the backend, then you will need a Dockerfile for the image used to create the Node container and a different, separate Dockerfile for the image used to create the database container. Then, you will use a Docker Compose file to coordinate how the two containers will work together
  8. So in summary, a Dockerfile is used to create a Docker image capable of running a single instance of one process/task/application inside a container. A Docker Compose file is used to define a group of containers that work together to bring your multi-container use case to life (see the sketch just after this list)
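To make point 7 concrete, here's a minimal sketch of what such a Compose file might look like. The service names, images, and ports are placeholders I made up, not something from a real project:

```yaml
# docker-compose.yml: one service per container, each based on its own image
version: "3.8"
services:
  web:
    build: ./web           # Dockerfile for the Node frontend lives in ./web
    ports:
      - "3000:3000"
    depends_on:
      - db                 # start the database container before the web container
  db:
    image: mysql:8.0       # official image, so no custom Dockerfile needed here
    environment:
      MYSQL_ROOT_PASSWORD: example
```

Running docker-compose up would then build the web image and start both containers together.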

On choosing base images for Dockerfiles

Q: When using multiple tools (Node, MySQL, Meteor), what should I choose as a base image for my Dockerfile?

Here's my answer:

This will only partially answer your question, but I hope this is helpful. You won’t just have one Dockerfile: you’ll have more than one. That’s because in the container model, each tool or process—MySQL and Meteor, for example—runs in a separate container. A MySQL container will be based on a MySQL image; a Meteor container will be based on a Meteor image. Therefore, you’ll end up with two separate Dockerfiles, one for MySQL and one for Meteor. (I imagine that because Meteor relies on NodeJS as its language runtime, any Meteor image will already be running node, i.e., the Dockerfile used to build the Meteor image will likely have an instruction to create a NodeJS image layer).
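As a rough sketch of what I mean (the image tags are just examples, and a real Meteor image would need more build steps than this):

```dockerfile
# Dockerfile for the MySQL container: the official image may be all you need
FROM mysql:8.0

# Dockerfile for the Meteor app container: note that it starts from a Node base image
FROM node:14
WORKDIR /app
COPY . .
RUN npm install
CMD ["npm", "start"]
```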

Once you have the Dockerfiles you need to build the images your app requires, then you’ll likely use a container orchestrator like Docker Compose to configure this multi-container setup.

The choice of base image for each Dockerfile is currently beyond my knowledge horizon. But here are a few resources I came across that may or may not be helpful to you:

On what Docker images are

Q: What are Docker Images?

Here's my answer:

The logo of Moby Dock (the blue Docker whale with the 9 shipping containers on top of it) is helpful here. In Docker, each container is, ideally, a single task, process, or application. One per container. The container is a single, short-lived instance of that one task, process, or application. Those containers run on a Linux machine, either natively or on a Linux virtual machine running on Windows or Mac. So in the Docker logo, those nine containers are independent instances of one or more tasks, processes, or applications, and Moby Dock represents the Linux server(s) those containers run on.

To build a container, you need a Docker image. That image is created with a set of instructions, a Dockerfile. Once you build the Docker image for a particular task, process, or application, then you and anyone with access to that Docker image can easily run as many containers from it as you need.
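In command form, that flow looks something like this (the image name and tag are placeholders):

```sh
# Build an image from the Dockerfile in the current directory
docker build -t my-app:1.0 .

# Anyone with access to that image can start as many containers from it as they need
docker run -d --name my-app-1 my-app:1.0
docker run -d --name my-app-2 my-app:1.0
```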

On why Alpine is popular as a Linux distro

Q: Why is Alpine Linux so popular as a base Docker image?

Here's my answer:

The base image for Alpine Linux is much smaller than that of other Linux images, such as Ubuntu. For example, the alpine:latest image is only 2.66 MB in size. By comparison, the ubuntu:latest image is nearly 10X larger: 25.48 MB in size. This might not seem like much of a difference, but smaller images and faster build times make a difference when you have apps in production at scale. And for some container applications, Ubuntu might be overkill where Alpine will do just fine. So that’s probably one reason.

But it does come with a trade-off: for example, the Alpine image ships with the lighter-weight ash shell (via BusyBox), whereas the Ubuntu image comes with the more feature-rich Bash, alongside Dash as its default /bin/sh.
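If you want to check the size numbers yourself, something like this works (keep in mind that the sizes Docker reports locally are uncompressed, so they'll be larger than the compressed sizes listed on Docker Hub):

```sh
# Pull both base images, then compare the SIZE column
docker pull alpine:latest
docker pull ubuntu:latest
docker image ls alpine:latest
docker image ls ubuntu:latest
```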

On Netflix's lack of Kubernetes

Q: Does Netflix use Kubernetes?

Here's my answer:

Netflix does not use Kubernetes. As of 2017, Netflix’s architecture was running millions of Docker containers on virtual machines (VMs) hosted on AWS EC2. But rather than use Kubernetes as the scheduler and orchestrator for all these Docker containers, Netflix built its own container runtime and scheduler, Titus, which was based on Apache Mesos and ran on AWS EC2. You can read more here, here, and here.

On what makes Kubernetes useful

Q: How is Kubernetes so powerful? What can be done with it?

Here's my answer:

Running individual tasks, processes, or software applications in standalone, easy-to-deploy, reliable Linux containers is a great way to add horizontal and vertical scale to a technology stack. But as the number of containers in your architecture grows, so does the need to coordinate them—

  • how do they talk to each other?
  • how do you automate the manual-heavy work of building Docker images and deploying containers based on those images?
  • how do you handle dips and spikes in container use, especially if you have hundreds, or thousands, of containers running at any given time?
  • how do you deal with containers that depend on other containers being up before they can work, e.g., a database container that needs to be running and volume-mounted before an application container can do its job correctly?

All of these questions, and a dozen others like them, are what container schedulers and orchestrators like Kubernetes are meant to help answer. With Kubernetes, you can deploy and manage 1,000 or 2,000 Docker containers while addressing the host of questions I mentioned above. This is a simple starting point, but hopefully it helps contextualize the power of Kubernetes.
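As a tiny taste of what that looks like in practice, here's a minimal Kubernetes Deployment sketch; the app name, image, port, and replica count are all placeholders:

```yaml
# deployment.yaml: ask Kubernetes to keep 3 replicas of one containerized app running
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0
          ports:
            - containerPort: 3000
```

Applying this with kubectl apply -f deployment.yaml tells Kubernetes to keep three such containers running, restarting or rescheduling them if they fail; changing replicas and re-applying is all it takes to scale up or down.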

Some Helpful Resources

Here are a few items I found helpful this past week:

That's all I've got for this week. See you next time!
