So I got a very nice question from a community member here, which I would like to pick up in this series.
It was as follows:
I’m going from being a low-code developer to full-stack.
I’m building my first web app and wondering how I go from ‘it runs on my machine’ to ‘ive got a multi-environment, self-healing, auto-scaling, well-oiled internet machine’.
What would you say are the key tasks to achieve with containers and when should they be done in the project?
That is a fantastic question which merits a whole series of posts to be perfectly honest. The topic of containers might seem intimidating at first.
Containers, Images, Operators, Orchestration-Tools, Ingresses, Observability, Monitoring, PaaS, CaaS, SaaS and many more buzzwords await you in the world of containers.
But fear not: there are many essentials you can always keep in mind, and I will try to cover as many of them as possible in this series.
These tasks and tips will put you in a great starting position once you start tackling the topic of containers.
Let's split that question up into smaller chunks first.
We have the following three topics to talk about here:
- The key tasks to achieve with containers
- When they should be done
- The actual transition from bare-metal/manual deployment to containers
There are many benefits to containers. Here is a small list of things my customers have told me in the past were of "special importance to them":
- Save money by using hardware as optimally as possible
- Have software be more portable
- Improve speed and quality of development
I think these reasons don't sound too bad. A lot of this can already be done without containers, but here is my take on it:
Save money by using hardware as optimally as possible
Containers require less overhead than the typical VM-Landscape many companies have today. By removing the hypervisor and sharing the same kernel between all containers, you suddenly get a lot more containers onto your hardware than VMs.
Not only do you save overhead, you also gain in speed. While VMs are just as scalable as containers, horizontally scaling a simple apache2 container is quite a bit faster than spinning up preimaged VMs.
Have software be more portable
Damn it, I forgot to install a JDK on Server X, now the microservice won't even start!
This won't happen with a container, since you package everything you need with your deliverable, the service.
If your container runs locally, it will run on Azure, Google Cloud, Amazon ECS, or an on-premise PaaS (unless something is majorly borked).
Improve speed and quality of development
If you correctly set up your environments, a deployment to a different cluster/location/stage might just be a simple additional line:

    kubectl config use-context aws
    kubectl apply -f awesomeapp-deployment.yaml

Want to deploy on Azure instead?

    kubectl config use-context azure
    kubectl apply -f awesomeapp-deployment.yaml

In my opinion, it doesn't really get easier or faster than that.
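If you deploy to several clusters regularly, the two commands above can be wrapped in a small loop. This is only a minimal sketch: the function name `deploy_everywhere` is made up for illustration, and it assumes that contexts named `aws` and `azure` exist in your kubeconfig. It uses kubectl's global `--context` flag, so your currently selected context stays untouched.

```shell
#!/bin/sh
# Apply one manifest to several clusters, one kubectl context at a time.
# deploy_everywhere is a hypothetical helper, not part of kubectl.
deploy_everywhere() {
    manifest="$1"
    shift
    for context in "$@"; do
        echo "Deploying ${manifest} to ${context}"
        # --context targets one cluster without switching your current context
        kubectl --context "${context}" apply -f "${manifest}" || return 1
    done
}
```

Calling `deploy_everywhere awesomeapp-deployment.yaml aws azure` would then apply the same manifest to both clusters and stop at the first failure.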
Quality, as always, is (probably) not something you can blame on the technology.
When should these tasks be done? That can be answered pretty quickly: I would say as early as possible. What I have learnt in many years of non-stop containerization is, "do it correctly, and do it right at the beginning". Go the extra mile, make your containers and surrounding infrastructure as seamless as possible. Don't skimp on things you think "might be overkill".
Automation/Optimization is boring, sometimes ugly and very cumbersome. But try to do it anyways, so you can reap the rewards as early as possible. You can easily strip out functionality you may not need afterwards. But adding onto a pipeline at the end almost always ends up in chaos.
What we want to achieve is, as was perfectly described before, "a well-oiled internet machine". So let's get started with the dos and don'ts of containers, with practical Dockerfiles.
Tip 1) Have a single base image that all your other images derive from. If you need something on top of it, make another base image that derives from that one. Put your actual application one layer further down, in the final images.
It could look like this:

    minimal-baseimage (for example ubuntu, alpine, centos)
    |__ nginx-baseimage
    |   |__ awesome-website-container
    |__ quarkus-baseimage
    |   |__ awesome-java-microservice
    |__ python-baseimage
        |__ awesome-flask-webapp

Why would you do that?
It's simple, really. Do you need to update your images because of CVEs or patches, or do you need something else in all of your images? Update your top-level minimal-baseimage, kick off your CI, and watch all your images get updated to the same basic state.
Let's take a look at the path minimal -> nginx -> awesome-website.
This could be your Dockerfile for your minimal-baseimage:
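
    # Minimal base image: a fully patched Ubuntu that runs as an unprivileged user
    FROM ubuntu:18.04
    # Refresh the package lists, apply all pending updates, then empty the apt cache
    RUN apt-get -y update && \
        apt-get -y upgrade && \
        apt-get autoclean && \
        apt-get clean
    WORKDIR /opt
    # Hand /opt to nobody (and nobody's login group) so derived images can write there
    RUN chown nobody: /opt
    USER nobody
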
This could be an image that installs nginx to host your webpage for example:
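
    FROM my-ubuntu-base:18.04-somebuildnumber
    # Switch to root only for the installation
    USER root
    # Refresh the package lists first, in case the ones from the base image are stale
    RUN apt-get update && \
        apt-get install -y nginx && \
        apt-get clean && \
        chown nobody: /usr/share/nginx/html
    USER nobody
    # Depending on your nginx configuration, an unprivileged user may also need
    # writable log, cache and pid paths, and a listen port it is allowed to bind
    EXPOSE 80
    STOPSIGNAL SIGTERM
    CMD ["nginx", "-g", "daemon off;"]
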
And finally, this would be the actual container you deploy your website with:

    FROM my-ubuntu-nginx-base:1.18.0-stable
    COPY awesomefiles/index.html /usr/share/nginx/html/index.html

This might look like overkill when you could replace the whole chain with something like this:

    FROM nginx
    COPY static-html-directory /usr/share/nginx/html

But what is the difference, apart from copious amounts of effort for a single nginx image with an awesome HTML file?
Many things, but here are four important examples:
- You know and control what is inside your container
- You can make sure it is updated, and you control the update process yourself
- You already have everything in place to fix things when something breaks, or when your security scanner complains about "outdated packages" (which will happen a lot in a commercial environment)
- You control the security context inside your container, for example by running as nobody, whereas the official nginx image starts as root
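Controlling the update process yourself mostly means rebuilding the chain from the parent down to the child. The following is a rough sketch of that idea, not a replacement for a real CI pipeline. The helper name `rebuild_chain` and the directory names in the usage note are assumptions; the image tags are the ones used in the Dockerfiles above. Only the first build uses `--pull`, on the assumption that only the top-level image has a parent in a public registry.

```shell
#!/bin/sh
# Rebuild an image chain parent-first, so each child is built on a fresh parent.
# Each argument has the form "<directory>=<image:tag>" (hypothetical convention).
rebuild_chain() {
    pull_flag="--pull"   # only the first image pulls its parent from a registry
    for step in "$@"; do
        dir="${step%%=*}"   # everything before the first "="
        tag="${step#*=}"    # everything after the first "="
        echo "Building ${tag} from ${dir}/"
        # pull_flag is left unquoted on purpose so that it vanishes when empty;
        # --no-cache forces steps like "apt-get upgrade" to actually run again
        docker build ${pull_flag} --no-cache -t "${tag}" "${dir}/" || return 1
        pull_flag=""
    done
}
```

With one directory per Dockerfile, a call could look like `rebuild_chain minimal-baseimage=my-ubuntu-base:18.04-somebuildnumber nginx-baseimage=my-ubuntu-nginx-base:1.18.0-stable awesome-website=awesome-website:latest`. The build stops at the first failing image, so a broken parent never leaks into its children.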
Tip 2) As a rule of thumb, always keep in mind: "one function per container".
While multiple processes in a single container are possible, and in some edge cases perfectly reasonable, they make the whole thing dramatically harder to manage.
Imagine the following process tree:

    root@cc726267a502:/# pstree -ca
    bash
      |-bash springboot.sh
      |   `-sleep 6000
      |-bash prometheus.sh
      |   `-sleep 6000
      |-bash grafana.sh
      |   `-sleep 6000
      |-bash h2.sh
      |   `-sleep 6000
      `-pstree -ca

What would happen if prometheus died? Nothing really, and Docker would not notice or tell you about it. The main process, in this case some kind of bash wrapper script, would still be running, so Docker would have no reason to restart the container or mark it as broken.
This would be the result:

    root@cc726267a502:/# kill 347
    root@cc726267a502:/# pstree -ca
    bash
      |-bash springboot.sh
      |   `-sleep 6000
      |-bash grafana.sh
      |   `-sleep 6000
      |-bash h2.sh
      |   `-sleep 6000
      |-pstree -ca
      `-sleep 6000
    Terminated bash prometheus.sh

If you split all these processes into their own containers, your orchestration would notice if they died and spin them back up again. More on orchestration in another part, promised :)
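You can reproduce the effect without Docker at all. The sketch below is a simplified stand-in for the wrapper script from the process tree above: the two "services" are plain `sleep` commands, and all variable names are made up for illustration.

```shell
#!/bin/sh
# Stand-in for the wrapper script: two "services", each just a background sleep.
sleep 30 &
service_a=$!
sleep 30 &
service_b=$!

# Simulate one service crashing.
kill "${service_a}"
wait "${service_a}" 2>/dev/null || true   # reap it; a non-zero status is expected here

# Check who is still alive. kill -0 sends no signal, it only tests for existence.
if kill -0 "${service_a}" 2>/dev/null; then a_state="running"; else a_state="dead"; fi
if kill -0 "${service_b}" 2>/dev/null; then b_state="running"; else b_state="dead"; fi

# The wrapper itself (this script) has reached this line, so it is still running,
# and the wrapper is the only process a container runtime would look at.
echo "service_a=${a_state} service_b=${b_state} wrapper=running"

# Clean up the remaining background process.
kill "${service_b}"
wait "${service_b}" 2>/dev/null || true
```

The script should report `service_a` as dead while the wrapper carries on as if nothing had happened, which is exactly the blind spot described above.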
There are also other very practical reasons why you should limit yourself to a single process:
- Scaling containers is much easier if each container is stripped down to a single function. You need another container with your app in it? Spin one up somewhere else. If your container also contains all the other apps, this complicates things, and maybe you don't even want a second grafana or prometheus.
- Having a single function per container allows the container to be re-used for other projects or purposes.
- Debugging a simple container locally is way easier than pulling a gigantic god-container that was blown way out of proportion.
- Patching. Update your base image, kick off your CI, and you are good to go. Having to test for side effects in other processes isn't fun.
- The same holds true for rolling back changes. The update bricked your app? No problem, change the tag in your deployment descriptor back to the old one, and you are finished.
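If you want to see what such a rollback amounts to, here is a sketch of the imperative equivalent with kubectl. Editing the tag in the deployment descriptor and re-applying it, as described above, achieves the same thing and keeps your files in sync with the cluster. The helper name and all arguments are placeholders; `kubectl set image` and `kubectl rollout status` are existing kubectl subcommands.

```shell
#!/bin/sh
# Point a deployment back at a known-good image tag and wait for the rollout.
# rollback_to_tag is a hypothetical helper; all names passed in are placeholders.
rollback_to_tag() {
    deployment="$1"
    container="$2"
    image="$3"
    good_tag="$4"
    # Swap the image of one container in the deployment's pod template
    kubectl set image "deployment/${deployment}" \
        "${container}=${image}:${good_tag}" || return 1
    # Block until the old version is fully rolled out again
    kubectl rollout status "deployment/${deployment}"
}
```

A call such as `rollback_to_tag awesomeapp web awesome-website 41` would move the `web` container back to tag `41`, assuming that is the last build you trust.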
I could go on about security, too, but let's leave it at that, you probably get the point.
Important: I am not saying that multi-process containers are inherently bad. They can work and they have their uses. But if you are just getting started with containerization, I recommend keeping things as simple as possible.
This tip doesn't come with a Dockerfile for the bad example, since I don't want to encourage you to start out the wrong way. Maybe we will tackle that later in the series.
Tip 3) As already hinted at in Tip 1), keep your containers updated. Always, regularly, automatically. Modern CI tools can help with that, and most of them have webhook integrations for Git that can trigger your build jobs. Or, if you don't want to set up too much infrastructure at the start, build a cronjob or something similar.
Just keep it automated. You should never have to patch a container by hand, because that increases the chance of simply forgetting it.
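For the cronjob route, something along these lines could be a starting point. It is a sketch under assumptions: the function name, the registry name and the paths in the comments are invented for illustration. `--pull` and `--no-cache` are regular `docker build` options that fetch the newest parent image and force every step to run again.

```shell
#!/bin/sh
# Rebuild an image from scratch and push it; meant to be run by a scheduler.
# Example crontab entry (hypothetical paths and registry), every night at 03:00:
#   0 3 * * * /opt/ci/nightly-rebuild.sh my-registry.example/my-ubuntu-base:18.04 /opt/ci/minimal-baseimage
nightly_rebuild() {
    image="$1"
    context_dir="$2"
    # --pull fetches the newest parent image, --no-cache re-runs every build step
    if docker build --pull --no-cache -t "${image}" "${context_dir}"; then
        docker push "${image}"
    else
        echo "Build of ${image} failed, keeping the previous image" >&2
        return 1
    fi
}

# Only run when the script is called with an image name and a build directory.
if [ "$#" -ge 2 ]; then
    nightly_rebuild "$@"
fi
```

Pushing only after a successful build means a broken nightly run leaves the previous image in place instead of replacing it with something half-finished.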
Tip 4) Automate as much as possible. You don't want to fumble around in a live container. Update the source, build, push, run your CI. Never touch a running container afterwards, or you are in for a world of pain when a live container behaves differently from a freshly built one and you don't know why.
If you followed Tip 3), you already have a patch pipeline in place, so make use of it.
Tip 5) Keep your containers safe. The saying goes "nobody is perfect", which fits containers perfectly: user root is bad news.
Your software should run as nobody, so that a breach gets the intruder as little as possible, even if you forget to remap your user namespace IDs.
Use slimmed-down base images. Your containers should contain no curl, no compilers, no ping or nslookup: nothing that could be used to change or put load on your own or other people's infrastructure if someone breaks in.
Harden your runtime with best practices, remap your user namespace IDs, scan your images regularly for vulnerabilities, and keep privilege escalations like USER root to a minimum. You know, the things you would do with a good server too. A container can end up as a botnet member just as easily as a server can.
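One cheap check you could add to a pipeline is to refuse images that are configured to start as root. The following sketch reads the configured user with `docker inspect`; the function name is hypothetical. Keep in mind that it only looks at the image configuration: a user passed at runtime or an entrypoint that switches users is not covered.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the image is configured to start as root.
# An empty USER in the image configuration means the container starts as root.
image_runs_as_root() {
    user="$(docker inspect --format '{{.Config.User}}' "$1")" || return 2
    case "${user}" in
        ""|root|0|root:*|0:*) return 0 ;;   # root, by name or by uid
        *) return 1 ;;                      # any other user
    esac
}
```

In a build job this could be used as `if image_runs_as_root my-ubuntu-nginx-base:1.18.0-stable; then echo "refusing to push"; exit 1; fi`. An exit status of 2 means the image could not be inspected at all.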
Tip 6) This tip contains two things. First, choose your base system wisely and stick to it as much as possible. Take Alpine, for example. While Alpine has drawbacks, and sometimes doesn't work with what you plan to run on it, it does provide the following advantages:
- a super small footprint (6MB vs. Ubuntu's 74MB or CentOS's whopping 237MB)
- a minimal attack surface from the outside, since Alpine was designed with security in mind
- Alpine Linux is simple. It brings its own package manager and works with the OpenRC init system and script-driven set-ups. It just tries to stay out of your way as much as possible
The second thing: build your containers yourself. While there might be 100 containers that already do what you want to get done, you have no idea how frequently they are patched, or what else is in them. So build the container of your dreams yourself. It helps you get into the routine of hardening, patching and optimizing them. You can never get enough experience in that regard. Always keep in mind, "you build it, you run it, you are liable for it".
I believe these few simple tips will give you a good head start in developing healthy containers. Next time, we will take a look at what a CI for your containers could look like, using Jenkins or Tekton.
If I left you with questions, as always, don't shy away from asking. There are no stupid questions, only stupid answers!