“Don't take anything for granted, because tomorrow is not promised to any of us.”
- Kirby Puckett
There is a behavior I’ve observed repeatedly over the years that I’m certain boils down to one of the basics of human psychology. The crux of it is this: the more readily available something is to us, in vast quantities, the less we appreciate it. Put a slightly different way, we tend to be less concerned with how much of something we use when we have plenty to spare. It’s an interesting phenomenon to watch play out, and I see it every single time I open a brand new tube of toothpaste. It’s the same pattern every time - the first glob out of the tube looks remarkably like the marketing images, full coverage in the nice shape of a wave. But as I get to the end of the tube, I become predictably stingy. It turns out that the small dabs of toothpaste I use after obsessively smoothing out and rolling up the tube are all I really need, and yet, somehow a brand new tube equates to a full, beautiful wavy glob. Come to think of it, pretty much anything I use that comes in a tube or similar container meets the same fate.
Another real-life example of this behavior I regularly struggle with is how much ice cream I have access to. There is a direct relationship between my dieting success and the amount of ice cream in my freezer. I’m not saying ice cream is evil (it is), it’s the availability of too much ice cream that results in my repeated failures. My lack of self-control in this situation is a completely separate matter that I don’t wish to discuss.
You might not immediately relate to the toothpaste or ice cream scenarios (are you even human?), but there is a fairly long list of essential things we have all taken for granted at one point or another - the availability of running water in your home, the ease of flipping a switch to read in the evening, cool and breathable air! Of course, all of this varies in degree depending on our history with and current access to these things. But that is exactly the point I’m making. We intrinsically know that these resources greatly improve our well-being and are of utmost importance (some essential to life!), and yet until we are faced with some kind of limiting factor, it’s difficult for us to appreciate them in the way we should.
To be clear, none of this is meant to shame or guilt anyone. This is all just an observation of something that is completely natural and probably even beneficial to us as human beings. If we spent our days worrying about everything that is essential to us and how our lives would be without them, we would be nothing but shriveling heaps of tears and angst at the end of the day. Living our lives is very much like spinning plates - it’s the wobbly plate that gets our immediate attention. The management of our resources is very much tied to the quantity available to us, and when we hit unexpected limits, we are left to figure out how to deal with whatever crisis is at hand. We re-evaluate our needs and then we find clever solutions to subsist on what we have. And round and round we go.
This brings me to our current wobbly plate in our DevOps world.
Given the title of this article, you can probably tell where I’m going with this. Let’s set aside deep discussions of human behavior and life on this planet for another time, and instead, let’s figure out how to apply what we’ve learned from our observations so far to the latest happenings in DevOps tooling and resources.
“For want is nexte to waste, and shame doeth synne ensue,” (waste not, want not)
- Richard Edwards
Docker Hub recently updated its terms of service (see section 2.5) for free service level accounts to include a rate limit (a cap on the number of pulls of Docker images over a span of six hours), as well as a retention policy for inactive images (images that have not been pushed or pulled in the last six months are deleted). As you can imagine, these changes have lit quite the firestorm of discussion on social media. Everyone relying on these services is having to come to terms with the new limitations to be sure that their pipelines will not be adversely affected.
Let’s break this down.
Prior to these changes, developers and full-scale CI/CD systems were able to push and pull Docker images from Docker Hub without any limitations. On top of that, free storage! This is a pretty incredible service and, frankly, very easy to take advantage of. You know how it goes: when you have more storage, you store more things. This behavior permeates my own life across the board. My digital photo album is an excellent example. My house is another. I moved from an apartment to a house and I magically have more stuff! Like goldfish, we tend to fill the space we’re in and then forget what we have.* Again, this is just natural human behavior. But the moment that storage is assigned a price (or a retention policy, in the case of Docker images stored in Docker Hub), we must take a step back and figure out how to manage our storage a little more thoughtfully. We must clean out our closets, so to speak.
The new limits imposed by Docker Hub are a bit of a call to action to define some netiquette around the use of these free services. This is both a jolt to re-evaluate our use of the affected resources and an opportunity to save ourselves from some of the negative consequences of taking these high-value resources for granted. For those DevOps professionals out there who are already following best practices, this announcement from Docker is far from a deal-breaker for the use of Docker Hub and will certainly not result in their software development and distribution pipelines grinding to a halt. We’ll talk in the next section about what those best practices are, but first, let’s discuss the real elephant in the room, and perhaps the real fear that the Docker terms of service update has unveiled.
There seems to be an unhealthy reliance on external resources when it comes to critical internal operations. Specifically, if my team of developers cannot access a Docker image when required in their personal development environments (and requests from developers could be multiple times a day depending on the circumstances), their progress on the next feature or bug fix is potentially blocked. In the same way, if my CI/CD system that is responsible for building my next software release cannot access the binaries it needs, my team may end up in a position where they cannot release. The same can be said for every intermediary step of the pipeline including initial integration and deployment to quality assurance test environments. By taking for granted the access to and storage of the most integral building blocks of our software, our software binaries, many find themselves completely at the mercy of an external service.
Docker Hub is not the only organization whose free service level offering is subject to limitations. It is not uncommon that, near the end of the month, Boost (one of the most popular C++ library projects) reaches a point where its distributable is no longer accessible because the organization’s monthly download allowance has been exceeded. Docker and Boost have intentional limits in place; other services will degrade or encounter downtime when demand is too high, or for any number of other reasons. For example, NuGet Gallery, the central repository for .NET packages, provides a status page to let stakeholders know what is going on when there is an outage. The most unfortunate scenario, which has more to do with uncontrolled risk than with free service limits, is when a remote binary that your build relies upon just up and disappears, like what happened during the infamous npm left-pad debacle of 2016. All of these examples call attention to the problems and potential productivity killers that teams face when relying on remote resources for software binaries. Another important point to make here... this is not a new problem!
“There is a store of great value in the house of the wise, but it is wasted by the foolish man.”
- Proverbs 21:20 (BBE)
So now that I have a full understanding of the real value of my Docker images and other binaries and can better evaluate the methods I use to store and retrieve them, what can I do to help keep my builds and my whole software development pipeline alive and drama-free? Obviously, taking the stance of never using free services like Docker Hub is unacceptable, as this will put you at a disadvantage. These services are valuable and certainly have their place. But 100% reliance on them is clearly unhealthy. Expecting them to meet an unbounded need is also unrealistic.
Step 1: Take an inventory of your software project.
It’s important to know exactly what libraries and packages your software project is pulling in. Understand exactly where your binaries are coming from. For Docker images, make sure you understand thoroughly what is happening when you build your images. For example, are there any lines in your Dockerfile that pull from npm, perform a pip install, or update other software packages? All of these actions potentially reach out to remote service providers and will count against any download limits.
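As a quick, concrete illustration of this kind of audit, a simple grep over a Dockerfile can surface every instruction that reaches out over the network. The Dockerfile contents below are invented for the example; the pattern list is a starting point, not an exhaustive one:

```shell
# Write a small example Dockerfile (hypothetical contents, for illustration only).
cat > /tmp/Dockerfile.example <<'EOF'
FROM node:14
RUN apt-get update && apt-get install -y git
RUN npm install
RUN pip install requests
COPY ./app /app
EOF

# Flag every instruction that reaches an external service. Each match below is
# a network fetch that may count against a provider's download limits - note
# that even the FROM line is a pull from a registry.
grep -En 'FROM |npm (install|ci)|pip install|apt-get|curl |wget ' /tmp/Dockerfile.example
```

Running a check like this against each Dockerfile (and build script) in your project gives you the inventory you need before deciding what to cache.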
Step 2: Utilize multiple levels of caching.
Given that many remote offerings like Docker Hub, Boost, npm, NuGet Gallery, and others have very real limitations and the possibility of unplanned downtime, it’s important both to mitigate the risk of not being able to access your binaries when needed and to eliminate unnecessary polling of these resources. One of the most valuable things you can do is set up a caching proxy for these remote resources, such as a remote repository in JFrog's Artifactory. The next level of cache that plays an important role is a developer’s local environment. Developers should be set up to pull required resources from the caching proxy rather than repeatedly from the remote service.
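The cache-first pattern at the heart of any caching proxy (Artifactory's remote repositories included) boils down to a few lines of logic: serve from local storage when you can, and only go remote on a miss. This is a minimal sketch of that idea; the cache directory and artifact name are placeholders, and the remote fetch is simulated:

```shell
# Minimal sketch of cache-first retrieval. Paths and names are placeholders.
CACHE_DIR="${CACHE_DIR:-/tmp/artifact-cache}"
mkdir -p "$CACHE_DIR"

fetch_artifact() {
  local name="$1"
  if [ -f "$CACHE_DIR/$name" ]; then
    # Served locally - no remote request, no hit against a download limit.
    echo "cache hit: $name"
  else
    # A real proxy would download from the upstream here; we just simulate it
    # by creating the file so subsequent requests are cache hits.
    echo "cache miss: $name (fetching from remote, storing in cache)"
    touch "$CACHE_DIR/$name"
  fi
}

fetch_artifact "left-pad-1.3.0.tgz"   # first request goes remote
fetch_artifact "left-pad-1.3.0.tgz"   # second request is served locally
```

The payoff is that no matter how many developers or build agents ask for the same artifact, the remote service is contacted at most once.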
Step 3: Modify CI/CD pipelines to pull from cache.
Even if your CI/CD processes involve building your code from scratch on brand-new, temporary instances, set them up to pull from the proxy you set up in Step 2 rather than repeatedly pulling from remote sources. A misbehaving pipeline can easily run into throttling and other download limitations if left unchecked. It is better for your CI/CD pipelines to utilize internal resources that you control rather than leave them attempting to pull from remote sources that may be unavailable. If you set up your pipelines this way, you will be empowered to troubleshoot and resolve any internal issues and complete your pipeline processes successfully, rather than being relegated to the priority queue of an external service.
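One way this can look in a pipeline step is a sketch like the following: prefer the internal mirror, and fall back to the public registry only as a last resort. The mirror hostname and image name are placeholders for whatever proxy you stood up in Step 2, and the pull command is parameterized purely so the logic can be exercised without a Docker daemon:

```shell
# Sketch of a CI step that prefers an internal mirror over the public registry.
# INTERNAL_MIRROR is a placeholder hostname; substitute your own proxy.
INTERNAL_MIRROR="${INTERNAL_MIRROR:-registry.internal.example:5000}"
IMAGE="library/python:3.9"
PULL_CMD="${PULL_CMD:-docker pull}"

pull_image() {
  # Try the internal mirror first; fall back to the public registry only if
  # the mirror is unavailable, so the external service is hit as a last
  # resort rather than on every build.
  $PULL_CMD "$INTERNAL_MIRROR/$IMAGE" || $PULL_CMD "$IMAGE"
}

# In a CI job you would now call: pull_image
```

With this shape, an outage or rate limit upstream degrades into a problem you can diagnose on infrastructure you control.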
I expect nothing less than a ton of buzz and discussion about this move by Docker and even the thoughts I’ve written here. This is a good thing. My hope is that this move will bring to light the realities of providing a service that so many in the industry have come to rely on and ultimately what it means to be a responsible user of community resources. I also hope that we come to fully appreciate the costs associated with access to and storage of our most valuable software building blocks - that we are more thoughtful about where we put them and how we get to them since they are fundamental to our organization’s software.
* This is actually an entirely untrue statement about goldfish, but you get my meaning. Blogs like this one perpetuate these falsehoods, so here is a resource to hopefully make up for it: https://www.tfhmagazine.com/articles/freshwater/goldfish-myths-debunked