I'm a Systems Reliability and DevOps engineer for Netdata Inc. When not working, I enjoy studying linguistics and history, playing video games, and cooking all kinds of international cuisine.
To be honest I was only thinking of npm and Docker build caches, things where I never really ran into any problems with invalid caches. But I see your point.
Actually, Docker build caching is one potential source of issues, because you can only safely use caches if every part of your build process (including all underlying images) is deterministic and invariant of all external resources. For example, if you are building on an Ubuntu base image and running apt-get update && apt-get upgrade -y as part of the build, you can't safely use Docker's built-in cache because of how it handles invalidation. Put differently: if you build that same Docker image with a clean cache at two different points in time, you can end up with two different images, which means you can't safely use caching.
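To make that concrete, here's a minimal sketch of the pattern described above (the base image tag is illustrative):

```dockerfile
# Non-reproducible: this build depends on external state at build time.
FROM ubuntu:22.04

# Docker caches this layer keyed on the *text* of the RUN instruction,
# not on what the package mirrors actually contain. A warm-cache build
# keeps whatever package versions were fetched originally; a clean-cache
# build at a later date pulls newer ones, so two builds of the same
# Dockerfile can produce two different images.
RUN apt-get update && apt-get upgrade -y
```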
What is the main reason you invest in caching? Speed, or cost?
Speed primarily, because the difference can be huge, and none of the build infrastructure I use charges for time.
True, you should always pin explicit versions in your Dockerfile to get reproducible builds.
But even without it, caching should still work and speed things up, and the build should keep working for as long as you keep the cache. At least I don't see a reason why caching unversioned commands should break the build.
I do see the point that, if the cache gets dropped for whatever reason, you may end up with a different build, as different versions of the dependencies may get installed (which you probably don't want if you aim for reproducible builds).
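For illustration, a minimal sketch of what that pinning can look like (the digest and package version below are placeholders, not real values):

```dockerfile
# Pin the base image by digest and the packages by exact version, so a
# clean-cache rebuild resolves to the same inputs as a warm-cache one.
FROM ubuntu:22.04@sha256:PLACEHOLDER_DIGEST

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl=PLACEHOLDER_VERSION \
    && rm -rf /var/lib/apt/lists/*
```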
The flip side though is that sometimes you want (or even need) to always be using the latest versions of dependencies. For example, where I work we use Docker as part of our process of building native DEB/RPM packages for various distros (because it lets us make the process trivially portable), and in that case, we always want to be building against whatever the latest versions of our dependencies are so that the resulting package installs correctly.
In such a situation, caching the Docker build results can cause that requirement for tracking the latest dependency versions to be violated.
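For that opposite requirement — always resolving the latest dependencies — one common approach (a sketch, not necessarily what this pipeline actually does) is to bypass or deliberately bust the layer cache:

```shell
# Rebuild every layer from scratch, so apt-get (or any other package
# fetch) resolves the current latest versions instead of reusing a
# stale cached layer. The image tag here is a placeholder.
docker build --no-cache -t my-package-builder .

# Alternatively, invalidate the cache only from a chosen point onward by
# passing a changing build argument; RUN instructions after an ARG see
# its value as an environment variable, so changing it busts their cache:
#   ARG CACHE_BUST=unset
#   RUN apt-get update && apt-get upgrade -y
docker build --build-arg CACHE_BUST="$(date +%s)" -t my-package-builder .
```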