The most common reason why a distributed system is unable to sustain iterative releases and turns into a horror story is tight coupling between components.
When building (micro)services, the key decisions are about defining their boundaries and how they communicate.
Changing one service shouldn't require changing another. If one service goes down, it should not take other services or, worse, the system as a whole down with it. Services with well-defined boundaries allow us to change a behavior in one place and release that change as quickly as possible.
We don't want to end up with a system where a single change has to be made in many different places. That process is slow and prevents clear code ownership, and deploying more than one service at a time is risky.
A loosely coupled service contains related behavior in one place and knows as little as possible about the rest of the system with which it collaborates.
A loosely coupled system is conservative in the design of communication between services. Services usually communicate through asynchronous remote procedure calls (RPC), expose a small number of endpoints, and assume that failure will happen. There is no shared database, and all database changes are applied iteratively as part of the CI/CD pipeline.
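To make "assume that failure will happen" concrete, here is a minimal sketch of a caller that puts a timeout on an asynchronous call and falls back to a default when the other service is slow or down. The names `call_with_fallback`, `flaky_recommendations`, and `healthy_recommendations` are hypothetical, and the remote calls are simulated in-process rather than sent over a real RPC library.

```python
import asyncio


async def call_with_fallback(remote_call, timeout_s, fallback, retries=2):
    """Await remote_call() under a timeout; retry, then return fallback.

    remote_call is any zero-argument coroutine function (hypothetical
    stand-in for an RPC client method).
    """
    for _attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(remote_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            # Failure is expected: try again, up to the retry budget.
            continue
    # The caller keeps working with a degraded answer instead of failing.
    return fallback


async def flaky_recommendations():
    # Simulates a collaborating service that is currently down.
    raise ConnectionError("recommendations service unavailable")


async def healthy_recommendations():
    # Simulates a collaborating service that answers normally.
    await asyncio.sleep(0)
    return ["book-1", "book-2"]


if __name__ == "__main__":
    # The page still renders, with an empty list, when the service is down.
    print(asyncio.run(call_with_fallback(flaky_recommendations, 0.1, [])))
    print(asyncio.run(call_with_fallback(healthy_recommendations, 0.1, [])))
```

The point of the sketch is the shape of the caller, not the specific retry policy: the calling service decides up front what it will do when its collaborator is unavailable, so an outage in one place does not spread.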
Metrics and monitoring are also an important part of the feedback loop that enables iterative development. Having metrics that can detect issues in real time gives us the confidence to make changes, knowing that we can quickly recover from any error.
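As a rough illustration of that feedback loop, the sketch below tracks the error rate over the most recent requests and flags when a release should be rolled back. The class name `ErrorRateMonitor`, the window size, and the 5% threshold are assumptions made for the example; a real setup would rely on a dedicated monitoring system rather than in-process counters.

```python
from collections import deque


class ErrorRateMonitor:
    """Rolling error rate over the last `window` requests (hypothetical helper)."""

    def __init__(self, window=100, threshold=0.05):
        # deque with maxlen drops the oldest result automatically.
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, ok):
        # Store True for a successful request, False for a failed one.
        self.results.append(bool(ok))

    def error_rate(self):
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def should_roll_back(self):
        # Signal a rollback once the recent error rate exceeds the threshold.
        return self.error_rate() > self.threshold


if __name__ == "__main__":
    monitor = ErrorRateMonitor(window=10, threshold=0.2)
    for ok in [True] * 9 + [False]:
        monitor.record(ok)
    print(monitor.error_rate(), monitor.should_roll_back())
```

A deployment step could poll a check like `should_roll_back()` right after a release and revert automatically, which is what makes small, frequent changes safe to ship.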
This is an excerpt from my book on CI/CD for cloud native applications: