Recovery over Perfection means that, as Tech, we would rather focus our energy on recovering from issues quickly and effectively than on trying to prevent every possible failure.
This supports us in the following quest:
- To keep speed and momentum in releasing value to our customers and users
- To avoid being held back in innovation
- To seek the right balance for Coolblue between the cost of development and the cost of maintenance
This philosophy leads us to the following: Recovery must be ‘instantaneous’ (or super fast, at the very least).
This means we must be able to do the following things very quickly:
- Discover failure
- Restore acceptable behaviour
- Diagnose issues
- Solve issues
- Apply permanent solutions
- Learn from our observations

Apart from discovering failure, all of the other actions can be applied in various ways and orders.
Discovering and diagnosing failure requires effective monitoring and logging. Restoring acceptable behaviour requires easy, reproducible deployments. Solving and applying solutions requires maintainable code, accompanied by automated tests (see our other principles).
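To make "discovering failure quickly" a bit more concrete, here is a minimal sketch of a health check runner in Python. The names `run_health_checks` and `is_healthy` are hypothetical and chosen for illustration only; this is not a description of our actual tooling. It assumes each check is a plain function that returns `True` when the dependency is fine.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("health_sketch")  # hypothetical logger name


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run every check and report 'ok' or 'failing' per check name."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            # A check that raises counts as failing.
            # logger.exception keeps the traceback, which helps diagnosis.
            logger.exception("health check %r raised", name)
            healthy = False
        else:
            if not healthy:
                logger.error("health check %r is failing", name)
        results[name] = "ok" if healthy else "failing"
    return results


def is_healthy(results: Dict[str, str]) -> bool:
    """True only when every check reported 'ok'."""
    return all(status == "ok" for status in results.values())
```

A monitoring system could call something like this on a schedule and alert as soon as `is_healthy` returns `False`. The logged traceback then gives a starting point for diagnosing the issue.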
Essentially, this comes down to the following: monitoring, logging, alerting, integration, deployment and testing should be a priority. How much of each you need is a question of balance, but in our principles we will set out minimum expectations.
This principle doesn't relieve you from programming or designing defensively. Networks will fail, dependencies might break down, and users tend to be very creative in their input for the application. You have to expect that this will happen and handle it accordingly without breaking your application. Recovery over Perfection doesn't relieve you from your responsibility to handle failure gracefully. It requires you to make conscious decisions about safety nets and the amount of effort you spend on them.
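As an illustration of handling failure gracefully, the sketch below retries a flaky call a few times and then falls back to a safe default instead of breaking the application. The function name `call_with_fallback` and its parameters are hypothetical, and the choice to treat only `ConnectionError` and `TimeoutError` as retryable is an assumption made for this example. How many attempts and which fallback are acceptable is exactly the kind of conscious decision described above.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("recovery_sketch")  # hypothetical logger name


def call_with_fallback(
    operation: Callable[[], T],
    fallback: T,
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `operation` up to `attempts` times, then return `fallback`."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as exc:
            # Log every failure so it can be discovered and diagnosed later.
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
    logger.error("all %d attempts failed, using fallback", attempts)
    return fallback
```

For example, a page could call `call_with_fallback(fetch_recommendations, fallback=[])`, where `fetch_recommendations` is a hypothetical function that talks to another service. If that service is down, the page still renders, only without recommendations, and the log shows what went wrong.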
Go to the overview of our Tech Principles.