DEV Community

Discussion on: I'm Junade Ali, author of multiple software books and working on a PhD in theoretical computer science. Ask me anything!

Junade Ali

First, make sure you've nailed Test-Driven Development. When developing, write the tests first; your software will be both better architected and better tested. When a bug appears, write the test that would detect it before fixing the bug itself.
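
To make the "test before the fix" habit concrete, here is a minimal sketch. The `average` function and its empty-list bug are hypothetical, made up purely for illustration: the regression test is written first, seen to fail against the buggy version, and only then is the guard added.

```python
def average(values):
    """Return the arithmetic mean of values (hypothetical example function)."""
    # This guard is the fix. It was added only after the regression test
    # below had been written and seen to fail: the earlier (hypothetical)
    # version divided unconditionally and raised ZeroDivisionError on [].
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_average_of_empty_list_is_zero():
    # Regression test: reproduces the reported bug before any fix exists.
    assert average([]) == 0.0


def test_average_of_typical_input():
    # Guards the existing behaviour so the fix cannot silently break it.
    assert average([2, 4, 6]) == 4.0
```

The functions follow the `test_` naming convention, so a runner such as pytest would pick them up; they can also simply be called directly.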

Good software architecture goes a long way. Deploy fast and automate your build systems. Beware of cascading failures, and design the system so it can keep operating when individual services drop: if every request depends on 5 services that each have 95% uptime, and they fail independently, the overall system has only ~77% uptime. Message queues will save you when Database-as-IPC bites.
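
The ~77% figure is just the product of the individual uptimes. A quick sketch of the arithmetic, assuming the services fail independently and the system needs all of them at once (the function name is my own):

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a system that needs every listed service to be up.

    Assumes failures are independent, so the probabilities multiply.
    """
    return prod(availabilities)


overall = serial_availability([0.95] * 5)
print(f"{overall:.1%}")  # 77.4%
```

In practice failures are often correlated, so treat this as a rough estimate rather than a guarantee.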

In mission-critical environments, you need redundancy. For example, in the systems I would work on, there would be two embedded systems running identical software. The second chip could detect erroneous behaviour and take control if something went wrong, and the pair could even speed up deployments by having one chip reprogram the other.
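
As a rough illustration of the takeover idea only (this is not the actual embedded design, and every class and method name here is hypothetical), a redundant pair can be modelled as two identical controllers where the standby takes over as soon as the active one reports a fault:

```python
class Controller:
    """One of two identical units; 'healthy' stands in for real fault detection."""

    def __init__(self, name):
        self.name = name
        self.healthy = True

    def compute(self, sensor_value):
        if not self.healthy:
            raise RuntimeError(f"{self.name} fault")
        return sensor_value * 2  # placeholder control law, purely illustrative


class RedundantPair:
    """Routes work to the active unit and swaps roles when it faults."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby

    def step(self, sensor_value):
        try:
            return self.active.compute(sensor_value)
        except RuntimeError:
            # Fault detected: hand control to the standby and retry once.
            self.active, self.standby = self.standby, self.active
            return self.active.compute(sensor_value)


pair = RedundantPair(Controller("A"), Controller("B"))
pair.step(3)                 # handled by A
pair.active.healthy = False  # simulate a fault on A
pair.step(3)                 # B takes over transparently
```

A real system would detect faults with watchdogs or output comparison rather than a flag, but the control handover has the same shape.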

In Support Operations at Cloudflare, we are increasingly treating these kinds of problems as optimisation problems, and are doing some interesting work on practical applications of Search-Based Software Engineering in distributed environments. Stay tuned to my Twitter feed or the Cloudflare Blog to learn more in due time.