What steps ensure the robustness of a small distributed system?

#help #distributedsystems #discuss

I'm a student working on a college project. I'd say I'm an experienced programmer: I had been tinkering with distributed systems for a long time before this project, but I never took them seriously.

The project is fairly small as distributed systems go, so getting a working implementation was easy. Now I have to make it robust and write a report about it.

Searching the web has been frustrating. I only find long reads in the form of books, while shorter pieces such as reports or papers focus on a single aspect, such as scalability. They are all aimed at domain experts working on large systems in industry. The eight fallacies of distributed computing do remind me of things I should handle, but they are still abstract concepts, not a how-to.

So, how do I make a distributed system robust (in practice, secure and fast), and how do I prove it (are there metrics I can measure and put in a scientific report)?

Many thanks!