DEV Community

DevByJESUS


System Design : Reliability

Hello.
Today we are going to talk about a subject that comes to mind whenever we design a data-intensive system. But first, let's look at what a data-intensive system is and which principles guide the design of this type of system.
Let's go ;)

First, this amazing quote:

The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free?

  • Alan Kay

Data Intensive System

Read how Martin Kleppmann defines it:

Data-intensive applications are pushing the boundaries of what is possible by making use of these technological developments. We call an application data-intensive if data is its primary challenge: the quantity of data, the complexity of data, or the speed at which it is changing (as opposed to compute-intensive, where CPU cycles are the bottleneck).

I can add nothing to this definition ;)

Some Questions

I can assure you, if we work in an IT team, some of the questions below come up frequently:

How do you ensure that the data remains correct and complete, even when things go wrong internally? How do you provide consistently good performance to clients, even when parts of your system are degraded? How do you scale to handle an increase in load? What does a good API for the service look like?

We can answer these questions with the principles of system design.

So, what are we waiting for?

Principles Of System Design

There are three core principles, and I think we have all heard of some of them at some point:

  1. Reliability
  2. Scalability
  3. Maintainability

Today we are going to talk about Reliability only ;)

Reliability

What is reliability? We say that a system is reliable when it is fault-tolerant, in other words, when it can prevent faults from turning into failures. We all agree that no system is 100% fault-tolerant. But ;) there are some faults we can guard against in our system.

Possible Threats to Reliability

  1. Hardware Faults: These are all the errors that can happen in hardware: hard disks crash, RAM becomes faulty, the power grid has a blackout, someone unplugs the wrong network cable.

How to fight it? As storage and the number of machines grow, so does the chance of hitting this type of fault. We can add redundancy to the individual hardware components in order to reduce the failure rate of the system. Disks may be set up in a RAID configuration, servers may have dual power supplies and hot-swappable CPUs, and data centers may have batteries and diesel generators for backup power.
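To make the effect of redundancy a bit more concrete, here is a minimal sketch. It assumes that component failures are independent, which is a simplification (real failures can be correlated, for example when machines share a rack or a power supply). The function name `all_fail_probability` and the 5% failure figure are hypothetical, chosen only for illustration.

```python
def all_fail_probability(p_single: float, replicas: int) -> float:
    """Probability that every one of `replicas` components fails.

    Assumption: failures are independent, so the probabilities multiply.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_single ** replicas


if __name__ == "__main__":
    # Hypothetical figure: each disk has a 5% chance of failing in a given period.
    for n in (1, 2, 3):
        print(f"{n} disk(s): chance of losing all of them = {all_fail_probability(0.05, n)}")
```

Under this assumption, every extra copy multiplies the chance of losing everything by the single-component failure probability, which is why mirroring disks or doubling power supplies helps so much.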

  2. Software Faults: Martin Kleppmann says: The bugs that cause these kinds of software faults often lie dormant for a long time until they are triggered by an unusual set of circumstances. In those circumstances, it is revealed that the software is making some kind of assumption about its environment.

How to deal with it? Think carefully about the assumptions and interactions in the system, test thoroughly, isolate processes, allow processes to crash and restart, and measure, monitor, and analyze system behavior in production.
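The idea of "allowing processes to crash and restart" can be sketched in a few lines. This is only an illustration, not how any particular supervisor works: the names `run_with_restarts` and `make_flaky_task` are hypothetical, and the "process" here is just a Python function that crashes a few times before succeeding.

```python
import time
from typing import Callable


def run_with_restarts(task: Callable[[], str], max_restarts: int = 3,
                      delay_s: float = 0.0) -> str:
    """Run `task`; if it crashes, restart it up to `max_restarts` times."""
    restarts = 0
    while True:
        try:
            return task()
        except Exception as exc:
            if restarts >= max_restarts:
                # Give up and let the failure surface to whoever called us.
                raise
            restarts += 1
            print(f"task crashed ({exc!r}), restart {restarts}/{max_restarts}")
            time.sleep(delay_s)


def make_flaky_task(failures_before_success: int) -> Callable[[], str]:
    """Build a hypothetical task that fails a fixed number of times, then works."""
    state = {"calls": 0}

    def task() -> str:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("unexpected environment state")
        return f"ok after {state['calls']} calls"

    return task


if __name__ == "__main__":
    print(run_with_restarts(make_flaky_task(2), max_restarts=3))
```

The restart limit matters: a task that never recovers should eventually fail loudly so that monitoring can pick it up, rather than restarting forever.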

And finally, it would have surprised me if we humans were not on this list 😄

  3. Human Faults: One study of large internet services found that configuration errors by operators were the leading cause of outages, whereas hardware faults (servers or network) played a role in only 10–25% of outages.

Possible solutions:

  1. Decouple the places where humans make the most mistakes from the places where they can cause failures. I think we do this every day when we make decisions: for example, when paying for electricity at home we want to make no mistakes, but when buying sugar a mistake has much smaller consequences.

  2. Test at all levels, from unit to integration: automated testing gives us confidence that the system keeps working as expected for its users, day after day.
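Both solutions can be combined in a small sketch: a configuration is checked before it ever reaches production, and the check itself is covered by automated unit tests. Everything here is hypothetical, including the function `validate_config` and the settings `replicas` and `timeout_seconds`; a real system would have its own fields and rules.

```python
import unittest
from typing import List


def validate_config(config: dict) -> List[str]:
    """Return a list of problems found in `config`; an empty list means it looks safe."""
    errors = []
    replicas = config.get("replicas")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 1:
        errors.append("replicas must be an integer >= 1")
    timeout = config.get("timeout_seconds")
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout <= 0:
        errors.append("timeout_seconds must be a positive number")
    return errors


class ValidateConfigTest(unittest.TestCase):
    def test_valid_config_has_no_errors(self):
        self.assertEqual(validate_config({"replicas": 3, "timeout_seconds": 2.5}), [])

    def test_zero_replicas_is_rejected(self):
        errors = validate_config({"replicas": 0, "timeout_seconds": 2.5})
        self.assertEqual(errors, ["replicas must be an integer >= 1"])

    def test_missing_fields_are_rejected(self):
        self.assertEqual(len(validate_config({})), 2)
```

Saved in a file, the tests can be run with `python -m unittest <file>`. The point is that an operator's typo is caught by the validator and its tests, not by the users.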

😊 Thanks for reading. By the grace of JESUS 😊, in the next article we will talk about scalability.
The book: Martin Kleppmann, Designing Data-Intensive Applications.
