The fallacies of distributed computing are a list of false assumptions that architects and developers of distributed systems may make. In this post, we'll look at what these fallacies are, how they came to be, and how to avoid them to build reliable distributed systems.
The network is reliable
When we build systems with only one machine, we usually assume that the components can talk to each other in a reliable way. All the parts are in one computer, so talking to RAM or HDD is easy and doesn't cause a lot of communication errors.
When we build distributed systems, our computers are spread out over a network, which is not very reliable.
There have been times when Google Cloud's network went down, and when it was looked into, it was found that sharks were nibbling on the underwater optic cable that powers their network.
There are many things that could cause a network to go down. Because of this, whenever we build distributed systems, we should always assume that the network is not reliable.
Latency is Zero
When we build systems with just one computer, we assume there is no latency because reading from RAM, HDD, or SSD takes almost no time.
But when it comes to distributed systems, networks do have latency because the machines can be in different places and the signals have to travel from one machine to the next.
There's also a chance that your computers are on different continents, which makes the difference in latency even more obvious.
Always think of latency as a cost that you have to try to avoid, and we should try to keep this cost as low as possible.
Bandwidth is Infinite
It's a mistake to think that you can add more data to a network channel without taking its limits into account.
Let's say you make a website that pulls 5MB of data from your servers every time it loads. As the size of your business grows, your systems will soon reach a limit because of bandwidth limits.
This problem can't be fixed by adding more nodes, so be careful with every bit you send or receive.
The Network is Secure
Every system with many parts needs a network. Without a network, nodes can't talk to each other. But thinking that the network is safe and that no one can get into it is false and often disastrous. If you don't think about security in your system, hackers will be very happy to hear that from you.
Every system needs to take security seriously and keep looking for security vulnerabilities.
Topology doesn't change
We think that once a system is set up and working perfectly, it will continue to work the same way in the future, but nothing could be further from the truth. Any node can break, and any network between nodes can temporarily stop working. The topology of your system can change.
Always ask yourself, "Does my system have one place where it could go wrong?"
If you have a system, ask yourself, "If this part of the system fails, will my whole system fail?" What would happen if this part of my system stopped working?
This problem is so common that tools like Zookeeper and Consul were made to detect topological changes and respond to them.
It can be hard, but building systems that can adapt to changes in their topology makes for a strong system.
There is one administrator
This mistake says that you can't be in charge of everything.
In a single-machine system, there will be one administrator who knows everything and can control every part of the system.
But in a distributed system, as your system grows, it will start relying on systems that you don't control.
For example, if you are building an online store, you will add a payment gateway that will be run by another company. If that goes down for a short time, you can't do much. The payment gateway will have to fix it, and all you can do is wait until their system is back up and running.
Transport cost is Zero
Cloud service providers will charge you based on how much data your node sends and receives. Each bit you send between your nodes costs you money.
As your systems grow, it makes sense to optimise them for these costs.
For example, if you have two systems that talk to each other using JSON, you might want to look into transfer-optimized formats like Protobuf. These transfer-optimized formats can help you save a lot of money and time.
Global Clock Fallacy
In a single machine, any part that reads the date or time at the same point will always get the same date or time.
But because of how clocks work, there can be a difference in time between two or more machines. This is often called a "clock skew."
In distributed systems, you cannot simply assume that time on one machine will be the same as time on other machines.
Top comments (0)