
Sidharth

Posted on • Originally published at newsletter.scalablethread.com

Mistaken Beliefs about Distributed Systems

When designing large-scale software applications, it's common to base assumptions about the components of a distributed system on personal experience with everyday software. However, these assumptions can turn into significant challenges once the system encounters production traffic.

Fallacies of Distributed Systems

1) The Network is Reliable

By nature, a distributed system comprises multiple nodes, systems, or services connected and communicating over a network. Any change or fault in the network can disrupt communication and hinder the system's functionality. These faults can result from factors such as subsea cable cuts, switch failures, or power outages. Additionally, most hardware responsible for network reliability operates with some form of software or firmware. Any subtle bugs or failures in this software can also cause network issues. This leads us to the conclusion that the network is NOT reliable.

Fig. Network is NOT reliable
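Since the network can and does fail, callers should treat every remote call as fallible. A minimal sketch of one common defense, retries with exponential backoff and jitter (the function and parameter names here are illustrative, not from any particular library):

```python
import random
import time

def call_with_retries(request, max_attempts=4, base_delay=0.1):
    """Retry a flaky network call with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Back off exponentially, with jitter so that many clients
            # retrying at once don't hammer the server in lockstep.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)
```

The jitter matters: without it, a brief outage can synchronize thousands of clients into retrying at exactly the same moment.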

2) Latency is Zero

In a distributed system, data centers or servers may be located in the same region or on opposite sides of the world. While this geographic replication enhances the resilience of the service, it also increases the time it takes for data to travel between machines. Additionally, the inherent unreliability of networks can delay packets, further extending the time required to complete a client's request. Modern networks are fast, with round-trip latencies typically measured in milliseconds, but the notion that latency is zero is still incorrect.

Fig. Latency is NOT zero
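One practical consequence: every remote call needs a deadline, and a chain of dependent calls should share a single latency budget instead of each assuming an instant response. A minimal sketch (the function names and budget values are illustrative):

```python
import time

def call_within_budget(steps, total_budget_s=1.0):
    """Run a chain of dependent calls under one overall deadline,
    passing the remaining budget to each step instead of pretending
    each call completes instantly."""
    deadline = time.monotonic() + total_budget_s
    results = []
    for step in steps:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("latency budget exhausted")
        results.append(step(timeout=remaining))
    return results
```

Propagating the shrinking budget downstream is how services avoid doing work for a caller that has already given up.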

3) Bandwidth is Infinite

Bandwidth measures the amount of data that can be transmitted over a network per unit of time. While modern large-scale applications may create the impression of unlimited bandwidth, there are instances where data demands, like those from online streaming or gaming, can exceed the available capacity, leading to throttling. Moreover, when sharing infrastructure resources such as message brokers among multiple services, bandwidth per producer is constrained to prevent high-throughput producers from impacting other services.
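A common way to enforce such per-producer limits is a token bucket. A minimal sketch (the class name and rates are illustrative, not tied to any particular broker):

```python
import time

class TokenBucket:
    """Cap a producer's throughput so one high-volume sender cannot
    exhaust bandwidth shared with other services."""

    def __init__(self, rate_bytes_per_s, capacity_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start with a full bucket
        self.last = time.monotonic()

    def try_send(self, nbytes):
        """Return True if nbytes may be sent now, else False (throttle)."""
        now = time.monotonic()
        # Refill tokens for the time elapsed, up to the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

The capacity allows short bursts while the refill rate bounds sustained throughput.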

4) The Network is Secure

The first lesson when designing web forms that take user input is: never trust user input. Similarly, the standard advice about open Wi-Fi networks is: never connect to them. Likewise, when designing communication among services over a network, it is crucial to authenticate the source of every request. The internet is full of actors with malicious intent: those who sniff network traffic to decode communication, or who attack firewalls to reach private networks, services, or databases. This demonstrates that the network is NOT secure by nature.

Fig. Network is NOT secure
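One basic countermeasure is to authenticate every request, for example by attaching an HMAC over the payload computed with a shared secret. A simplified sketch (the secret and payload are placeholders; a real system would keep keys in a secrets store and use TLS on the wire as well):

```python
import hashlib
import hmac

SHARED_SECRET = b"example-secret"  # placeholder; never hard-code real keys

def sign(payload: bytes) -> str:
    """Sender attaches an HMAC so the receiver can verify who sent
    the request and that the payload was not tampered with."""
    return hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, signature: str) -> bool:
    """Recompute the HMAC and compare in constant time to avoid
    leaking information through timing differences."""
    expected = sign(payload)
    return hmac.compare_digest(expected, signature)
```

`hmac.compare_digest` is used instead of `==` because a naive string comparison can leak how many leading characters matched.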

5) Topology Doesn’t Change

Network topology refers to the arrangement of different nodes in a network. Although it might not be evident to the average internet user, this arrangement changes regularly due to events such as node or data center failures, network partitions, high traffic loads on a single resource, and the addition of new services, nodes, or data centers. These changes result in the re-routing of client requests, which can increase or decrease latency. Such frequent changes in topology can lead to scenarios where the caller times out, fails while waiting for a response, or is unable to connect to the correct callee.

Fig. Topology Changes
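A client that copes with a shifting topology re-resolves its endpoints on every call and fails over, rather than pinning a single address. A minimal sketch, where `resolve_endpoints` is a hypothetical stand-in for a real service-discovery lookup (DNS, Consul, a Kubernetes Endpoints API, and so on):

```python
def resolve_endpoints():
    """Hypothetical service-discovery lookup; a real implementation
    would query DNS or a registry rather than return a static list."""
    return ["10.0.0.1:443", "10.0.0.2:443"]

def call_service(send, resolve=resolve_endpoints):
    """Fetch a fresh view of the topology and fail over across
    endpoints instead of assuming yesterday's address still works."""
    last_error = None
    for endpoint in resolve():  # re-resolve on every call
        try:
            return send(endpoint)
        except ConnectionError as exc:
            last_error = exc  # node may have died or moved; try the next
    raise ConnectionError("no reachable endpoint") from last_error
```

Combined with the timeouts discussed earlier, this keeps a dead or relocated node from taking the caller down with it.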

6) There is One Administrator

In modern computing, with easy access to cloud computing tools and the increasing complexity of online software applications, it is practically infeasible for a single person to manage everything from design to development to deployment. Most internet applications consist of microservices, each owned by different teams with their own development and deployment tools and cycles. Consequently, it is impractical to have one administrator responsible for everything.

Fig. There are multiple administrators

7) Transport Cost is Zero

To enable communication between different entities in a distributed system, data must be transported over the network. Much like moving physical goods in the real world, transporting data online requires resources: physical hardware, network bandwidth, software, and power. Although these non-zero transport costs may seem minor at first, they ultimately constitute a significant portion of the overall expense of operating a large-scale online application.
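A back-of-the-envelope estimate makes the point that egress cost scales linearly with traffic. A tiny sketch (the per-GB price is an illustrative placeholder, not any provider's actual rate):

```python
def monthly_egress_cost(gb_per_day, price_per_gb_usd=0.09):
    """Rough monthly data-transfer bill, assuming a flat (illustrative)
    per-GB egress price and a 30-day month."""
    return round(gb_per_day * 30 * price_per_gb_usd, 2)
```

At the placeholder rate, even a modest 100 GB of egress per day already costs hundreds of dollars a month, which is why cross-region chatter and uncompressed payloads show up quickly on the bill.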

8) The Network is Homogeneous

Different services or nodes deployed on the internet often have varying configurations in terms of hardware specifications or operational parameters. For example, every device connected to the internet has distinct hardware specifications. Designing a system assuming all devices have homogeneous configurations will lead to performance or compatibility issues. This principle also applies to large-scale distributed systems deployed across multiple geolocated data centers, where the physical machines have heterogeneous configurations.

Fig. Network is NOT Homogeneous
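One concrete consequence of heterogeneity: data exchanged between machines must be serialized in an explicitly agreed wire format, never in each machine's native memory layout. A minimal sketch using Python's `struct` with explicit network byte order (the record fields are hypothetical):

```python
import struct

# "!": network (big-endian) byte order with standard sizes and no padding,
# so every machine produces and parses identical bytes regardless of its
# native endianness or alignment rules.
RECORD_FORMAT = "!IQ"  # 4-byte unsigned id + 8-byte unsigned balance

def encode_record(user_id, balance_cents):
    """Serialize a record into a platform-independent byte layout."""
    return struct.pack(RECORD_FORMAT, user_id, balance_cents)

def decode_record(data):
    """Parse the same fixed layout back on any machine."""
    return struct.unpack(RECORD_FORMAT, data)
```

Relying on native byte order (`"@"` or no prefix) instead would produce records that decode differently on big-endian and little-endian hosts.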


This article was originally published in The Scalable Thread newsletter. If you enjoyed reading it, please consider subscribing to the newsletter for free to support my work and receive the next post directly in your inbox.
