Hello & 👋 welcome to the System Design series by @mukeshkuiry!
In our previous post, we delved into the basics of distributed system design. Now, let's take a deeper dive into the fundamentals of distributed systems—a crucial topic to grasp before jumping into system design.
MapReduce, introduced by Google's Jeffrey Dean and Sanjay Ghemawat in 2004, has emerged as a pivotal distributed system framework. Designed to process very large volumes of data, MapReduce spreads both storage and computation across many servers. At its core, MapReduce provides an abstraction layer: developers write the high-level logic as a map function and a reduce function, while the framework takes care of intricate details such as coordination, parallelization, fault tolerance, and load balancing.
Partitioning: To handle large data chunks efficiently, MapReduce employs a process called partitioning, breaking down extensive datasets into smaller, more manageable pieces known as input splits. These input splits are then processed in parallel by map tasks.
Mapping: The mapping phase involves executing computations on each input split, resulting in the generation of key-value pairs.
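Intermediate file: Each map worker buffers its key-value pairs in memory and periodically writes them to local disk, divided into R partitions (with R representing the number of reduce tasks), typically by hashing the key. The primary node keeps track of where these files are stored and passes their locations to the reduce workers, which then fetch the data.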
Reduce: Workers sort and group the data based on common keys.
Aggregate: Once grouping is complete, each reduce worker applies the reduce function to the values of every key and writes its results to one output file, so the job produces R output files for end-users.
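To make the steps above concrete, here is a minimal single-process sketch of a word count, the classic MapReduce example. It is not Google's implementation: every function name (`split_input`, `map_words`, `partition`, `reduce_partition`, `word_count`) is hypothetical, the "workers" are plain function calls, and hashing keys with `crc32` is just one assumed way to assign keys to the R partitions.

```python
from collections import defaultdict
from zlib import crc32


def split_input(lines, num_splits):
    """Partitioning: break the input into num_splits input splits."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_words(split):
    """Mapping: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def partition(pairs, num_reducers):
    """Intermediate step: route each pair to one of R partitions by key hash."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        # crc32 is deterministic, so a given key always lands in the same bucket.
        buckets[crc32(key.encode()) % num_reducers].append((key, value))
    return buckets


def reduce_partition(pairs):
    """Reduce and aggregate: sort, group by key, then sum the values per key."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, num_splits=3, num_reducers=2):
    """Run the whole pipeline and return one result dict per reduce task."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for split in split_input(lines, num_splits):
        # Each map task's output is partitioned, then handed to the reducers.
        for r, bucket in enumerate(partition(map_words(split), num_reducers)):
            reducer_inputs[r].extend(bucket)
    # Each reduce task yields one output "file" (here, a dict).
    return [reduce_partition(pairs) for pairs in reducer_inputs]
```

Calling `word_count(["the quick brown fox", "the lazy dog", "the fox"])` returns two dictionaries, one per reduce task. Because the partition is chosen from the key alone, every occurrence of a word ends up in the same dictionary, which is what lets each reducer compute a final count on its own.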
MapReduce, with its streamlined workflow, proves to be a robust solution for efficiently handling large datasets, making it a cornerstone in the realm of distributed computing.
In the diverse landscape of system architecture, two approaches stand out: Stateless and Stateful Systems.
Stateless: In this architecture, each transaction stands independently, devoid of any storage or reference to preceding transactions. Requests between sender and receiver complete autonomously, without relying on the context of prior transactions.
Stateful: This architecture, in contrast, stores information about earlier interactions and retrieves it when handling later requests. Because data persists across multiple transactions, each request can build on the context of the ones before it, as with a login session or a shopping cart.
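The difference is easy to see in a tiny sketch. The names below (`StatefulCounter`, `stateless_handle`, and the `count` field) are made up for illustration and do not come from any particular framework.

```python
class StatefulCounter:
    """Stateful: the server remembers each client's count between requests."""

    def __init__(self):
        self._counts = {}  # per-client state kept on the server

    def handle(self, client_id):
        # The reply depends on what this client did in earlier requests.
        self._counts[client_id] = self._counts.get(client_id, 0) + 1
        return self._counts[client_id]


def stateless_handle(request):
    """Stateless: everything needed to answer arrives in the request itself."""
    # Nothing is stored; the caller carries the count and sends it back next time.
    return {"count": request["count"] + 1}
```

Calling `handle("alice")` twice on the same `StatefulCounter` returns 1 and then 2, because the server kept the count. `stateless_handle({"count": 1})` returns `{"count": 2}` no matter which server runs it, which is why stateless services are usually easier to scale behind a load balancer, while stateful ones need sticky sessions or a shared store.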
A nuanced understanding of these architectural paradigms becomes imperative when designing systems, as it directly impacts the system's behavior and performance.
Enter Raft, a consensus algorithm conceived as a more comprehensible alternative to Paxos. To stay understandable, Raft splits consensus into separate pieces: leader election, log replication, and safety. It also comes with a formal proof of safety and a mechanism for changing cluster membership. The algorithm provides a generic approach to distributing a state machine across a cluster of computing systems, ensuring that every node agrees on the same series of state transitions.
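As a small taste of how Raft works, here is a hedged sketch of just one piece, the vote-granting rule used during leader election. It is not a full Raft implementation: timers, networking, persistence, and log replication are left out, and the `Node` class, its fields, and `wins_election` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    log: List[Tuple[int, str]] = field(default_factory=list)  # (term, command)

    def last_log(self):
        """Return (last_term, last_index); an empty log counts as (0, 0)."""
        if not self.log:
            return (0, 0)
        return (self.log[-1][0], len(self.log))

    def handle_request_vote(self, term, candidate_id, last_log_index, last_log_term):
        """Decide whether to grant a vote; returns (current_term, vote_granted)."""
        if term < self.current_term:
            # Reject candidates from an older term.
            return (self.current_term, False)
        if term > self.current_term:
            # A newer term resets our vote for that term.
            self.current_term = term
            self.voted_for = None
        # The candidate's log must be at least as up-to-date as ours:
        # compare the last entry's term first, then the log length.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return (self.current_term, True)
        return (self.current_term, False)


def wins_election(votes_granted, cluster_size):
    """A candidate becomes leader once a strict majority has voted for it."""
    return votes_granted > cluster_size // 2
```

Two rules do most of the work here: a node votes at most once per term, and it never votes for a candidate whose log is behind its own. Combined with the majority requirement, that keeps two leaders from being elected in the same term and ensures a new leader already holds every committed entry.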
Dive deep into the captivating world of distributed systems, armed with insights into MapReduce, Stateless and Stateful Systems, and the Raft Algorithm. Stay tuned for more captivating explorations! 🚀