Recently I came across a Youtube video called: System Design : Design a service like TinyUrl, from the channel Tushar Roy - Coding Made Simple. This video discusses a common developer interview question, namely, how do you design a service like TinyURL, which allows users to turn long URLs into short ones that are just several characters long.
Basically, a TinyURL-like service would have 2 main APIs:
getLong(shortUrl). The second one is easy, you simply need to do a lookup and return the long URL (or 404 if none exists). The main problem is the
createShort() API: How do you generate a short sequence of characters that is unique among URLs (note that uniqueness is an important property, we don't want different URLs to have the same shortcut).
Tushar's proposed solutions are quite good and I think most interviewers would be satisfied with them (please watch the video before continuing to read this post). That being said, they are sort of unsatisfying. To summarize, the most sophisticated solution proposed in the video is to partition all possible short sequences into ranges, and use a set of servers to return a monotonically increasing sequence, which falls within a range. Each server would be in assigned only one particular range to work with, and Apache Zookeeper is used to coordinate the sequence range assignments.
If the each server has a unique range, then they are guaranteed to generate unique sequences.
The reason I think this answer is unsatisfying is because, while it works, it simply shifts the responsibility of generating the "unique" part of the sequence, which, is the hardest part of the problem, to Zookeeper. Instead of answering the question "how to generate a unique sequence?" (or sequence range, in this case), this solution simply says "I'll just ask Zookeeper to give me one". But how does Zookeeper do that?
First of all, why is it so hard to generate a unique sequence? Afterall, I can use a single computer to keep increasing a counter, and that would be unique, right? In fact, that solution is mentioned by Tushar in the video, but later rejected, because the counter-generating server might fail (either the machine itself crashes, or the network might go down etc.), and Zookeeper, somehow, magically provides "high availability" (i.e. it is resilient to failures).
And that's the gist of the problem. If I had the guarantee that my servers never fails, then I wouldn't need Zookeeper. I probably wouldn't need multiple servers either, one beefy machine might be enough to do the job. Unfortunately, in practice machines do fail, and in fact, they fail all the time. That is why when we design systems, we design for failure. In this case, when one servers in the Zookeeper cluster fails, somehow the system needs to make sure that the others don't return a duplicate range. The only way to do that is to make all servers agree on which ranges have been given, and which have not.
So let's try to simplify & generalize the problem: given a set of servers, how do we ensure that all servers agree on a value, even if the servers might fail randomly (the value in this case would be the range assignment). This is know as the distributed consensus problem, which actually is one of the hardest problems in Distributed systems. In fact, it has been mathematically proven that, in an asynchronous system (meaning a system where we don't know how long it takes for messages to travel between servers), there is NO way to guarantee distributed consensus. This is known as the FLP Impossibility.
Fortunately, in most of the systems in practice, we can workaround this issue by modelling them as "partially synchronous systems", that is, we can apply a boundary on how long it takes to send messages between servers. And in this model, consensus is possible. There are several algorithms that can be used to get consensus, like Paxos or Raft. Zookeeper itself uses a consensus protocol called Zab (which stands for Zookeeper atomic broadcast).
I won't get into details on how these algorithms work. Afterall, they are quite complicated and sometimes difficult to understand. However if you ever need to work with those directly, an important thing to pay attention to is that they are not perfect. Raft and Paxos, for example, only works if the number of failed nodes is less than half the total number of nodes in the system. Failure also take different forms, and while Paxos and Raft works well with Fail-stop and Fail-safe types of failure, Byzantine-type failures are a lot harder to deal with.