DEV Community

Binoy Vijayan
Simplifying Links: A Deep Dive into URL Shortener System Architecture

This article examines the design of a URL shortener service. It covers the functional and non-functional requirements, gives back-of-the-envelope estimates for storage, query rate, bandwidth, and cache memory, and walks through the key components of the system. The pros and cons of URL shorteners themselves are not discussed here.

Let’s look at the system in more detail.

Functional Requirements

Short URL generation: The service should be able to generate a unique shorter alias of the given URL.

Example:

Shortened URL - https://tcrn.ch/3yJH0vv

Original - https://techcrunch.com/2021/12/01/bitly-makes-first-acquisition-with-qr-code-leader-egoditor/

Redirection: Given a short link, the system should be able to redirect the user to the original URL.

Custom short links: Users should be able to generate custom short links for their URLs using the system.

Deletion: Users should be able to delete a short link generated by the system, given the proper rights.

Update: Users should be able to update the long URL associated with the short link, given the proper rights.

Expiry time: There must be a default expiration time for the short links, but users should be able to set the expiration time based on their requirements.

Non-Functional Requirements

Availability: The system should be highly available, because even a fraction of a second of downtime would result in URL redirection failures. Since redirection is the system's core job, it cannot afford downtime, and fault tolerance must be built into the design.

Scalability: The system should be horizontally scalable with increasing demand.

Readability: The short links generated by the system should be easily readable, distinguishable, and typeable.

Latency: The system should perform at low latency to provide the user with a smooth experience.

Unpredictability: From a security standpoint, the short links generated by the system should be highly unpredictable. This ensures that the next-in-line short URL is not serially produced, eliminating the possibility of someone guessing all the short URLs that the system has ever produced or will produce.

Back of the envelope calculation

It's advisable to establish realistic estimates up front, even though they may need adjusting later as the design changes. Let's state some assumptions to finalise the estimates.

Assumptions

  • Shortening : Redirection request ratio is 1 : 100

  • 200 million new URL shortening requests per month

  • A URL shortening entry requires 500 Bytes of database storage.

  • Each entry will have a maximum of 10 years of expiry time, unless explicitly deleted.

  • There are 100 million Daily Active Users (DAU).

Storage Estimation

Since entries are kept for 10 years and 200 million new entries arrive per month, the total number of entries will be approximately 24 billion.

10 X 12 X 200 Million = 24,000 Million = 24 Billion

Since each entry is 500 Bytes, the total storage estimate would be 12 TB:

24 Billion X 500 Bytes = 12 TB
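The storage arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation under the assumptions listed earlier; the constant names are made up for illustration.

```python
# Assumptions taken from the list above.
NEW_URLS_PER_MONTH = 200_000_000
RETENTION_YEARS = 10
BYTES_PER_ENTRY = 500

# Entries accumulated over the full retention period.
total_entries = NEW_URLS_PER_MONTH * 12 * RETENTION_YEARS
# Raw storage, ignoring indexes and replication.
total_bytes = total_entries * BYTES_PER_ENTRY

print(f"entries: {total_entries / 1e9:.0f} billion")  # entries: 24 billion
print(f"storage: {total_bytes / 1e12:.0f} TB")        # storage: 12 TB
```

Note that the figure is raw data only; replicas and indexes would add to it.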

Query rate estimation

Based on the assumptions above (a 1 : 100 shortening-to-redirection ratio), we can expect 20 billion redirection requests per month.

200 Million X 100 = 20 Billion

We can extend this baseline to Queries Per Second (QPS). Taking the average month as 30.42 days (365 / 12), the number of seconds in one month is:

30.42 Days X 24 Hours X 60 Minutes X 60 Seconds = 2628288 seconds

The shortening (write) rate follows from the 200 million new URLs per month:

200 Million / 2628288 seconds ≈ 76 URLs / Second

With a 1 : 100 shortening-to-redirection ratio, the URL redirection rate per second will be:

76 X 100 = 7.6K URLs / Second
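As a quick sanity check, the per-second rates can be derived from the monthly figures. In this sketch the write rate is computed from the 200 million monthly shortening requests and the read rate from the 1 : 100 ratio; the variable names are illustrative.

```python
NEW_URLS_PER_MONTH = 200_000_000
REDIRECTS_PER_SHORTENING = 100
# Average month of 30.42 days, as in the text (about 2,628,288 seconds).
SECONDS_PER_MONTH = 30.42 * 24 * 60 * 60

# Write path: new short URLs created per second.
shortening_qps = NEW_URLS_PER_MONTH / SECONDS_PER_MONTH
# Read path: redirections per second, 100 for every shortening.
redirection_qps = shortening_qps * REDIRECTS_PER_SHORTENING

print(f"shortening:  about {shortening_qps:.0f} per second")
print(f"redirection: about {redirection_qps / 1000:.1f}K per second")
```

These are averages; real traffic is bursty, so peak rates would be higher.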

Bandwidth estimation

Shortening requests: The expected arrival rate is 76 new URLs per second, so the total incoming data would be 304 kilobits per second.

76 X 500 Bytes X 8 bits = 304000 bits per second = 304 Kbps

Redirection requests: Since the expected rate is 7.6K URL redirections per second, the total outgoing data would be 30.4 Mbps.

7600 X 500 Bytes X 8 bits = 30400000 bits per second = 30.4 Mbps

Memory (Cache) Estimation

We need memory estimates in case we want to cache the most frequently accessed URL mappings. Let’s assume an 80-20 split in the incoming requests: 20 percent of the short URLs generate 80 percent of the redirection traffic.

Since there are 7.6K redirection requests per second, the total for one day would be about 0.66 billion.

7.6 K X 3600 seconds X 24 hours = 0.66 billion

Since we would only consider caching 20 percent of these per-day redirection requests, the total memory requirements estimate would be 66 GB.

0.2 X 0.66 Billion X 500 Bytes = 66 GB
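The cache estimate can be reproduced the same way. This is a sketch of the arithmetic only; it uses the rounded 7.6K redirections per second from above, and the constant names are illustrative.

```python
REDIRECTS_PER_SECOND = 7_600
BYTES_PER_ENTRY = 500
HOT_FRACTION = 0.2  # share of daily requests we plan to cache

redirects_per_day = REDIRECTS_PER_SECOND * 3600 * 24
cache_bytes = HOT_FRACTION * redirects_per_day * BYTES_PER_ENTRY

print(f"{redirects_per_day / 1e9:.2f} billion redirects per day")
print(f"about {cache_bytes / 1e9:.0f} GB of cache")
```

The unrounded result is slightly under 66 GB; the text gets 66 GB by rounding the daily total to 0.66 billion first.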

High Level Design

[Figure: high-level design diagram of the URL shortening system]

Key Components

Let’s discuss the key components of the system.

Load Balancer - 1

This load balancer sits in front of the web servers and distributes incoming requests across them.

Load Balancer - 2

This load balancer sits in front of the application servers and distributes requests from the web servers across them.

Web-Server

In the context of a URL shortening system, a web server serves as the interface between clients (such as web browsers or API clients) and the URL shortening system.

Rate Limiter

Its purpose is to prevent an overload of requests to the web-server, which could lead to performance degradation or even system failure.
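The article does not name a specific rate-limiting algorithm. One common choice is the token bucket, sketched below as an assumption rather than as the design's prescribed method. The class name, the `rate` and `capacity` parameters, and the injectable `clock` are all illustrative.

```python
import time


class TokenBucket:
    """Token-bucket limiter: allows bursts up to `capacity`, refills at `rate` tokens/second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so the limiter can be tested
        self.last = clock()

    def allow(self):
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice one bucket would typically be kept per client (for example per API key or IP address), and a rejected request would receive an HTTP 429 response.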

Application Server

The application server hosts the system's services: the URL Shortening Service, Counter/Sequencer Service, User Service, URL CRUD Service, and URL Redirection Service.

ZooKeeper

ZooKeeper is responsible for allocating specific ranges of sequence numbers to the different counter/sequencer instances in the URL shortening service. This ensures that each instance can generate unique short URLs without conflicting with the others.

This approach helps in distributing the workload among multiple instances of the counter/sequencer, improving scalability and performance. Additionally, by utilising Zookeeper for this purpose, the system can benefit from its coordination and synchronisation capabilities, ensuring that the allocation process is managed reliably and consistently across the distributed environment.
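The range-allocation idea can be sketched as follows. To keep the example self-contained, an in-memory `RangeCoordinator` stands in for ZooKeeper; it is not a ZooKeeper client, and both class names and the `range_size` parameter are hypothetical.

```python
import threading


class RangeCoordinator:
    """In-memory stand-in for the coordination service (NOT a real ZooKeeper client)."""

    def __init__(self, range_size=1_000_000):
        self._range_size = range_size
        self._next_start = 0
        self._lock = threading.Lock()

    def lease_range(self):
        """Hand out the next unused half-open range [start, end)."""
        with self._lock:
            start = self._next_start
            self._next_start += self._range_size
        return start, start + self._range_size


class Sequencer:
    """Counter instance that draws IDs from its leased range and renews when exhausted."""

    def __init__(self, coordinator):
        self._coordinator = coordinator
        self._next, self._end = coordinator.lease_range()

    def next_id(self):
        if self._next >= self._end:
            # Range used up: ask the coordinator for a fresh one.
            self._next, self._end = self._coordinator.lease_range()
        value = self._next
        self._next += 1
        return value
```

Because each sequencer only contacts the coordinator once per range, ID generation stays local and fast. If an instance crashes, the unused remainder of its range is simply lost, which is usually acceptable given the size of the ID space.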

Cache

Once a URL is shortened and its corresponding short URL is generated, the system can cache the mapping from the short URL to the original URL in memory or in a distributed cache such as Redis or Memcached. This allows subsequent requests for the same short URL to be redirected quickly without hitting the database or performing additional processing.
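A minimal sketch of this lookup path is shown below, assuming a cache-aside pattern with least-recently-used eviction. A small in-process `LRUCache` stands in for Redis or Memcached and a plain dict stands in for the database; the names `LRUCache` and `resolve` are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Small in-process LRU cache standing in for a distributed cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


def resolve(short_code, cache, database):
    """Cache-aside lookup: try the cache first, fall back to the database on a miss."""
    long_url = cache.get(short_code)
    if long_url is None:
        long_url = database.get(short_code)
        if long_url is not None:
            cache.put(short_code, long_url)  # populate for later requests
    return long_url
```

When a mapping is updated or deleted, the cached entry would also need to be invalidated so that stale destinations are not served.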

URL Shortening Service

Generating Unique Sequence Number (Counter/Sequencer): The first step involves generating a unique sequence number for each long URL that needs to be shortened. This sequence number is obtained based on the range allocated to the sequencer from ZooKeeper. ZooKeeper manages the allocation and distribution of sequence numbers to ensure uniqueness and consistency across the system.

Encoding with Base62 (Base62 Encoder): Once a unique sequence number is obtained for a long URL, it is then encoded using a Base62 encoder. Base62 encoding converts the numerical sequence number into a short alphanumeric string, which represents the shortened URL. This encoded string is typically shorter and more user-friendly than the original numerical sequence number.
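The Base62 step can be sketched as follows. The ordering of characters in `ALPHABET` (digits, then lowercase, then uppercase) is an assumption; any fixed ordering of the 62 characters works as long as the encoder and decoder agree.

```python
# Assumed character ordering: digits, lowercase, uppercase.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_encode(number):
    """Convert a non-negative integer into its Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Digits were produced least significant first.
    return "".join(reversed(digits))


def base62_decode(text):
    """Convert a Base62 string back into the integer it encodes."""
    number = 0
    for char in text:
        number = number * 62 + ALPHABET.index(char)
    return number


print(base62_encode(62))  # 10
```

Seven Base62 characters cover 62^7 values (about 3.5 trillion), which comfortably exceeds the 24 billion entries estimated earlier.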

User Service

User Authentication: The User Service would handle the authentication of users who want to access the URL shortening system. This could involve verifying user credentials (such as username and password) and issuing authentication tokens (e.g., JSON Web Tokens) upon successful authentication.

Authorisation: Once authenticated, the User Service would determine what actions a user is authorised to perform within the system. For example, only authenticated users might be allowed to create and manage their shortened URLs, while anonymous users may only be permitted to access existing shortened URLs.
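The authorisation rule described above could look like the sketch below. It assumes authentication has already produced a `User` (or `None` for anonymous visitors); the action names and the `is_allowed` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: str


def is_allowed(action: str, user: Optional[User], link_owner_id: Optional[str] = None) -> bool:
    """Decide whether `user` (None means anonymous) may perform `action`."""
    if action == "redirect":
        return True  # anyone may follow an existing short link
    if user is None:
        return False  # everything else requires authentication
    if action == "create":
        return True
    if action in {"update", "delete"}:
        # Users may only manage the short links they own.
        return user.user_id == link_owner_id
    return False  # unknown actions are denied by default
```

Denying unknown actions by default keeps the check safe if new operations are added later without a matching rule.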

URL CRUD Service

Create: The URL CRUD Service handles the creation of shortened URLs. When a user submits a long URL to be shortened, this service generates a unique short URL and stores the mapping between the short and long URLs. It ensures that the generated short URL is unique and not already in use.

Retrieve: Users may need to retrieve information about existing shortened URLs, such as their original long URLs, creation timestamps, or usage statistics. The URL CRUD Service provides endpoints or methods to retrieve this information based on the short URLs provided.

Update: In some cases, users might need to update the destination of an existing shortened URL. For example, they may want to redirect the shortened URL to a different long URL. The URL CRUD Service allows authorised users to update the destination of shortened URLs while ensuring that the new URL is valid and unique.

Delete: Users may also need the ability to delete shortened URLs that are no longer needed or are outdated. The URL CRUD Service provides functionality to delete existing shortened URLs from the system, freeing up resources and ensuring that the URLs are no longer accessible.
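The four operations can be sketched with an in-memory store. This is an illustration only: a dict stands in for the database, and the `UrlStore` class, its method names, and the ownership check are assumptions rather than the article's actual interface.

```python
import time


class UrlStore:
    """In-memory sketch of the create/retrieve/update/delete operations."""

    def __init__(self):
        self._records = {}

    def create(self, short_code, long_url, owner_id):
        if short_code in self._records:
            raise ValueError(f"short code already in use: {short_code}")
        self._records[short_code] = {
            "long_url": long_url,
            "owner_id": owner_id,
            "created_at": time.time(),
        }

    def retrieve(self, short_code):
        """Return the stored record, or None if the short code is unknown."""
        return self._records.get(short_code)

    def update(self, short_code, new_long_url, owner_id):
        self._owned_record(short_code, owner_id)["long_url"] = new_long_url

    def delete(self, short_code, owner_id):
        self._owned_record(short_code, owner_id)
        del self._records[short_code]

    def _owned_record(self, short_code, owner_id):
        # Shared guard: the record must exist and belong to the caller.
        record = self._records.get(short_code)
        if record is None:
            raise KeyError(short_code)
        if record["owner_id"] != owner_id:
            raise PermissionError("only the owner may modify this short link")
        return record
```

Rejecting duplicate short codes in `create` is also what makes custom short links safe: a user-chosen alias goes through the same uniqueness check as a generated one.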

URL Redirection Service

The URL Redirection Service is responsible for redirecting users from a shortened URL to its corresponding original long URL. It works in three steps:

Receive Request: The URL Redirection Service receives an incoming request for a shortened URL. This request contains the shortened URL provided by the user or embedded in a link.

Lookup Original URL: The service looks up the original long URL associated with the provided shortened URL. This lookup is typically done using a database or cache where the mappings between shortened and original URLs are stored.

Redirect: Once the original long URL is retrieved, the service responds to the request by issuing an HTTP redirect response (usually with a status code 301 or 302) to redirect the user's browser to the original URL.
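The three steps above can be sketched as a single function that turns a short code into a status code and headers. The function name, the `permanent` flag, and the use of a plain mapping for the lookup are illustrative assumptions.

```python
from http import HTTPStatus


def redirect_response(short_code, mappings, permanent=False):
    """Build the (status, headers) pair for a request to a short URL."""
    long_url = mappings.get(short_code)
    if long_url is None:
        # Unknown, deleted, or expired short code.
        return HTTPStatus.NOT_FOUND, {}
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    # The browser follows the Location header to the original URL.
    return status, {"Location": long_url}
```

The choice of status code is a design decision. Browsers may cache a 301 and skip the service on later visits, which reduces load but can hide updates to the destination; a 302 keeps every request going through the service.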

Database

In a URL shortening system, a database plays a crucial role in storing and managing various data related to shortened URLs, user accounts (if applicable), analytics, and system configuration.

URL Mapping: The database stores the mapping between the original long URLs and their corresponding shortened URLs. This mapping allows the system to quickly retrieve the original URL when a shortened URL is accessed and vice versa.

User Accounts: If the URL shortening system includes user accounts, the database stores user account information such as usernames, passwords (hashed and salted for security), email addresses, and any additional profile information.
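To make the two kinds of records concrete, here is a sketch using SQLite from the Python standard library. The table and column names are assumptions, SQLite is used only so the example runs anywhere, and the PBKDF2 parameters are illustrative rather than a recommendation.

```python
import hashlib
import os
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    email         TEXT NOT NULL,
    password_hash BLOB NOT NULL,
    password_salt BLOB NOT NULL
);
CREATE TABLE url_mappings (
    short_code TEXT PRIMARY KEY,          -- lookup key for redirection
    long_url   TEXT NOT NULL,
    owner_id   INTEGER REFERENCES users(user_id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at TEXT                       -- NULL means the default expiry applies
);
"""


def open_database():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def hash_password(password, salt=None):
    """Return (digest, salt); a fresh random salt is generated when none is given."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return digest, salt
```

Making `short_code` the primary key means the redirection lookup is a single indexed read, which is the same access pattern a key-value store would provide.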

Summary

The article delves into the intricacies of designing a URL shortening system, outlining crucial components and considerations. It emphasises the necessity of a user-friendly interface for inputting long URLs and receiving shortened versions, alongside the implementation of a robust algorithm to generate unique and short codes. Database management plays a pivotal role, necessitating the storage of mappings between shortened codes and original URLs, often utilising NoSQL databases or key-value stores for scalability. Redirection mechanisms are crucial for seamlessly mapping shortened codes back to their original counterparts, while scalability concerns prompt the adoption of load balancing, horizontal scaling, and caching techniques. Security measures, including rate limiting and input validation, are essential for thwarting abuse, while customisation options and analytics provide added functionality and insights. Through meticulous attention to these aspects, a comprehensive URL shortening system can be developed, capable of efficiently managing URLs while ensuring scalability, security, and user satisfaction.
