Open Source Software (OSS) has been the main driving force in democratizing access to great tools with more transparency than ever before. It’s never too late to start giving back to the community and contributing towards a better OSS culture. That’s why we started this journey by open-sourcing our in-house URL Shortener service. We chose this service to assess the road ahead and be in a better position to embark on our open-source journey.
Let’s take a look at the steps involved in open sourcing this service.
Being an internal service, the URL Shortener was tightly coupled with our tracking API, which is used for, as the name suggests, tracking purposes. We needed to decouple these services before open-sourcing the URL Shortener, as having internal dependencies in an open-source project is infeasible for obvious reasons. This called for refactoring.
As shown in the diagram, a CTA (Call To Action) token is generated in the notifications service and passed down to the URL Shortener whenever a notification needs to be sent. The URL Shortener then stores the CTA token ↔ original URL mapping in a separate table, and simply passes the CTA token and the original URL to the tracking API whenever someone clicks on the short link. As you might guess, a CTA token has nothing to do with a URL shortening service, so the shortener should not have any context of such tokens.
This presented us with the quest to pull the URL Shortener out of the loop and stop passing any redundant data to it. Let’s take a look at the steps involved:
As you can see, the above HLD proposes a different way of passing down the CTA token so that the URL Shortener is not bothered with unnecessary data. Let’s go through it step-by-step:
Step 1: The notification service hits the URL Shortener to get the short URL whenever a notification needs a short link (e.g. referral emails, market open reminder SMS, order updates SMS, etc.).
Step 2: URL Shortener generates the short URL, maps it with the corresponding original (or “long”) URL, and returns the result to the notification service.
Step 3: After sending the notification, the notification service forwards the CTA token & the corresponding original URL to our tracking API service.
Step 4: Tracking API stores this original URL to CTA token mapping in PostgreSQL.
Forwarding the data to tracking API happens like this:
Step 1: User clicks on the short link received through email or SMS notification.
Step 2: URL Shortener receives the short URL the user clicked and redirects them to the original URL.
Step 3: Before redirecting the user to the original URL, the shortener also emits a Kafka event that contains the short URL that the user clicked.
Step 4: Tracking API, upon receiving the short URL, stores the corresponding data in a PostgreSQL table for analytics purposes. No user-specific data is used in any shape or form. We only use the metadata to understand delivery, click rates, etc.
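The click path can be sketched like so. This is a hand-wavy version with assumed names, and the Kafka produce call is stubbed as a plain function so the shape of the flow is visible; in the real service this would be an actual producer, and the tracking API would consume the event:

```javascript
// Short key -> original URL mapping (in-memory stand-in for the real store).
const shortUrlStore = new Map([["abc123", "https://smallcase.com/offers"]]);
const emittedEvents = [];

function emitClickEvent(shortUrl) {
  // In the real service this is a Kafka produce; the tracking API
  // consumes the event and stores the click metadata asynchronously.
  emittedEvents.push({ topic: "short-link-clicks", shortUrl, ts: Date.now() });
}

function handleClick(shortKey) {
  const originalUrl = shortUrlStore.get(shortKey);
  if (!originalUrl) return { status: 404 };
  emitClickEvent(`https://short.example/${shortKey}`); // Step 3: fire-and-forget
  return { status: 302, location: originalUrl };       // Step 2: redirect the user
}
```

Keeping the analytics hop behind an event means the redirect latency never depends on the tracking API being up.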
And that is it for the business logic abstraction. Following these steps, we were able to make the URL Shortener loosely coupled with our tracking API and free from internal dependencies.
The URL Shortener service was created a little more than 2 years ago to fulfil the needs of an internal URL shortening service. It was just a bare-bones HTTP server with SQLite as the database. But with the increase in notifications sent from smallcase, the number of requests to the shortener has grown significantly over time; it now gets around 500k (read + write) requests per month. There were a couple of things that needed addressing:
- Simple & non-scalable nature of the service
- No logging pipeline to debug when something goes wrong.
- No way to avoid getting duplicate short keys for different URLs.
- No purging of stale entries from the database.
As I mentioned, the initial setup was not reliable enough for our growing needs. Some major changes were required in the implementation. We had three options in mind:
- Using S3, AWS Lambda, and CloudFront
- Using AWS API Gateway and DynamoDB
- Fastify with MongoDB & Redis
Let’s talk about each one of them.
This approach aims to use S3 as a redirection engine by activating website hosting on the bucket. This way, for each short URL we can create a new empty object with a long URL attached in the website redirect metadata. On top of this, we can create a bucket lifecycle policy to purge the entries older than a set timeframe.
To create an admin page, all we need is a basic page hosted on S3 which will trigger a POST request to API Gateway invoking a lambda function which will:
- create a short key
- create an empty S3 object
- store the original (long) URL as the redirect destination in the object metadata.
While going ahead with this approach meant we didn’t have to worry about scalability or high availability, it certainly ties us to AWS offerings and implicitly denies any flexibility when it comes to changing the service vendor.
If we observe closely, all that the lambda function is doing is storing the long URL in the empty S3 object. Hence, we can cut down on resources & cost using this approach. Let’s take a look at how API Gateway combined with DynamoDB would work here.
There are four phases a request goes through when using the API Gateway:
- Method Request
- Integration Request
- Integration Response
- Method Response
Method Request involves creating API method resources, attaching HTTP verbs, authorisation, validation, etc.
Integration Request is responsible for setting up the integration between API Gateway & DynamoDB. One thing to note here: we need to transform the request into a format that DynamoDB understands.
Integration Response is what we get back from DynamoDB; again, we need to convert it into a format that the client understands.
Method Response is configured to send the response back to the client, which can be a 200, 400, or some other status code.
Again, while this approach allows us to get rid of the lambda and uses Apache VTL to communicate with DynamoDB, it presents the same vendor lock-in we saw in the previous approach, as it is strongly tied to AWS offerings. It also leaves us with zero control over the execution.
It is immediately noticeable that this approach gives us complete control over the service with no vendor lock-in. We can choose any data storage solution as per our needs, custom logging setup, and even in-house key generation service if we want.
While this is perfect in terms of what we wanted, it also means we now have to make sure that MongoDB and Redis are highly available, otherwise it directly affects our service. This requires significant developer bandwidth, which was not the case with the previous approaches.
With our Fastify application in place, we were able to plug our improved custom logging pipeline which is a huge benefit to the developer experience because the old pipeline was not reliable for the scale we now operate at.
With the increasing number of requests and possibly errors, we needed a proper logging setup to debug and monitor the service. That’s why we chose bunyan to log insightful data in our application. These logs sit conveniently on our new logging pipeline running on the EFK (Elasticsearch, Fluentd, Kibana) stack. While this deserves a separate blog post on its own, let’s take a brief look at how the logs travel from our application to the Kibana dashboard.
The logs that we write inside the application go to standard output. The fluentd collector (present in all the applications using the EFK logging pipeline) picks up the logs from stdout and forwards them to the fluentd aggregator.
These potentially transformed logs are then sent over the network to the Elasticsearch nodes, where the data gets stored. The structure of the logs needs to follow a predetermined pattern, which is why Elasticsearch needs an index mapping to understand the structure of the logs coming its way. This helps in indexing and storing the data.
Kibana uses the structured logs data to show the logs nicely on a dashboard. Since the data is structured, Kibana enables us to create visualisations and custom dashboards (a collection of different visualisations) on top of it.
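What makes this pipeline work is that bunyan emits newline-delimited JSON records to stdout, which the fluentd collector tails. Here's a dependency-free stand-in that mimics the shape of a bunyan record (the extra field names are assumptions, not our exact schema):

```javascript
// Mimics bunyan's record shape: one JSON object per line on stdout.
function makeLogger(name, write = (line) => process.stdout.write(line + "\n")) {
  return {
    info(fields, msg) {
      write(JSON.stringify({
        name,                           // service name, for filtering by app
        level: 30,                      // bunyan's numeric "info" level
        time: new Date().toISOString(), // timestamp for Kibana's time filter
        ...fields,                      // structured fields Elasticsearch indexes
        msg,
      }));
    },
  };
}
```

Because every field is structured rather than baked into a message string, Elasticsearch can index them individually, which is what makes the Kibana visualisations possible.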
With the increasing number of short key generations, there’s a higher probability that the key generation service can spit out the same short key for two different original (or long) URLs, if not handled correctly. The solution to this problem is simply to never let a short key be reused. Now there are two ways to achieve this:
- Do not generate a duplicate short key
- Retry until a unique short key is generated
Let’s take a look at both approaches.
To make sure we don’t generate a duplicate short key ever, we need to know what keys have been generated already. One approach could be creating two tables in PostgreSQL, one for the available keys (let’s say AvlK) and one for the keys that are occupied (let’s say OccK).
So while creating a short URL, we would fetch one unused key from the AvlK table, add it to the OccK table, and return it. Two database operations for one short URL. Not great.
Instead of maintaining two tables just to get one short key, we can work with just one PostgreSQL table which will store the keys already occupied. We can then simply generate a random key, check if it is occupied, and assign it if it is not.
Looking at the results on the nanoid collision calculator, we can see that after three days of generating short keys at a rate of 70/hr, there is a 1% probability of encountering at least one collision.
70 keys per hour × 24 hours × 3 days = 5040 short keys
So there is about a 1% chance of seeing at least one collision for every ~5k short keys generated.
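That figure follows the standard birthday-problem approximation, p ≈ 1 − e^(−n²/2N), where n is the number of keys generated and N is the size of the key space (which depends on the alphabet and key length you feed the calculator). A quick sketch of the arithmetic:

```javascript
// Birthday-problem approximation: probability of at least one collision
// after drawing n random keys uniformly from a space of N possible keys.
function collisionProbability(n, keySpace) {
  return 1 - Math.exp(-(n * n) / (2 * keySpace));
}

// The numbers from the post: 70 keys/hour for three days.
const totalKeys = 70 * 24 * 3; // 5040
```

The probability grows quadratically with the number of keys generated, which is why a shortener at higher scale needs either longer keys or the collision handling described above.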
Short URLs are not supposed to have a lifetime of decades, or even more than a year, depending upon the use case, and it is not practical to store the entries forever. That’s why purging is required. But the implementation can be flexible. At smallcase, short URLs are majorly generated for two broad categories:
- For transactional notifications
- For campaigns
The short links generated for the transactional notifications are not supposed to be active forever whereas the links that are generated for the campaigns are supposed to be active till the campaign is active. Considering the differences in the lifespan of different short links, they needed to be treated differently when it comes to purging the entries from the database.
One approach was to run a job that would remove all entries older than a set timeframe. But it turns out there was a better way with minimal additional effort. Instead of running a dedicated job to purge entries, we could simply handle this while creating short URLs. Remember we were retrying to land on a key that was not already occupied? A minor change in that process handled purging for us: when you get an already occupied key, allow overwriting that key only if it has hit its expiration date (which is stored along with the mapping during short key creation). This comparatively increases the time it takes to create short links, but that’s the trade-off you need to make to ensure unique keys.
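The lazy-purge check can be sketched as a small write path. An in-memory `Map` stands in for the real table here, and the field names are assumptions:

```javascript
// On a key collision, overwrite the entry only if it has expired.
function putShortKey(store, key, originalUrl, ttlMs, now = Date.now()) {
  const existing = store.get(key);
  if (existing && existing.expiresAt > now) {
    return false; // key is still live: caller retries with a fresh key
  }
  // Either a free slot or a stale entry; overwriting the stale entry IS the purge.
  store.set(key, { originalUrl, expiresAt: now + ttlMs });
  return true;
}
```

The transactional and campaign links simply get different `ttlMs` values, so both lifespans fall out of the same code path.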
Lastly, the crucial part of an open-source project: documentation. Here’s what was on the checklist:
- Templates for creating issues & submitting PRs for streamlined flow.
And finally, making the project public! 🎉
This was our journey to open-sourcing the URL shortener service that we use at smallcase. I believe open source not only helps in building a better tool, but it also builds a community of people that care about equal access to software. At the end of the day, we learn from each other.
The project is available here: github.com/smallcase/smalllinks
Please feel free to create an issue on GitHub if you find any improvement opportunities or bugs in the project. I’ll be happy to connect 😃.