Antonio Villagra De La Cruz

Posted on Oct 6, 2021 • Edited on Jan 5, 2022 • Originally published at antoniovdlc.me

Building a link shortening service


Time for a nice little side project: let's build a link shortening service!

Rather than just handing you all the code, we will go over the process of specifying and architecting such a service under a given set of constraints.

Product constraints

Links can be long and cumbersome to pass around. Most existing free link shortening services may also collect extra data on how they are used. Hence the idea of building a simple, free, open-source, safe, and privacy-conscious link shortening service.

As this is a very self-contained problem, we can cover all its use cases in just two user stories:

As a user, I want to be able to paste a link I want to shorten and be given back a shortened link.
As a user, when opening a shortened link, I want to be redirected to the original link.

Technical specifications

API

Most links are shared over the web. As such, we will be building a web application. That means that we will have a client and a server communicating over some sort of API, so let's start by defining that API.

Users can interact with our application in two ways: either when creating a short link from a given link, or when accessing a shortened link and being redirected to the original link.

Creating a shortened link

To create a shortened link, we expose an endpoint that accepts POST requests. To keep it as RESTful as possible, this endpoint lives on the /link path (in the implementation, it is mounted under an /api prefix, i.e. /api/link).

This endpoint expects a body with a link property of type string, and returns the corresponding hash (more on that in the implementation!).

POST /link
body:
  link: String
returns:
  hash: String
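As a sketch of what these payloads could look like in Go, assuming the request body is sent as JSON (the spec above only names the fields, so the JSON format of the request is an assumption), here are hypothetical types and helpers built on encoding/json:

```go
package main

import (
	"encoding/json"
	"errors"
)

// CreateLinkRequest mirrors the POST /link body: {"link": "<original URL>"}.
// Hypothetical type, not part of the original code.
type CreateLinkRequest struct {
	Link string `json:"link"`
}

// CreateLinkResponse mirrors the response body: {"hash": "<random string>"}.
type CreateLinkResponse struct {
	Hash string `json:"hash"`
}

// decodeCreateRequest parses a POST body and rejects a missing or empty link.
func decodeCreateRequest(body []byte) (CreateLinkRequest, error) {
	var req CreateLinkRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, err
	}
	if req.Link == "" {
		return req, errors.New("missing link")
	}
	return req, nil
}

// encodeCreateResponse renders the generated hash as a JSON object.
func encodeCreateResponse(hash string) (string, error) {
	b, err := json.Marshal(CreateLinkResponse{Hash: hash})
	return string(b), err
}
```

For example, encodeCreateResponse("AbCd") yields {"hash":"AbCd"}. In a handler, the body could be read with io.ReadAll(r.Body); the post does not show its CreateHash handler, so this is only one way it could look.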

Retrieving the original link from a hash

The inverse of creating a shortened link is retrieving the original link. As with the endpoint above, we follow RESTful principles and implement a GET request on the /link path.

This endpoint expects the hash as a query parameter, and returns the corresponding link.

GET /link
params:
  hash: String
returns:
  link: String

Data Model

The data model to support this application is fairly straightforward. As we will be using a PostgreSQL database, we will create the following table to persist the needed data:

CREATE TABLE IF NOT EXISTS links (
  -- gen_random_uuid() is built in since PostgreSQL 13; older versions need the pgcrypto extension
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  hash VARCHAR(8),
  url TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

We won't be storing anything else: simple and private!

Read operations rely heavily on hash, so it is a good idea to create an index on that column. Making the index unique also guarantees that two links can never share the same hash:

CREATE UNIQUE INDEX IF NOT EXISTS hash_idx on links (hash);

Cron jobs

As we want to provide this service for free, we need to stay within reasonable quotas from cloud providers. Hence, to limit the size of the database, we will automatically delete older links to make room for newer ones.

To achieve that, we will run a cron job every 24 hours and delete all links older than 30 days.

"Hashing" procedure

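Finally, onto the "hashing" function. There are a lot of great libraries implementing hashing functions that we could use, but to keep our guarantee that a link lives only for 30 days, we will instead use a random string generator. A hash is deterministic: shortening the same link twice would yield the same hash, so both submissions would have to share a single row and a single expiry date. A random string gives every submission its own row and its own 30-day lifetime.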

In that sense, we are not so much hashing as just pairing a link with a random string. This also means that, unlike with hashing, the same link will generate different "hashes". This is, again, in line with how we have defined our requirements, but might not be the ideal solution in other scenarios.

The string we generate is composed of the 26 letters of the English alphabet in both lower and upper case, i.e. 52 possible characters per position. A 4-character random string therefore yields 52^4, or about 7.3 million, unique strings. That number goes up to about 380 million (52^5) with 5 characters, and to nearly 20 billion (52^6) with 6 characters. Here, we can strike a balance between the length of the final hash and the number of possible unique strings.
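A minimal sketch of such a generator, assuming crypto/rand as the source of randomness (the post only says "random string generator", so that choice is an assumption); randomHash and keyspace are hypothetical names. keyspace reproduces the counts above, and the length passed to randomHash should stay within the VARCHAR(8) limit of the hash column:

```go
package main

import (
	"crypto/rand"
	"math/big"
)

// alphabet holds the 52 allowed characters: a-z and A-Z.
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

// randomHash returns a string of n characters, each drawn uniformly from
// alphabet using the cryptographically secure crypto/rand reader.
func randomHash(n int) (string, error) {
	b := make([]byte, n)
	size := big.NewInt(int64(len(alphabet)))
	for i := range b {
		// rand.Int returns a uniform value in [0, size).
		idx, err := rand.Int(rand.Reader, size)
		if err != nil {
			return "", err
		}
		b[i] = alphabet[idx.Int64()]
	}
	return string(b), nil
}

// keyspace returns the number of distinct strings of length n: 52^n.
func keyspace(n int) int64 {
	total := int64(1)
	for i := 0; i < n; i++ {
		total *= int64(len(alphabet))
	}
	return total
}
```

For example, keyspace(4) is 7311616 and keyspace(6) is 19770609664. Using crypto/rand rather than math/rand makes short links harder to guess, which fits a privacy-conscious service.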

(Partial) implementation

We will be implementing this service using as much vanilla Go, PostgreSQL, JavaScript, HTML, and CSS as possible.

Quick disclaimer before we dive into the implementation: I am not the most experienced Go developer, so the way I've decided to structure the code might not be the most idiomatic, but it was the way that made the most sense to me.

Static assets

All static assets (HTML, JavaScript and CSS) live inside the public folder, and are served by our web server directly. This might not be the most scalable approach, but let's not get into premature optimisation.

main.go

func main() {
    ...
    // Static assets
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        http.ServeFile(w, r, "public/index.html")
    })
    http.Handle("/static/",
        http.StripPrefix("/static/", http.FileServer(http.Dir("./public"))),
    )
    ...
}

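The code here is quite uneventful: we serve index.html on the root path (since "/" is a catch-all pattern in net/http, this also covers any path that no other handler matches), and then set up a file server under /static/ for the rest of the assets.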

API

As we are trying to be as vanilla as possible (i.e. mostly using the standard library), we will implement the API in a fairly hand-rolled manner: each API route is a plain function that dispatches on the HTTP method.

In our case, we really only have one route and two methods, so the code is fairly straightforward:

api/api.go

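package api

import (
    "net/http"
    "url-shortener/api/handlers"
)

// Link dispatches requests on /api/link to the handler for each HTTP method.
func Link(w http.ResponseWriter, r *http.Request) {
    var status int
    var data string

    switch r.Method {
    case http.MethodPost:
        status, data = handlers.CreateHash(r)
    case http.MethodGet:
        status, data = handlers.GetLink(r)
    default:
        // Only GET and POST are supported on this route.
        w.Header().Set("Allow", "GET, POST")
        status = http.StatusMethodNotAllowed
    }

    // Headers must be set before WriteHeader: changes made afterwards are ignored.
    if data != "" {
        w.Header().Set("Content-Type", "application/json")
    }
    w.WriteHeader(status)

    if data != "" {
        w.Write([]byte(data))
    }
}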

Handlers are also functions that return (int, string), which correspond to the status code and any JSON string we'd like to send back to the client.

Here is GetLink(), for example:

api/handlers/get_link.go

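package handlers

import (
    "encoding/json"
    "log"
    "net/http"
    "url-shortener/db"
)

// GetLink looks up the original link for the "hash" query parameter and
// returns the HTTP status code plus a JSON body.
func GetLink(r *http.Request) (int, string) {
    hash := r.URL.Query().Get("hash")
    if hash == "" {
        return http.StatusBadRequest, ""
    }

    log.Printf("Redirecting from hash: '%s'", hash)

    link, err := db.SelectLink(hash)
    if err != nil {
        log.Printf("Link not found for hash: '%s'. Error: %v\n", hash, err)
        return http.StatusNotFound, ""
    }

    // Let encoding/json escape the URL instead of concatenating strings,
    // so characters such as `"` or `\` cannot produce invalid JSON.
    body, err := json.Marshal(map[string]string{"link": link})
    if err != nil {
        return http.StatusInternalServerError, ""
    }
    return http.StatusOK, string(body)
}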

Finally, we register the main API handler function in main.go:

func main() {
    // API
    http.HandleFunc("/api/link", api.Link)
    ...
}

Data layer

To continue with our constraint of mostly using the standard library, we will be writing all our database queries in .sql files, which we then need to read and pass to a PostgreSQL driver (in this case pgx).

Because we want our service to be able to start from a blank slate (i.e. an empty database), we need a way to create the needed data model on startup. We also want to read our SQL files into variables once and then just handle strings throughout the lifetime of the application.

db/db.go

package db

import (
    "context"
    "io/ioutil"
    "log"
    "os"

    "github.com/jackc/pgx/v4/pgxpool"
)

func Init() {
    // Check needed tables are created
    ...

    // Create needed tables
    ...

    // Read other SQL files to be used in API handlers
    initQuerries()
}

The file that does most of the heavy lifting, providing the interface used by the handlers, is dbquerries.go: it reads the SQL files and exposes functions for the defined database operations:

db/dbquerries.go

package db

import (
    "context"
    "io/ioutil"
    "log"
)

var insertLinkSQL []byte
var selectLinkSQL []byte

func initQuerries() {
    var err error

    insertLinkSQL, err = ioutil.ReadFile("db/sql/insert-link.sql")
    if err != nil {
        log.Fatalf("Error while reading file 'db/sql/insert-link.sql': %v\n", err)
    }

    selectLinkSQL, err = ioutil.ReadFile("db/sql/select-link.sql")
    if err != nil {
        log.Fatalf("Error while reading file 'db/sql/select-link.sql': %v\n", err)
    }
}

func InsertLink(hash string, link string) error {
    dbpool, err := getPool()
    if err != nil {
        log.Printf("Unable to connect to database: %v\n", err)
        return err
    }
    defer dbpool.Close()

    _, err = dbpool.Exec(context.Background(), string(insertLinkSQL), hash, link)

    return err
}

func SelectLink(hash string) (string, error) {
    dbpool, err := getPool()
    if err != nil {
        log.Printf("Unable to connect to database: %v\n", err)
        return "", err
    }
    defer dbpool.Close()

    var link string
    err = dbpool.QueryRow(context.Background(), string(selectLinkSQL), hash).Scan(&link)

    return link, err
}
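InsertLink and SelectLink each call getPool() and then close the pool they receive, so a fresh pool (with fresh connections) is presumably opened for every query. A connection pool is usually created once and shared for the lifetime of the process. The sketch below shows that pattern with sync.Once; since getPool's body is not shown, fakePool and openPool are hypothetical stand-ins for *pgxpool.Pool and a call such as pgxpool.Connect(context.Background(), connString) from pgx v4:

```go
package main

import "sync"

// fakePool is a hypothetical stand-in for *pgxpool.Pool.
type fakePool struct{ id int }

var (
	poolOnce  sync.Once
	sharedDB  *fakePool
	poolErr   error
	openCount int // counts how many times openPool actually ran
)

// openPool is a hypothetical stand-in for the expensive connect call.
func openPool() (*fakePool, error) {
	openCount++
	return &fakePool{id: openCount}, nil
}

// sharedPool opens the pool on first use and returns the same instance
// (or the same error) on every later call. Callers must not close it.
func sharedPool() (*fakePool, error) {
	poolOnce.Do(func() {
		sharedDB, poolErr = openPool()
	})
	return sharedDB, poolErr
}
```

With a shared pool, the defer dbpool.Close() calls would go away, since closing the shared pool would break every later query. Note that sync.Once does not retry: if the first connection attempt fails, every later call returns that same error.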

And here are the corresponding SQL files:

db/sql/insert-link.sql

INSERT INTO links (
  hash,
  url
) VALUES ($1, $2);

db/sql/select-link.sql

SELECT url 
FROM links 
WHERE hash = $1;
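Because hashes are random and hash_idx is a unique index, the INSERT can occasionally fail because the generated hash is already taken (rare with 5 or 6 characters, but not impossible). One way to handle that is to draw a new hash and retry a bounded number of times. This sketch uses hypothetical names and injected functions so it runs without a database:

```go
package main

import "errors"

// errHashTaken is a hypothetical sentinel for a unique-index violation on
// links.hash. With pgx, such a violation could be detected from PostgreSQL's
// unique_violation SQLSTATE 23505, e.g. the Code field of a *pgconn.PgError.
var errHashTaken = errors.New("hash already taken")

// insertWithRetry generates a hash and tries to insert it, drawing a fresh
// hash whenever the unique index rejects a duplicate. Any other error is
// returned immediately.
func insertWithRetry(link string, gen func() (string, error),
	insert func(hash, link string) error, maxAttempts int) (string, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		hash, err := gen()
		if err != nil {
			return "", err
		}
		err = insert(hash, link)
		if err == nil {
			return hash, nil
		}
		if !errors.Is(err, errHashTaken) {
			return "", err
		}
	}
	return "", errors.New("could not find a free hash")
}
```

In this service, gen would be a random string generator as described earlier and insert would wrap InsertLink, translating the database's duplicate-key error into errHashTaken.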

Cron job

We saved probably the least interesting part for last. Once again we favour simplicity over scalability, and simply define tickers inside our jobs (or rather, inside our one job for now!).

cron/cron.go

package cron

func Init() {
    AutoDeleteLinksJob()
}

The Init() function is called in main.go when starting the server, and it contains all the existing jobs, in this case just the following one:

cron/auto-cleanup.go

package cron

import (
    "time"
    "url-shortener/db"
)

func AutoDeleteLinksJob() {
    db.DeleteOldLinks()

    ticker := time.NewTicker(24 * time.Hour)
    go func() {
        for range ticker.C {
            db.DeleteOldLinks()
        }
    }()
}

As discussed previously, the job runs the cleanup once at startup and then uses a ticker to run it every 24 hours. The underlying query deletes all links older than 30 days:

delete-links-old.sql

DELETE 
FROM links 
WHERE created_at < current_timestamp - interval '30 days';

There you have it! Probably not the most solid implementation, but it should do the job!

I hope this was either useful, or entertaining, or both! There is still a lot of room for improvement both in terms of product and of code, but hopefully this gives a good overview of what such a project can look like.

You can actually have a closer look at all the code base here: