tux0r

Auto-magically cleaning up shortened URLs with Go

One of the annoyances in today’s web - no, this time I won’t rant about JavaScript, I promise! - is that it is full of shorteners, redirections, tracking parameters like utm_* et cetera, making it relatively hard to share URLs in a non-shady way.

If you stay inside your web browser, this is not much of a problem anymore: various browser extensions remove most, if not all, of the redirectors for you. What they do not cover, though, is your system clipboard. Sending your conversation partner a link like

https://bit.ly/3hXl0mS

will probably be a less inviting option than sending this one:

https://dev.to/tux0r/properly-validating-e-mail-addresses-3lpj

Here’s a market niche to fill - and it is relatively easy to fill! Let’s use Go, because why not?

Our application consists of two parts: one that watches the clipboard (and can be stopped whenever the user wants to), and one that takes a URL and un-shortens and un-tracks it. The second part is the more complicated one.

First step: The cleaner functions.

There are two kinds of shortened URLs: actual redirectors like bit.ly and t.co, and pseudo-redirectors which belong to companies trying to link to themselves, e.g. most large media corporations. Our application should detect both. Luckily, Go comes with a wide array of network capabilities, so writing a function to determine a URL’s actual target is possible without any external packages:

func ExpandUrl(url string) (string, error) {
    // URL expander with x509 checks disabled.
    expandedUrl := url

    tr := &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
    }

    client := &http.Client{
        CheckRedirect: func(req *http.Request, via []*http.Request) error {
            expandedUrl = req.URL.String()
            return nil
        },
        Transport: tr,
    }

    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        return "", err
    }

    resp, err := client.Do(req)
    if err != nil {
        return "", err
    }

    defer resp.Body.Close()

    return expandedUrl, nil
}

This will return the actual URL or an error if anything went wrong. We’ll skip the certificate checks because the validity of the sites themselves is irrelevant for what we want to achieve.
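For completeness: the function above only uses the standard library. If you are following along in a fresh file, these are the imports it needs:

import (
    "crypto/tls"
    "net/http"
)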

Of course, we can also keep a list of known shorteners. That way, we do not need to contact a server for every copied URL just to find out whether it redirects at all.

var shortenerList = []string{
    "bit.ly", "buff.ly", "dlvr.it",
    "goo.gl", "youtu.be", "tinyurl.com",
    "ow.ly", "amzn.to", "ift.tt", "zpr.io",
    "apple.co", "mol.im", "redd.it",
    "shar.es", "is.gd", "dld.bz",
    "trib.al", "fb.me", "tumblr.co",
    "cutt.ly", "app.link", "twib.in",
    "kko.to", "rsci.app.link", "upflow.co",
    "snip.ly", "lnk.to", "1jux.net", "gscoff.co",
}

Note that this list is incomplete, but it should be easy to see how to extend it yourself. Comments inside the list are allowed, by the way, so feel free to add some structure as well.

Speaking of lists: we also want to remove tracking parameters. For this, our application needs to know which parameters are unwanted. Here is my blacklist as of today:

var urlParamBlacklist = []string{
    "wtmc", "WT.mc_id", "wt_zmc",

    "ocid", "xid",

    "at_medium", "at_campaign", "at_custom1", "at_custom2",
    "at_custom3", "at_custom4",

    "utm_source", "utm_medium", "utm_campaign", "utm_term",
    "utm_content", "utm_name", "utm_referrer", "utm_brand",
    "utm_social-type", "utm_kxconfid",

    "guce_referrer", "guce_referrer_sig", "guccounter",

    "ga_source", "ga_medium", "ga_term", "ga_content",
    "ga_campaign", "ga_place",

    "fb_action_ids", "fb_action_types", "fb_source", "fb_ref",
    "fbclid", "fbc",

    "hmb_campaign", "hmb_medium", "hmb_source",

    "newsticker", "CMP", "feature", "camp", "cid", "source",
    "ns_campaign", "ns_mchannel", "ito", "xg_source", "__tn__",
    "__twitter_impression", "share", "ncid", "rnd", "feed_id",
    "_unique_id", "GEPC", "pt", "xtor", "wtrid", "medium", "sara_ecid",
    "from", "inApp", "ssm", "campaign", "mbid", "s_campaign", "rss_id",
    "cmpid", "s_cid", "mv2", "scid", "sdid", "s_iid", "ssm",
    "spi_ref", "referrerlane",

    "share_bandit_exp", "share_bandit_var",

    "igshid", "idpartenaire",

    "aff_code", "affID",

    "recruited_by_id", "recruiter",
}

Now all we need is a wrapper function which takes a URL as a parameter and returns the unshortened, untracked one:

func contains(arr []string, str string) bool {
    // Array-containing check: Returns true if found.
    // I wish Go could do that natively. :-)
    for _, a := range arr {
        if a == str {
            return true
        }
    }
    return false
}
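By the way: since Go 1.21, the standard library can do this natively, so on a recent toolchain the helper above can be replaced:

import "slices"

// slices.Contains reports whether a value is present in a slice.
// Equivalent to contains(shortenerList, u.Hostname()):
found := slices.Contains(shortenerList, u.Hostname())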

func processUrlItem(urlToCheck string) string {
    // Processes a URL item, returns the cleaned, unshortened
    // "expanded" URL.
    u, err := url.Parse(urlToCheck)
    if err != nil {
        // Not a parseable URL - return it unchanged instead of
        // terminating the whole application:
        return urlToCheck
    }

    // Some URL shorteners are not known to us (yet?).
    // Chances are that URLs with a path that ends in
    // "/SoMeSTRiNG123" are shortened. Catch them as well.
    re := regexp.MustCompile("^/[A-Za-z0-9_-]{5,}$")
    potentialUrlShortener := re.MatchString(u.Path)

    if potentialUrlShortener || contains(shortenerList, u.Hostname()) {
        expandedUrl, err := ExpandUrl(urlToCheck)
        if err != nil {
            // Cannot reach the URL:
            return urlToCheck
        }

        // Overwrite the original URL by the expanded one:
        urlToCheck = expandedUrl

        // Parse again, just in case:
        u, err = url.Parse(urlToCheck)
        if err != nil {
            // Error in the updated domain:
            return urlToCheck
        }
    }

    // Remove tracking parameters:
    q := u.Query()
    for _, param := range urlParamBlacklist {
        q.Del(param)
    }
    u.RawQuery = q.Encode()

    return u.String()
}
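The parameter scrubbing also works on its own. A quick sketch with a made-up URL (example.com is neither in our shortener list nor does its path look like a shortener, so no network request is made):

fmt.Println(processUrlItem("https://example.com/some/article?id=42&utm_source=newsletter&utm_medium=email"))
// Output: https://example.com/some/article?id=42

Only the blacklisted utm_* parameters are dropped; "id" survives.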

Would you like to test whether it works? Write a temporary main() function:

func main() {
    fmt.Printf("%s\n", processUrlItem("https://bit.ly/3hXl0mS"))
}

If nobody has made a horrible mistake, you should see the unshortened link in your terminal now.
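Assuming the short link still resolves to what it did when this article was written, that is the URL from the introduction:

https://dev.to/tux0r/properly-validating-e-mail-addresses-3lpj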

Next step: The actual application.

Now that we have a working cleanup functionality, we can wrap it around the clipboard. This is relatively easy as well: we poll the clipboard once a second, check its contents, pass any found URL to our cleaner and write the result back into the clipboard. Go’s rich ecosystem makes this much easier for us - we can fit the whole calling application into less than 50 lines, including comments and whitespace:

package main

import (
    "fmt"
    "net/url"
    "time"

    "github.com/getlantern/systray"
    "github.com/atotto/clipboard"
)

var (
    previousUrl string   // Avoid parsing it over and over again
)

func main() {
    go func() {
        for x := range time.Tick(time.Second) {
            clipped, err := clipboard.ReadAll()
            if err == nil && clipped != previousUrl {
                u, err := url.Parse(clipped)
                if err == nil && u.Host != "" {
                    // Valid URL - clean it and put it back:
                    fmt.Printf("[%s] Processing URL: %s\n", x, clipped)
                    previousUrl = processUrlItem(clipped)
                    clipboard.WriteAll(previousUrl)
                }
            }
        }
    }()
    systray.Run(onReady, onExit)
}

func onReady() {
    systray.SetTitle("🧹")
    systray.SetTooltip("I'm cleaning URLs in the clipboard")
    mQuitOrig := systray.AddMenuItem("Quit", "Quit cleaning URLs")

    go func() {
        <-mQuitOrig.ClickedCh
        systray.Quit()
    }()
}

func onExit() {
    // Cleanup
    // This is pointless as of now, but might happen later.
}

It even comes with a tray icon! (Note that emojis won’t work on Windows, but it looks nice on macOS.)
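If you want something visible on Windows as well, systray accepts raw icon bytes instead of a text title. A sketch, assuming you ship an .ico file yourself (iconData is a hypothetical variable here):

// iconData would hold the raw contents of an .ico file,
// e.g. read via os.ReadFile or embedded with go:embed:
systray.SetIcon(iconData)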

Working example

The above text is the result of my own work on the problem described in the introduction. If you want to test it before writing code of your own, grab it right from the repository:

% mkdir work
% fossil clone https://code.rosaelefanten.org/clipurlcleaner clip.fossil
% cd work ; fossil open ../clip.fossil

(A GitHub mirror is available as well.)

Build and run:

% go build
% ./clipurlcleaner >/dev/null &

I tested my solution on macOS Catalina and Windows 10, and it works sufficiently well. I cannot say whether it works on Linux, BSD or Unix, though - note that the clipboard package requires xclip or xsel to be installed there. If it does not work for you, please help me fix it.

Comments?

I sadly still cannot answer here on DEV because the staff hates me 😉, but feel free to ping me on Twitter or send patches or something. Thank you for reading!

Top comments (1)

Ben Sinclair

Very nice.

I was initially wondering why you'd need to fix your own URLs, because if you want to share something, you'd likely have clicked on it and be sharing the resultant URL or another page on the target site rather than some shortened link you found somewhere. But getting rid of tracking parameters and doing it all against the clipboard automatically are Really Cool Things.