Robin Lee

Twitter's Open-Source Recommendation Algorithm

Twitter's Recommendation Algorithm

Twitter aims to deliver you the best of what’s happening in the world right now. This blog is an introduction to how the algorithm selects Tweets for your timeline.

blog.twitter.com

Less than seven months ago, Elon Musk paid 44 billion dollars for Twitter. Since then, he has fired half the company and made blue check marks available to anyone who pays. Twitter is now reportedly worth only about 20 billion dollars. Many users have moved to Mastodon, and the NYT lost its blue check. It looks like Twitter is collapsing. In reality, however, Elon is playing a long game of chess against mainstream news media such as Fox News and CNN. He is trying to take their advertisers by making Twitter the future platform for all journalism.

Twitter open-sourced part of its recommendation algorithm about a month ago. Although it is real production code, it is not 100 percent of the code, so it is mostly useful for research and transparency. The code base is mostly written in Scala, a JVM language that is similar to Java but more concise. Twitter was originally built with Ruby on Rails, but the company moved away from it over a decade ago.

If you take a closer look at some of the files in the repo, you will notice some extremely interesting implementation details. Take these code snippets, for example (note that getLinearRankingParams in the EarlybirdTensorflowBasedSimilarityEngine.scala file has been deprecated as of Apr 05, 2023).

There are a bunch of ranking parameters, each with a default value. Retweets provide a 20x boost, while likes provide a 30x boost. Images and videos also provide a small boost. Not surprisingly, you also get a boost for being a paying Twitter Blue member.

On the other hand, a tweet can also get a negative boost if the account has a lot of mutes, blocks, or spam reports. Spelling errors and made-up words will also earn you a debuff.

Offensive, spammy, and NSFW tweets can also get a debuff, while trending, verified, and media tweets get a boost. There is also a long list of topics that won't be amplified, such as anything that has been flagged as misinformation or harassment.
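To make the idea of ranking parameters concrete, here is a minimal sketch that treats each parameter as a weight in a simple weighted sum. This is an illustration, not Twitter's actual code: only the retweet (20) and like (30) weights come from the defaults quoted above. The feature names, the other weights, and the `linear_score` function are all hypothetical.

```python
from dataclasses import dataclass

# Retweet and like weights are the defaults quoted in this post.
# Every other weight is a made-up placeholder for illustration only.
WEIGHTS = {
    "retweets": 20.0,
    "likes": 30.0,
    "has_media": 2.0,            # assumption: a "small boost" for images/videos
    "is_blue_subscriber": 4.0,   # assumption: some boost for Twitter Blue
    "spam_reports": -50.0,       # assumption: a penalty per spam report
}


@dataclass
class TweetFeatures:
    retweets: int = 0
    likes: int = 0
    has_media: bool = False
    is_blue_subscriber: bool = False
    spam_reports: int = 0


def linear_score(tweet: TweetFeatures) -> float:
    """Sum each feature value multiplied by its weight (booleans count as 0 or 1)."""
    return sum(weight * float(getattr(tweet, name)) for name, weight in WEIGHTS.items())
```

With these weights, one like outweighs one retweet, and a single spam report is enough to push a lightly engaged tweet below zero.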


How does Twitter actually select the tweets to display on our home page using these parameters, then? We can break the recommendation pipeline into three parts.


How the Twitter recommendation pipeline works

The first step is to find a pool of about 1500 tweets that you might be interested in, using a technique called candidate sourcing. Twitter draws on three sources for this.

The first pool of candidates, which makes up roughly half of your home page, comes from the accounts you follow: your in-network source. For this, Twitter uses a model called Real Graph, which predicts the likelihood of engagement between two users.

The remaining candidates come from accounts you don't follow yet, your out-of-network source, and are found using two concepts: social graphs and embedding spaces. To select relevant tweets from your social graph, Twitter uses GraphJet, a graph processing engine that maintains a real-time interaction graph between users and tweets and traverses it to find tweets that people close to you have engaged with. For most of your out-of-network tweets, however, Twitter uses an algorithm called SimClusters, which discovers communities anchored by a cluster of influential users in an embedding space.
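The embedding-space idea can be sketched in a few lines. This is a toy illustration rather than the real SimClusters implementation. It assumes that each user and each tweet is described by a sparse vector of hypothetical community weights, and that out-of-network candidates are picked by cosine similarity. The function names and the sample communities are made up for the example.

```python
import math
from typing import Dict, List

# A sparse embedding: community name -> strength of association.
Embedding = Dict[str, float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_out_of_network(user: Embedding, tweets: Dict[str, Embedding], k: int) -> List[str]:
    """Return the ids of the k tweets whose embeddings are closest to the user's."""
    ranked = sorted(tweets, key=lambda tid: cosine(user, tweets[tid]), reverse=True)
    return ranked[:k]


# Hypothetical data: a user interested mostly in tech and somewhat in news.
user_vec = {"tech": 1.0, "news": 0.5}
tweet_vecs = {
    "t1": {"tech": 1.0},
    "t2": {"sports": 1.0},
    "t3": {"news": 1.0, "tech": 1.0},
}
candidates = top_out_of_network(user_vec, tweet_vecs, k=2)
```

In the real system the community vectors are learned from engagement data at a much larger scale. Here they are written by hand to keep the example small.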


Communities in an embedding space grouped by the SimClusters algorithm

Once the candidate pool is assembled, Twitter ranks it with a neural network that has roughly 48 million parameters. Lastly, it filters out content using static rules, for example removing tweets from accounts that you've blocked or muted.
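Put together, the rank-then-filter stage can be sketched like this. It is a simplified illustration under stated assumptions: the `Tweet` type, the `build_timeline` function, and the `score` callable (a stand-in for the neural network) are hypothetical names, not anything from the Twitter repo.

```python
from typing import Callable, Iterable, List, NamedTuple, Set


class Tweet(NamedTuple):
    tweet_id: str
    author: str


def build_timeline(
    candidates: Iterable[Tweet],
    score: Callable[[Tweet], float],   # stand-in for the ranking model
    blocked_or_muted: Set[str],        # authors the viewer has blocked or muted
    limit: int,
) -> List[Tweet]:
    """Rank candidates by score, drop blocked or muted authors, keep the top `limit`."""
    ranked = sorted(candidates, key=score, reverse=True)
    visible = [t for t in ranked if t.author not in blocked_or_muted]
    return visible[:limit]


# Hypothetical usage with hard-coded scores in place of a real model.
pool = [Tweet("a", "alice"), Tweet("b", "bob"), Tweet("c", "carol")]
fake_scores = {"a": 0.2, "b": 0.9, "c": 0.5}
timeline = build_timeline(pool, lambda t: fake_scores[t.tweet_id], {"bob"}, limit=2)
```

Even though the tweet from "bob" has the highest score, it never reaches the timeline because the filter runs after ranking.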


Why would Elon do this? Why would he release his trade secrets to the public? Well, it kind of makes Twitter like the Linux of social media. The public can identify parts that are unfair in the algorithm and address them in public.

In my opinion, it is mostly a marketing move to build trust. It no longer feels like Twitter is run by a mysterious figure who de-boosts content without any transparency. There is also a huge opportunity here: trust in the mainstream media has fallen so low that many people already use Twitter to consume the news. And although Twitter is currently losing money, the company has talked about compensating content creators, just as YouTube and other platforms do. When that happens, journalists could potentially make a living on Twitter and put their best content there.

Elon knows Twitter Blue is never going to make Twitter any money; rather, it is designed to uplift independent creators while embarrassing the establishment. The blue checks are now irrelevant, and by open-sourcing the code, Twitter is laying the groundwork to become the "fair and balanced", most trusted name in news. This may force other social media platforms to become more transparent.

Some parts transcribed from Fireship's video: Twitter algorithm open-sourced...
