This requires a recommendation algorithm to distill the roughly 500 million Tweets posted daily down to a handful of top Tweets that ultimately show up on your device’s For You timeline.
The pipeline above runs approximately 5 billion times per day and completes in under 1.5 seconds on average. A single pipeline execution requires 220 seconds of CPU time, nearly 150x the latency you perceive on the app.
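The quoted figures support a rough back-of-envelope estimate of the compute involved. This is a minimal sketch. It assumes the blog's rounded numbers (5 billion runs per day, 220 CPU-seconds per run, 1.5 s average latency) hold exactly and that CPU time simply adds up across runs. The constant and function names are illustrative only. The estimate stops at CPU-seconds and equivalent cores. Converting it to dollars would require a per-core price, and the blog does not give one.

```python
# Back-of-envelope estimate using the rounded figures from Twitter's engineering blog.
# Assumption: CPU time adds linearly across pipeline runs (no sharing or caching discount).
PIPELINE_RUNS_PER_DAY = 5e9      # "approximately 5 billion times per day"
CPU_SECONDS_PER_RUN = 220.0      # CPU time for a single pipeline execution
LATENCY_SECONDS = 1.5            # average end-to-end latency seen on the app
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400


def cpu_to_latency_ratio(cpu_s: float, latency_s: float) -> float:
    """How many CPU-seconds are spent per second of wall-clock latency."""
    return cpu_s / latency_s


def cpu_seconds_per_day(runs: float, cpu_s: float) -> float:
    """Total CPU time consumed by all pipeline runs in one day."""
    return runs * cpu_s


def equivalent_cores(cpu_s_per_day: float) -> float:
    """Number of cores that would have to run flat out all day to supply this CPU time."""
    return cpu_s_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    ratio = cpu_to_latency_ratio(CPU_SECONDS_PER_RUN, LATENCY_SECONDS)
    daily = cpu_seconds_per_day(PIPELINE_RUNS_PER_DAY, CPU_SECONDS_PER_RUN)
    cores = equivalent_cores(daily)
    print(f"CPU/latency ratio: {ratio:.0f}x")
    print(f"CPU-seconds per day: {daily:.2e}")
    print(f"Equivalent always-busy cores: {cores / 1e6:.1f} million")
```

Under these assumptions the ratio rounds to about 147x, which matches the blog's "nearly 150x". The daily total comes to about 1.1 trillion CPU-seconds, roughly 12.7 million cores kept busy around the clock. Real capacity could differ in either direction, for example because of headroom, batching, or hardware accelerators.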
Alongside OpenAI, I personally think this is one of the most important moments in the computing community, as no one would have guessed that a global-scale system such as Twitter's recommendation algorithm would become open source. Based on their engineering blog post, it is not a stretch to say the code base costs hundreds of thousands, if not millions, of dollars a day to run. How do you feel about this moment?
Twitter's engineering blog
Twitter Recommendation Algorithm
The Twitter Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the Home Timeline. For an introduction to how the algorithm works, please refer to our engineering blog. The diagram below illustrates how major services and jobs interconnect.
These are the main components of the Recommendation Algorithm included in this repository:
| Type | Component | Description |
|------|-----------|-------------|
| Feature | SimClusters | Community detection and sparse embeddings into those communities. |
| | TwHIN | Dense knowledge graph embeddings for Users and Tweets. |
| | trust-and-safety-models | Models for detecting NSFW or abusive content. |
| | real-graph | Model to predict likelihood of a Twitter User interacting with another User. |
| | tweepcred | Page-Rank algorithm for calculating Twitter User reputation. |
| | recos-injector | Streaming event processor for building input streams for GraphJet based services. |
| | graph-feature-service | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
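Two of the components above are described concretely enough to illustrate with small graph computations. The sketch below is a toy example and not code from the repository. `pagerank` is textbook power-iteration PageRank over a follow graph, which is the general technique tweepcred is described as using. tweepcred's actual inputs, weighting, and parameters are not shown here, and `damping=0.85` and `iterations=50` are assumed defaults. `following_who_liked` computes the example feature given for graph-feature-service: how many accounts User A follows have liked a Tweet from User B. All function names and the dict/set data layout are hypothetical.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed follow graph.

    edges: (follower, followee) pairs; rank flows from follower to followee.
    Nodes with no outgoing edges spread their rank uniformly over all nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Teleport term plus the redistributed rank of dangling nodes.
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def following_who_liked(following, likes, tweet_author, user_a, user_b):
    """Count accounts User A follows that liked at least one Tweet authored by User B.

    following: user -> set of followed users (hypothetical layout)
    likes: user -> set of liked tweet ids
    tweet_author: tweet id -> author
    """
    b_tweets = {tweet for tweet, author in tweet_author.items() if author == user_b}
    return sum(1 for u in following.get(user_a, set()) if likes.get(u, set()) & b_tweets)


if __name__ == "__main__":
    follows = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
               ("carol", "alice"), ("dave", "carol")]
    for user, score in sorted(pagerank(follows).items(), key=lambda kv: -kv[1]):
        print(f"{user}: {score:.3f}")

    following = {"alice": {"bob", "carol"}}
    likes = {"bob": {"t1"}, "carol": {"t2"}}
    tweet_author = {"t1": "dave", "t2": "erin"}
    print(following_who_liked(following, likes, tweet_author, "alice", "dave"))
```

In this toy graph, `carol` is followed by three accounts and ends up with the highest score. `dave` has no followers and keeps only the teleport share. In the feature example, only `bob` (followed by `alice`) liked a Tweet by `dave`, so the count is 1. A production system would compute these over billions of edges with distributed jobs rather than in-memory dicts.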