
The Dev.to Feed Algorithm

Justin Mancinelli · 7 min read

TL;DR

I used to develop apps. I still do, but I used to, too

Back in 2007/2008, I learned Ruby on Rails and developed two prototype sites that never made it to production. Since then, I've done extensive work on non-Ruby, non-Rails server applications and learned enough about Android and iOS apps to manage mobile app development in my current role.

I never touched Ruby on Rails again...until @anshbansal asked a question that I had asked myself a few times before.

The following is my deep dive into the dev.to codebase to answer this question. There are probably a few things wrong; please point them out in the comments so I can correct them. Thank you.

Start at the beginning

And it doesn't get much earlier than the root route
root "stories#index"

Taking control

Rails follows a Model View Controller (MVC) architecture. When you ask dev.to to show you the root page, it will ask the stories controller to run the index action.
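The helper below illustrates the Rails naming convention behind that dispatch. It is not code from the dev.to repo. The function name `resolveRouteTarget` is hypothetical. It only handles simple snake_case controller names, so namespaced routes are out of scope.

```javascript
// Hypothetical helper: maps a Rails route target written as "controller#action"
// to the controller class name and action method Rails would dispatch to.
// Namespaced targets such as "admin/users#index" are not handled here.
function resolveRouteTarget(target) {
  const [controller, action] = target.split("#");
  // Camelize snake_case: "stories" -> "Stories", "user_settings" -> "UserSettings"
  const camelized = controller
    .split("_")
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
  return { controllerClass: `${camelized}Controller`, action };
}
```

For example, `resolveRouteTarget("stories#index")` returns `{ controllerClass: "StoriesController", action: "index" }`.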

The index action sets up a bunch of state and then renders the articles/index template:
render template: "articles/index"

Show me the stories

If you inspect your dev.to home screen, you'll notice all the articles/stories are listed within an articles-list div. You can find it in the articles/index view as expected.

And here's where we start to see how the feed is populated.

OK, first show me the featured story

The first story in the article list is a featured story.

The algorithm to get the featured story for a logged-in user comes from the stories controller and the articles/index view. I've simplified it by substituting some variables and reorganizing some statements.

@stories = Article.published.limited_column_select.page(1).per(35)
@stories = @stories.
  where("score > ? OR featured = ?", 9, true).
  order("hotness_score DESC")
offset = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
          1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 
          5, 6, 7, 8, 9, 10, 11].sample # random offset, weighted more towards zero
@stories = @stories.offset(offset)

@featured_story = @stories.where.not(main_image: nil).first&.decorate || Article.new

In English:

  1. Fetch a collection of stories that score above 9 or are featured
  2. Order them, starting with the "hottest" one
  3. Randomly skip the first 0 to 11 stories, weighted more towards 0
  4. The featured story is the first story that has a main image
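To make these steps concrete, here is a minimal sketch that runs them over an in-memory array instead of the database. The function names and the plain-object article shape are assumptions for illustration. The field names (score, featured, hotness_score, main_image) mirror the quoted query.

The sketch also tallies how often `.sample` lands on each offset. Zero comes up 10 times out of 27, or about 37%.

One hedged observation about the Ruby code: `where.not(main_image: nil)` is chained onto @stories before `.first`. ActiveRecord would normally fold that filter into the same SQL query, so the random offset would skip only stories that already have a main image. The sketch follows that reading.

```javascript
// Sketch of the featured-story selection over an in-memory array.
// Function names and article objects are illustrative assumptions.

// The controller's weighted list: ten 0s, four 1s, three 2s, two 3s,
// then one each of 4 through 11 (27 entries in total).
const OFFSETS = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                 1, 1, 1, 1, 2, 2, 2, 3, 3, 4,
                 5, 6, 7, 8, 9, 10, 11];

// Like Ruby's Array#sample: every entry is equally likely, so values that
// appear more often in OFFSETS are picked more often.
function sampleOffset(random = Math.random) {
  return OFFSETS[Math.floor(random() * OFFSETS.length)];
}

// Fraction of draws that land on each offset value, e.g. 0 -> 10/27.
function offsetProbabilities() {
  const counts = new Map();
  for (const offset of OFFSETS) {
    counts.set(offset, (counts.get(offset) || 0) + 1);
  }
  const probabilities = new Map();
  for (const [offset, count] of counts) {
    probabilities.set(offset, count / OFFSETS.length);
  }
  return probabilities;
}

function pickFeaturedStory(articles, random = Math.random) {
  const offset = sampleOffset(random);
  const withImages = articles
    .filter((a) => a.published)
    .filter((a) => a.score > 9 || a.featured === true)
    .filter((a) => a.main_image != null) // where.not(main_image: nil)
    .sort((a, b) => b.hotness_score - a.hotness_score);
  // .first after .offset(n) returns the (n + 1)th matching story. The 35-row
  // page limit does not change which single story .first returns, so it is
  // omitted. Ruby falls back to Article.new; null stands in for that here.
  return withImages[offset] ?? null;
}
```

Passing a fixed `random` function, e.g. `pickFeaturedStory(articles, () => 0)`, always uses offset 0, which makes the result repeatable.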

Leaving how score, featured, and hotness are determined as an exercise for the reader

Notice the featured article has nothing to do with which people, organizations, or tags you follow.

Now show me the rest of the stories?

After rendering the featured story, the articles/index view creates a substories div and then renders the stories/main_stories_feed partial:
<%= render "stories/main_stories_feed" %>

These are not the divs you are looking for

I was scratching my head while reading through the _main_stories_feed partial.

It populates the data attributes of a new-articles-object div and a home-articles-object div, followed by a bunch of other divs that have no contents. And the divs I do see when inspecting the home screen have the single-article and single-article-small-pic classes, but don't look like anything in this file.

Evil action-at-a-distance like this can only mean one thing: JavaScript

Nobody expects the Spanish Inquisition

Searching the repo for new-articles-object and home-articles-object, we find them both in initializeFetchFollowedArticles, which is called very early when a page is initialized.

And there is a lot of logic here which I did not expect.

The new stories are not the old stories

The stories controller populated the @stories collection used for the featured story. That collection is also used to populate the data attributes of the home-articles-object div, but that comes next, not now.

Instead, the first stories we see after the featured article are populated from a query directly in the view:

@new_stories = Article.published.
  where("published_at > ? AND score > ?", rand(2..6).hours.ago, -15).
  limited_column_select.
  order("published_at DESC").
  limit(rand(15..80))

In English:

  1. Fetch a collection of stories that have been published some time in the last 2 to 6 hours and score above -15
  2. Order them by most recent first
  3. Return the first 15 to 80 of them
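The same kind of in-memory sketch works for the @new_stories query. The function names are hypothetical, and published_at is assumed to be a millisecond timestamp. Ruby's rand(2..6) and rand(15..80) draw integers from inclusive ranges, and randomIntInclusive imitates that. Passing a fixed random function makes the time window and the limit deterministic.

```javascript
// Sketch of the @new_stories query over an in-memory array.
// Names and the millisecond published_at field are illustrative assumptions.
const HOUR_MS = 60 * 60 * 1000;

// Integer in [min, max], both ends inclusive, like Ruby's rand(min..max).
function randomIntInclusive(min, max, random = Math.random) {
  return min + Math.floor(random() * (max - min + 1));
}

function selectNewStories(articles, now = Date.now(), random = Math.random) {
  const windowHours = randomIntInclusive(2, 6, random); // rand(2..6).hours.ago
  const limit = randomIntInclusive(15, 80, random);     // limit(rand(15..80))
  const cutoff = now - windowHours * HOUR_MS;
  return articles
    .filter((a) => a.published && a.published_at > cutoff && a.score > -15)
    .sort((a, b) => b.published_at - a.published_at)    // most recent first
    .slice(0, limit);
}
```

With `() => 0` as the random source, the window is 2 hours and the limit is 15. With a value just under 1, they become 6 hours and 80.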

Then the JavaScript function insertNewArticles takes over:

articlesJSON.forEach(function(article){
  var articlePoints = 0
  var containsUserID = findOne([article.user_id], user.followed_user_ids || [])
  var containsOrganizationID = findOne([article.organization_id], user.followed_organization_ids || [])
  var intersectedTags = intersect_arrays(user.followed_tag_names, article.cached_tag_list_array)
  var followedPoints = 1
  var experienceDifference = Math.abs(article['experience_level_rating'] - user.experience_level || 5)
  var containsPreferredLanguage = findOne([article.language || 'en'], user.preferred_languages_array || ['en']);
  JSON.parse(user.followed_tags).map(function(tag) {
    if (intersectedTags.includes(tag.name)) {
      followedPoints = followedPoints + tag.points
    }
  })
  articlePoints = articlePoints + (followedPoints*2) + article.positive_reactions_count
  if (containsUserID || article.user_id === user.id) {
    articlePoints = articlePoints + 16
  }
  if (containsOrganizationID) {
    articlePoints = articlePoints + 16
  }
  if (containsPreferredLanguage) {
    articlePoints = articlePoints + 1
  } else {
    articlePoints = articlePoints - 10
  }
  var rand = Math.random();
  if (rand < 0.3) {
    articlePoints = articlePoints + 3
  } else if (rand < 0.6) {
    articlePoints = articlePoints + 6
  }
  articlePoints = articlePoints - (experienceDifference/2);
  article['points'] = articlePoints
});
var sortedArticles = articlesJSON.sort(function(a, b) {
  return b.points - a.points;
});
sortedArticles.forEach(function(article){
  var parent = insertPlace.parentNode;
  if ( article.points > 12 && !document.getElementById("article-link-"+article.id) ) {
    insertArticle(article,parent,insertPlace);
  }
});

In English:

  1. Give each article 0 points to start off with
  2. Start a tag score at 1, add the weight of each tag (which can also be negative) that the user follows and this article is tagged with, then double it
  3. Now add to that the number of positive reactions the article currently has
  4. If the user follows the article's author, or is the article's author, add 16 points
  5. If the user follows the article's organization, add 16 points
  6. If the article is written in one of the user's preferred languages (English by default), add 1 point; otherwise, subtract 10 points
  7. Randomly give the article an extra 3 points (30% chance), 6 points (30% chance), or nothing (40% chance)
  8. Subtract half the absolute difference between this article's experience level and the user's experience level
  9. Order the articles by most points first
  10. If the article has more than 12 points and isn't already on the page, show it to the user
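Here is the same scoring pass rewritten as a pure function, so it can run outside the browser. This is a sketch, not the repo's code, and it makes three substitutions:

  1. findOne and intersectArrays are hypothetical stand-ins for the findOne and intersect_arrays helpers the excerpt calls, whose source isn't quoted here.
  2. followed_tags is assumed to be an already-parsed array of { name, points } objects, rather than the JSON string the original parses.
  3. A renderedIds set replaces the document.getElementById check.

It keeps the quoted experience-level expression exactly as written. Because `-` binds tighter than `||`, that expression evaluates as Math.abs((rating - level) || 5). So an exact experience match is treated like a difference of 5, not 0.

```javascript
// Hypothetical stand-ins for the repo's helpers (their source isn't quoted).
function findOne(needles, haystack) {
  return needles.some((item) => haystack.includes(item));
}

function intersectArrays(a = [], b = []) {
  return a.filter((item) => b.includes(item));
}

// Scores one article for one user; `random` is injectable for testing.
function scoreArticle(article, user, random = Math.random) {
  let points = 0;
  const followsAuthor = findOne([article.user_id], user.followed_user_ids || []);
  const followsOrg = findOne([article.organization_id], user.followed_organization_ids || []);
  const sharedTags = intersectArrays(user.followed_tag_names, article.cached_tag_list_array);
  // Same expression as the quoted code: parsed as Math.abs((rating - level) || 5),
  // so a difference of 0 (exact match) or NaN (missing level) becomes 5.
  const experienceDifference =
    Math.abs(article.experience_level_rating - user.experience_level || 5);
  const speaksLanguage = findOne([article.language || "en"], user.preferred_languages_array || ["en"]);

  // Tag score starts at 1, then adds the weight of each followed tag on the article.
  let followedPoints = 1;
  for (const tag of user.followed_tags) {
    if (sharedTags.includes(tag.name)) followedPoints += tag.points;
  }
  points += followedPoints * 2 + article.positive_reactions_count;
  if (followsAuthor || article.user_id === user.id) points += 16;
  if (followsOrg) points += 16;
  points += speaksLanguage ? 1 : -10;

  const roll = random();
  if (roll < 0.3) points += 3;      // 30% chance
  else if (roll < 0.6) points += 6; // 30% chance; the remaining 40% adds nothing
  return points - experienceDifference / 2;
}

// Score, sort highest first, then keep articles above 12 points that are not
// already rendered (renderedIds replaces the document lookup in the original).
function selectArticlesToInsert(articles, user, renderedIds = new Set(), random = Math.random) {
  return articles
    .map((article) => ({ ...article, points: scoreArticle(article, user, random) }))
    .sort((a, b) => b.points - a.points)
    .filter((article) => article.points > 12 && !renderedIds.has(article.id));
}
```

A worked example, with a fixed roll of 0.9 so there is no random bonus:

  - The article is by a followed author and has 4 positive reactions.
  - It carries one followed tag weighted 3.
  - Its experience gap with the user is 2.

It scores (1 + 3) x 2 + 4 + 16 + 1 - 2/2 = 28.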

What about the rest?

The next batch of articles comes from the same collection the featured article came from, and is processed by a new (but familiar) algorithm in insertTopArticles.

When you get to the bottom of that list, articles are populated from an algoliasearch index of ordered articles. The definition of that index is found in the Article model.

Finally, the scrolling code in initScrolling.js.erb kicks in and populates more articles from the algoliasearch index.

Leaving the details of these as an exercise for the reader

TL;DR

For the first article in the list:

  1. Fetch a collection of stories that score above 9 or are featured
  2. Order them, starting with the "hottest" one
  3. Randomly skip the first 0 to 11 stories, weighted more towards 0
  4. The featured story is the first story that has a main image

For the next batch of articles:

  1. Fetch a collection of stories that have been published some time in the last 2 to 6 hours and score above -15
  2. Order them by most recent first
  3. Return the first 15 to 80 of them
  4. Give each article 0 points to start off with
  5. Start a tag score at 1, add the weight of each tag (which can also be negative) that the user follows and this article is tagged with, then double it
  6. Now add to that the number of positive reactions the article currently has
  7. If the user follows the article's author, or is the article's author, add 16 points
  8. If the user follows the article's organization, add 16 points
  9. If the article is written in one of the user's preferred languages (English by default), add 1 point; otherwise, subtract 10 points
  10. Randomly give the article an extra 3 points (30% chance), 6 points (30% chance), or nothing (40% chance)
  11. Subtract half the absolute difference between this article's experience level and the user's experience level
  12. Order the articles by most points first
  13. If the article has more than 12 points and isn't already on the page, show it to the user

If you've scrolled past all of those:

  1. Use the same collection the featured article came from
  2. Process it with an algorithm similar to, but different from, the previous batch's

And, finally

All articles ordered by hotness

Closing remarks

This could change at any time. For example, on 2019-09-19, @ben merged a PR to add more variation to the home feed. All links to GitHub point to the commit that was on master when I wrote this; by the time you read this, master has probably moved on.

Posted by:

Justin Mancinelli

@piannaf

Justin helps dev and product teams navigate the waters of mobile app development and is an expert at integrating them into larger technical, customer, and business ecosystems.

Discussion

 

This is really timely because we've just begun the phase of overhauling this. @nickytonline and @joshpuetz should check this out

 

Ha, really glad I added the disclaimer

by the time you read this, master has probably moved on.

Really happy you all keep iterating on every aspect, with community involvement, to keep improving.

 

Reminds me of Jose Aguinaga's famous article about JS development.
First line: No new frameworks were made during the writing.
Top comment below: I highly doubt that.

 
 

Someone should do this about YouTube recommendations algo.

 

Someone should make an article about proprietary software :p

 

Fetch a collection of stories that have been published some time in the last 2 to 6 hours and score above -15

I set up an RSS feed from my website and was using that to publish to DEV. But since I don't have timestamps set up properly on my RSS feed, posts would immediately show as published "20 hours ago" when they were published to DEV. Since I started manually modifying the timestamps, it seems like more people have been viewing my posts. I guess this is why!

 

Definitely. This reminds me of a PR I saw not too long ago

Ability to backdate a post #3455

Is your feature request related to a problem? Please describe. Unable to change publish date. I personally wish to back date a post, but there is no way to set the publish date (or time) for a post.

Describe the solution you'd like Add a custom variable for publish_date

Describe alternatives you've considered Time travel?

Additional context In lieu of being able to delete/edit comments on an old post I have duplicated and republished it as a new post, but the date does not/cannot reflect the origin publish date.

Semi related to #3274 and #1363

And @jess posed a great question

if a post is backdated, would we still surface it as new content?

 

Just wanted to add another thanks for this deep dive @piannaf as I've been referencing it over the past day. First up is getting all of these pieces in the same place: while technically someone could change the feed algorithm right now, one would need to change code in multiple places. That's part of what we're trying to improve!

 

Wow, thanks! That's an unintended side-effect I'm really glad has been beneficial.

 

This is awesome Justin! Thank you for the deep dive!

 

Thank you for taking the time to answer the question in such detail. I believe many users were expecting to see only the tags and users they follow, in chronological order, perhaps like an RSS feed.

 

Yeah, when I first joined, that's what I expected "feed" to mean. Pretty quickly discovered that wasn't the case. But I've been happy with the recommendations because I like seeing things outside my chosen bubble from time to time.

I can understand, though, people getting upset if they put -100 on a tag and still saw it anywhere.

 
 

Are there tools for users to tune their feed, e.g. to show content with a higher experience level?

 

"Content customization" allows you to change experience level

Content Customization
-- dev.to/settings/ux


Changing the weight on tags you follow alters the frequency they'll show up in your feed. They can be positive or negative (anti-follow).

tag following
-- dev.to/dashboard/following_tags


Following users and orgs increases the chance they will be in your feed

 
 
 
 

Great article (and explanation)! I don't know Ruby, but when I do, I'd love to contribute to DEV.

 

Thanks for sharing this post. I have a better understanding of how posts are selected for display.

 
 

Really cool, thank you for posting this!

 

This is nice...
So what happens when an anonymous (not logged-in) user visits?

 

I added the offset trick to the home page of dicopedia.com, it works great! Thanks!