Forem Feed Experiment One: January Results

#meta #product #ux #changelog

Background

In December, Amy wrote about running an experiment on our feed. Now it’s time to revisit that experiment and make a decision.

The Goals

In our previous feed experiments, we established six goals to track:

User creates a comment.
User creates comments on at least 4 different days within a week.
User views pages on at least 4 different days within a week.
User views pages on at least 4 different hours within a day.
User views pages on at least 9 different days within 2 weeks.
User views pages on at least 12 different hours within five days.

For the current experiment, which we’re now wrapping up, we reused those goals.

Here’s a link to the code that captures “conversions” for each of the goals.
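The linked code is the source of truth for how Forem records these conversions. Purely as an illustration, here is a minimal Python sketch of how a “distinct days (or hours) within a window” goal could be evaluated from a user’s activity timestamps. The function names and the `PAGE_VIEW_GOALS` mapping are hypothetical, and the sketch assumes rolling windows; the real implementation may use calendar boundaries instead. The comment-streak goal works the same way if you pass comment timestamps instead of page views.

```python
from datetime import datetime, timedelta


def truncate_to_day(ts: datetime) -> datetime:
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def truncate_to_hour(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)


def converted(timestamps, min_buckets, window, bucket):
    """True if some rolling `window` contains at least `min_buckets`
    distinct buckets (days or hours) with activity."""
    buckets = sorted({bucket(ts) for ts in timestamps})
    for i, start in enumerate(buckets):
        # Count distinct buckets that start inside [start, start + window).
        in_window = sum(1 for b in buckets[i:] if b - start < window)
        if in_window >= min_buckets:
            return True
    return False


# Hypothetical encoding of the page-view goals listed above:
# (minimum distinct buckets, rolling window, bucket function).
PAGE_VIEW_GOALS = {
    "4 days in a week": (4, timedelta(days=7), truncate_to_day),
    "4 hours in a day": (4, timedelta(days=1), truncate_to_hour),
    "9 days in 2 weeks": (9, timedelta(days=14), truncate_to_day),
    "12 hours in 5 days": (12, timedelta(days=5), truncate_to_hour),
}

views = [datetime(2021, 12, d, 9) for d in (1, 2, 4, 6)]
for name, (n, window, bucket) in PAGE_VIEW_GOALS.items():
    print(name, converted(views, n, window, bucket))
```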

The Methodology

We use the field_test gem to facilitate Bayesian A/B hypothesis testing. As part of the experiment, I added an AbExperiment model to Forem, which provides several mechanisms to test and toggle experiments. That proved fortuitous when I broke production.
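field_test does the statistics for us, but the idea behind the “Probability of Winner” column can be sketched with a Beta-Binomial model: give each variant’s conversion rate a uniform Beta(1, 1) prior, update it with the observed conversions, and estimate how often the challenger’s rate exceeds the incumbent’s. This Monte Carlo version is an illustration only; field_test may compute the probability differently (for example, with an exact formula rather than sampling), and `prob_challenger_wins` is a hypothetical name.

```python
import random


def prob_challenger_wins(inc_conv, inc_total, chal_conv, chal_total,
                         draws=100_000, seed=0):
    """Estimate P(challenger rate > incumbent rate) under Beta(1, 1) priors."""
    rng = random.Random(seed)  # seeded so repeated runs agree
    wins = 0
    for _ in range(draws):
        # Posterior for a conversion rate: Beta(1 + conversions, 1 + non-conversions).
        p_inc = rng.betavariate(1 + inc_conv, 1 + inc_total - inc_conv)
        p_chal = rng.betavariate(1 + chal_conv, 1 + chal_total - chal_conv)
        wins += p_chal > p_inc
    return wins / draws


# Made-up participant counts: this post reports only rates, not totals.
print(prob_challenger_wins(558, 10_000, 587, 10_000))
```

The probability depends heavily on how many participants each variant had, which is why the same rate difference can be decisive in a large experiment and inconclusive in a small one.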

We then introduced the code that selects which feed algorithm to use. Aside from the minor outages I introduced (and we corrected), we sat back and let the experiment run.
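In Forem, field_test keeps track of which variant each user is in. As a rough, hypothetical sketch of the selection step, a stable 50/50 split can be made by hashing the user ID together with an experiment name, then dispatching to the matching feed strategy. `feed_variant`, `build_feed`, and the variant names are illustrative assumptions, not Forem’s code.

```python
import hashlib
from typing import Callable, List

VARIANTS = ("incumbent", "challenger")  # hypothetical variant names


def feed_variant(user_id: int, experiment: str = "feed_experiment") -> str:
    """Deterministically map a user to a variant with a 50/50 split."""
    key = f"{experiment}:{user_id}".encode()
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and bucket it.
    bucket = int.from_bytes(digest[:8], "big") % len(VARIANTS)
    return VARIANTS[bucket]


FeedStrategy = Callable[[int], List[str]]


def build_feed(user_id: int, incumbent: FeedStrategy,
               challenger: FeedStrategy) -> List[str]:
    """Run whichever feed strategy the user's variant selects."""
    if feed_variant(user_id) == "challenger":
        return challenger(user_id)
    return incumbent(user_id)
```

Hashing keeps a user in the same variant on every request without a lookup; the trade-off is that changing the split later reshuffles users, which is one reason a stored assignment (as field_test keeps) is often preferred.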

Results

Below is a summary of the results for each goal:

Scenario	Incumbent Conversion	Challenger Conversion	Likely Winner	Probability of Winner
Creates a comment.	5.58%	5.87%	Challenger	90%
Creates comments on at least 4 different days within a week.	0.23%	0.19%	Incumbent	78%
Views pages on at least 4 different days within a week.	23.98%	23.52%	Incumbent	86%
Views pages on at least 4 different hours within a day.	14.17%	13.62%	Incumbent	94%
Views pages on at least 9 different days within 2 weeks.	9.60%	9.41%	Incumbent	73%
Views pages on at least 12 different hours within five days.	2.24%	2.13%	Incumbent	73%
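To put the differences in proportion, a few lines of Python can turn the table’s conversion rates into the challenger’s change relative to the incumbent. The rates are copied from the table above; the helper name is just for illustration.

```python
# Conversion rates (percent) copied from the results table: (incumbent, challenger).
RESULTS = {
    "Creates a comment": (5.58, 5.87),
    "Comments on 4 days within a week": (0.23, 0.19),
    "Views on 4 days within a week": (23.98, 23.52),
    "Views in 4 hours within a day": (14.17, 13.62),
    "Views on 9 days within 2 weeks": (9.60, 9.41),
    "Views in 12 hours within five days": (2.24, 2.13),
}


def relative_change(incumbent: float, challenger: float) -> float:
    """Challenger's change relative to the incumbent, in percent."""
    return (challenger - incumbent) / incumbent * 100


for goal, (inc, chal) in RESULTS.items():
    print(f"{goal}: {relative_change(inc, chal):+.1f}%")
```

This prints +5.2% for comment creation and between -1.9% and -4.9% for the page-view goals. The comment-streak goal shows -17.4%, but on very small base rates (0.23% vs. 0.19%). In absolute terms, no goal moves by more than 0.55 percentage points.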

Conjecture

First and foremost, both feed strategies appear to encourage close to the same engagement, which is reassuring: the experiment likely did not adversely affect the DEV.to experience.

Second, I’m prepared to call this first experiment in favor of the incumbent.

Third, it appears that the challenger encouraged initial conversations, but those conversations dwindled over time.

Why do I think this is happening? My hypothesis centers on two primary changes in the challenger:

The daily_decay_factor, the numeric multiplier we apply based on how many days ago an article was published, overly favored more recently published articles.
Sorting the relevant feed entries by publication date instead of by relevance score.

Let’s look at the change in the publication-date decay rate.

Days Since Published	Challenger #1 Weight	Challenger #2 Weight
0	1	1
1	0.95	0.99
2	0.9	0.985
3	0.85	0.98
4	0.8	0.975
5	0.75	0.97
6	0.7	0.965
7	0.65	0.96
8	0.6	0.955
9	0.55	0.95
10	0.5	0.945
11	0.4	0.94
12	0.3	0.935
13	0.2	0.93
14	0.1	0.925
15 or more	0.001	0.9

For the original challenger, I chose a more aggressive decay rate. For the second challenger, I’m easing off the decay significantly.
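Here is a small sketch of how those two decay schedules play out, assuming the weight simply multiplies an article’s relevance score (Forem’s real scoring combines more factors, so treat this as an illustration). The weights come straight from the decay table; `decay_weight`, `decayed_score`, and the relevance numbers in the example are hypothetical.

```python
# Weights from the decay table, indexed by days since published (0-14),
# plus the "15 or more" floor for each challenger.
DECAY = {
    "challenger_1": ([1, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55,
                      0.5, 0.4, 0.3, 0.2, 0.1], 0.001),
    "challenger_2": ([1, 0.99, 0.985, 0.98, 0.975, 0.97, 0.965, 0.96, 0.955,
                      0.95, 0.945, 0.94, 0.935, 0.93, 0.925], 0.9),
}


def decay_weight(days_since_published: int, challenger: str) -> float:
    weights, floor = DECAY[challenger]
    if days_since_published < len(weights):
        return weights[days_since_published]
    return floor  # the "15 or more" row


def decayed_score(relevance: float, days: int, challenger: str) -> float:
    # Assumption: the decay weight multiplies a single relevance score.
    return relevance * decay_weight(days, challenger)


for name in DECAY:
    older = decayed_score(10, 12, name)  # strong article, 12 days old
    newer = decayed_score(5, 0, name)    # weaker article, published today
    print(name, round(older, 3), newer)
```

With a relevance of 10 published 12 days ago versus a relevance of 5 published today, challenger #1 scores the older article at about 3.0, so the newer, weaker one ranks higher, while challenger #2 scores it at about 9.35, so it stays on top.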

I’m also removing the ordering by publication date, so the upcoming feed experiment will sort entries by relevance instead.
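The difference between the two orderings is easiest to see side by side. This sketch assumes candidate entries have already been selected and scored; the `FeedEntry` class, the sample titles, and the tie-break on publication date are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class FeedEntry:
    """Hypothetical shape of a candidate feed article."""
    title: str
    published_on: date
    relevance: float


def order_by_publication(entries: List[FeedEntry]) -> List[FeedEntry]:
    """Newest first; relevance only decided which entries made the cut."""
    return sorted(entries, key=lambda e: e.published_on, reverse=True)


def order_by_relevance(entries: List[FeedEntry]) -> List[FeedEntry]:
    """Highest relevance first; newer articles win ties (an assumption)."""
    return sorted(entries, key=lambda e: (e.relevance, e.published_on),
                  reverse=True)


entries = [
    FeedEntry("Fresh but thin", date(2021, 12, 30), relevance=2.0),
    FeedEntry("Older, lively discussion", date(2021, 12, 20), relevance=9.0),
]
print([e.title for e in order_by_publication(entries)])
print([e.title for e in order_by_relevance(entries)])
```

Ordering by date puts the fresh article first even though its score is much lower; ordering by relevance lets the article with more activity lead.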

Next Steps

I’ve begun work on the proposal for our next feed experiment. It introduces a few minor tweaks and is intended as a starting point for a conversation about how to configure the challenger’s case statements.

Top comments (5)

Ben Halpern • Jan 3 '22

As a meta comment for all of this, the priority will be to hasten the iteration process to ensure we continue to iterate on getting the most relevant feed possible to make for the most awesome communities on DEV and elsewhere.

We'll continue to tune and build on what we've been working on, kicked off by @amyatforem, @jeremyf and co from our team, while ensuring we don't get sucked into the wrong local maxima and generally doing all of this with as much transparency as possible so that folks can weigh in and help build such a critical feature with us. 😄

GrahamTheDev • Jan 3 '22

An interesting experiment.

I can guess why the engagement dropped though, especially for multiple times a day.

By removing the "randomness" from the feed, I kept seeing the same articles again and again, in the same order, if I checked multiple times a day. It became "boring" seeing the same articles, so I ended up checking the latest tab or the top tab instead. Previously I might see similar articles, but the order changed, which brought items to the top of the feed and into prime position that I might have skipped past before.

It would have been a more even (fair) test if you also randomised the results returned on the test to more closely match how the previous feed worked.

I am personally glad this particular test is over, but I am a heavy user (who doesn't follow any tags) so I might notice the feed seeming a lot "slower" than it was previously.

I look forward to the next test as at a glance it looks like that would be a much more balanced feed that surfaces posts with a lot of activity 👍

Happy new year and keep up the great work ❤