DEV Community

Megan Risdal for Stack Overflow


Experiment results: Testing close/reopen thresholds on Stack Overflow

At Stack Overflow, one of the main things we're working on is improving the ways our Q&A system facilitates feedback among users. Our user base plays an important role not only in answering questions asked by thousands of people per day, but also in helping to ensure that Stack Overflow is a valuable resource that helps our future selves, too. This means there are real people behind the keyboard using our software, interacting with each other, and influencing the experiences others have (good and bad). Stack Overflow users deserve a system that sets everyone on all sides up for success.

XKCD, writing code comments to your future self

Today, an essential part of how feedback is relayed among users on our site is our system for closing (and reopening) questions. While it was originally designed to help facilitate the creation and preservation of high quality question and answer artifacts, it's also a source of a lot of friction for our users. You ask a question on Stack Overflow because you think you can get an answer there. Getting your question closed today can be confusing, frustrating, and sometimes even unjustified.

So, what are we doing about it? We're taking a three-pronged approach:

  1. Small changes to make the current experience feel less opaque or frustrating (e.g., redesigning our post notices including those for closed questions to be more helpful)
  2. Experiments to understand the effects and emergent behavior of the system when we make changes
  3. In the longer term, performing an audit of the current system and user research to inform a holistic overhaul

In the rest of this post, I'll describe a recent observational experiment we ran as an example of approach #2 including what we observed and our thoughts on next steps (which I'd love your feedback on!). Shog, a Community Manager at Stack Overflow, and I published a longform version of this write-up on meta. This is mostly Shog's write-up; I contributed the charts.

Close/reopen vote threshold experiment

In August, we ran an observational experiment on Stack Overflow in which we reduced the number of votes required to close or reopen a question from 5 to 3. Our main hypothesis was that this would make the system more efficient. We operationalized "efficiency" as follows: once a first close vote is cast on a question, how likely is that question to ultimately get closed? The same applies to reopening. A more efficient system is one where that likelihood is higher.
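To make that definition concrete, here is a minimal sketch of how such an efficiency number could be computed. The `Question` record and its field names are assumptions made for illustration; they are not Stack Overflow's actual schema, and this is not necessarily how the numbers in this post were produced.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical record; field names are assumptions for illustration."""
    close_votes: int  # number of close votes cast on the question
    closed: bool      # whether the question ultimately got closed


def close_efficiency(questions):
    """Share of questions with at least one close vote that ended up closed."""
    nominated = [q for q in questions if q.close_votes >= 1]
    if not nominated:
        return 0.0
    return sum(q.closed for q in nominated) / len(nominated)


sample = [
    Question(close_votes=3, closed=True),
    Question(close_votes=1, closed=False),
    Question(close_votes=0, closed=False),  # never nominated, so ignored
    Question(close_votes=2, closed=False),
]
print(close_efficiency(sample))  # 1 of the 3 nominated questions was closed
```

The same calculation would apply to reopening by swapping close votes for reopen votes.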

Why do we care about efficiency? For users whose questions are closed under the current system, we'd prefer that their questions get closed sooner, before attracting downvotes or discouraging comments. And for users who volunteer to review questions ("curators"), if their vote does not result in closing or reopening, it should be because other members of the site reviewed the question and decided that outcome was unwarranted; the vote should not simply be ignored.

As with all experiments (observational or controlled) and product changes we make on Stack Overflow, we also monitored other metrics. Some of the other factors we looked at:

  • Qualitative perceptions shared on meta (where we announced that we were doing the experiment) and in our site satisfaction survey
  • Volume of things like closed/reopened questions and participation in the respective review queues
  • "Close wars" (a close war is when a question is closed, reopened, then closed again)
  • Consensus (there are multiple reasons a close voter can choose from and reducing votes from 5 to 3 could impact consensus rates)

We ran the experiment for thirty days and compared the experimental period to pre- and post-periods.

The experiment results

Efficiency

Our hypothesis was confirmed. The close/reopen system was more efficient across the board.

Table of close/reopen/edit efficiency results

Very importantly, the total quantity of questions nominated for closure stayed about the same, even as efficacy -- questions actually getting closed -- went up. This means that the experiment didn't trigger a wave of close voting.

Qualitative perceptions

We heard the most from our active, engaged users on meta, who represent the curator side of this equation. From them we heard optimism and motivation to participate in the site, plus a lot of thoughts on how the experimental threshold would impact the close system, including the users affected by it.

We supplemented this feedback by looking at responses to our site satisfaction tracking survey which we recently started as a way to get more diverse feedback from Stack Overflow users. On the survey, we regularly see people mentioning closing as one of the most frustrating things about using Stack Overflow. During the experiment, we saw a slight uptick, but it wasn't anything significant. This is something we'll continue to monitor.

If you have thoughts, I'd love to hear them in the comments here!

Closing/reopening and participation

Right now, a very small group of people participate in closing/reopening and reviewing questions that have been voted to be closed/reopened. As long as we have the same system in place, we would prefer greater diversity among the people participating.

Number of active reviewers

During the experiment, we saw more participation in reviewing questions. This was driven by an increase in reopen reviewers.

Questions closed over time

During the experiment, we saw an increase in the number of questions closed. Recall this was NOT the result of an increase in the number of questions voted to be closed, but rather an improvement in the efficiency of the system.

Questions reopened over time

During the experiment, we saw an even larger relative increase in the total number of questions reopened.

Close wars

A "close war" is when a question is closed, reopened, then closed again. Did lowering the threshold required to close/reopen a question result in an increase in their occurrence?

  • In the 30 days prior to the experiment, 100 questions were closed at least twice.
  • During the 30-day experiment period, 188 questions were closed at least twice.
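As a rough illustration of the definition above, a close war can be detected from a question's event history. This is only a sketch: the event names and the idea of a simple chronological list per question are assumptions, not a description of Stack Overflow's data model.

```python
def is_close_war(events):
    """Return True if the history contains closed -> reopened -> closed, in order.

    `events` is a chronological list of event names for one question.
    The names "closed" and "reopened" are assumptions for this sketch;
    any other events (edits, comments) are skipped over.
    """
    pattern = ["closed", "reopened", "closed"]
    matched = 0
    for event in events:
        if event == pattern[matched]:
            matched += 1
            if matched == len(pattern):
                return True
    return False


history = ["closed", "edited", "reopened", "closed"]
print(is_close_war(history))                 # True
print(is_close_war(["closed", "reopened"]))  # False
```

Assuming a closed question has to be reopened before it can be closed again, "closed at least twice" and "close war" describe the same set of questions.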

We did see an increase, but close wars are actually already relatively rare on Stack Overflow. We aren't too worried about this.

Consensus

During the experiment, it was much easier to close a question without consensus among close reasons.

  • In the 30 days prior to the experiment, 49 questions were closed without a consensus reason.
  • During the 30-day experiment period, 560 questions were closed without a consensus reason.
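To put the two sets of counts side by side, the fold change of each can be worked out from the figures quoted in this post. This is plain arithmetic on raw counts; it does not adjust for the fact that more questions were closed overall during the experiment.

```python
def fold_change(before, after):
    """How many times larger the count was during the experiment."""
    return after / before


close_wars = fold_change(100, 188)    # questions closed at least twice
no_consensus = fold_change(49, 560)   # questions closed without a consensus reason

print(f"close wars: {close_wars:.2f}x")      # close wars: 1.88x
print(f"no consensus: {no_consensus:.1f}x")  # no consensus: 11.4x
```

Close wars roughly doubled, while closures without a consensus reason grew more than elevenfold.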

This is more concerning to us. It's an emergent behavior of the lower close threshold which we're not as happy to see.

Conclusion and call for feedback

From this experiment, we've concluded that a lower close/reopen vote threshold successfully increases the efficiency of the system. This has positive effects on our curators who volunteer to review questions and we hypothesize that it's a better experience for users who ask questions (though we'd love to hear more feedback from you!). It was especially encouraging that there was NOT an increase in the number of questions that were voted to be closed even while participation in the process increased.

Based on these results, we have learned that we would like to implement a consensus rule before lowering the close/reopen threshold to 3. That is, n close votes would have to agree on a reason before a question gets closed.

Our thinking is that a consensus provides more confidence that a question should indeed be closed AND gives the question author more concrete feedback which they can learn from or act on.
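One way such a consensus rule could work is sketched below. This is a hypothetical illustration, not a planned implementation: the value of n (`n_agree`) has not been decided, the default of 2 is an arbitrary assumption, and the reason strings are placeholders.

```python
from collections import Counter


def should_close(vote_reasons, threshold=3, n_agree=2):
    """Decide whether a question closes under a hypothetical consensus rule.

    vote_reasons: the close reason chosen by each close voter.
    threshold:    total close votes required (3 during the experiment).
    n_agree:      votes that must share one reason (assumed value).
    """
    if not vote_reasons or len(vote_reasons) < threshold:
        return False
    # Count of the most frequently chosen close reason.
    top_count = Counter(vote_reasons).most_common(1)[0][1]
    return top_count >= n_agree


print(should_close(["reason_a", "reason_a", "reason_b"]))  # True
print(should_close(["reason_a", "reason_b", "reason_c"]))  # False
print(should_close(["reason_a", "reason_a"]))              # False
```

Under this sketch, three votes for three different reasons would leave the question open until another voter agrees with one of them.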

What are your thoughts? What feedback do you want when you ask a question on Stack Overflow, and how do you want to receive it? Do you think we should enforce consensus close reasons? What aspects of asking a question on Stack Overflow today frustrate you?
