
When Not To A/B Test

Elena (ice_lenor) ・ Originally published at smartpuffin.com ・ 4 min read

This article was first published on my blog, smartpuffin.com.

As I plowed my way through A/B testing more than 300 features, I started to wonder: are there any cases when you don't want to A/B test?

A/B testing seems like an easy way to find out whether users love your new feature. Show it to half of your users: do they buy more than the other half? If yes, roll it out to everyone. If no, scrap it and build something else.

But are there cases where you aren't after a conversion increase, and would rather ship the feature right away?

Having thought about it, I came up with this list.

1. Legal

Your Legal department wants you to immediately put an important message in your footer.

Perhaps, you cannot legally sell your products in some country.

Maybe, a lawsuit requires you to display the user agreement in full, or to change an icon, or to remove a copyrighted photo, or to display some important information to your customers before they buy.

You don't mess with Legal. You really have to add this feature.

2. Accessibility

You change colors so that color-blind people can use your website.

You increase the font size so that visually impaired people can use your app more easily.

Add alt-text to images. Improve content for screen readers. Increase text contrast. Add subtitles to your video.

Yes, people with disabilities might actually convert better after all these improvements, but since they are a small fraction of all users, the effect won't be visible in your graphs.

And overall, this seems like a decent thing to do.

3. Moral choice

Your website displays a large banner about "great discounts", but it is out of date and the discounts no longer apply.

Maybe your algorithm for calculating the "lowest price" is broken and displays this motivational message on all products. Users start to suspect something when all prices are the lowest.

Perhaps you advertise "free shipping", but only the fine print reveals it's domestic-only. That's a nasty surprise at checkout.

Maybe you display ads and suddenly see something indecent. Or you feature other people's articles and find out they posted something that urgently needs moderation.

This is worth fixing, even if it doesn't bring your company more money.

4. Bugfix

Your layout is broken. A button isn't working. A link leads to a non-existing page. The "Add to cart" button doesn't add a product to the cart. (Well, this last one will be noticed immediately.)

You can fix this right away.

5. New product

Sometimes, people can be too cautious when introducing new products or large new features.

You did your user research. You talked with your customers. You know the demand. But you're reluctant to take this large step. You think: maybe I can introduce this large feature as a series of small changes, and test each one for a conversion increase.

But sometimes that is not enough. Sometimes a large feature needs to be introduced in full, so that users can try the complete experience.

It is especially challenging when you don't have a control group. If it's a new product, who do you compare your users (that is, the treatment group) to?

Absolute numbers might work better in this case: for example, what percentage of users become customers. After establishing this baseline, you can run A/B tests to improve it.

6. Small number of users

If you don't have enough users for a decent sample size, the test will run for months without reaching statistical significance - so what is the point of running it at all?
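To make "not enough" concrete, here is a back-of-the-envelope power calculation - a rough sketch using the standard two-proportion approximation, where the function name, the 5% baseline, and the one-point lift are all illustrative assumptions:

```python
from math import ceil

def sample_size_per_arm(p_baseline, mde_abs, z_alpha=1.96, z_beta=0.84):
    """Approximate users needed per arm for a two-sided test at
    alpha = 0.05 with 80% power, detecting an absolute lift of
    `mde_abs` over a baseline conversion rate `p_baseline`."""
    p_avg = p_baseline + mde_abs / 2          # pooled conversion rate
    variance = 2 * p_avg * (1 - p_avg)        # variance of the difference
    return ceil(((z_alpha + z_beta) ** 2) * variance / mde_abs ** 2)

# Detecting a one-point lift over a 5% baseline:
print(sample_size_per_arm(0.05, 0.01))  # → 8150
```

Roughly 8,000 users per arm to detect a one-point lift: if your site sees a few hundred visitors a week, that test would run for the better part of a year.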

7. Professional users

With these users, you have to be extremely careful. Imagine you write software for call-center agents whose bonuses depend on how many calls they handle per day.

Do you really want to run random A/B tests on that software, knowing that someone's compensation depends on chance?

Users might become less productive if a feature they are using 3000 times per day changes its appearance every now and then. If you use something very often, it becomes automatic, and you become very sensitive to changes. Imagine someone experiments with the key combination to save a file or to exit vim. You'll become many times less productive because of that small change - and very, very annoyed.

Yes, I really have worked on software where a particular feature was used up to 3000 times per day.

For professional users, it is usually better to introduce a change only after speaking with them. Ask them about their needs. Collect their requirements. Agree with them on a change. Document it. Give them a beta to try. Collect feedback. Improve. Release.

Conclusion

You might still want to introduce all of the changes listed above as experiments - not to measure a conversion lift, but to see whether the change creates any negative impact. Perhaps your increased contrast doesn't work on some devices or in some browsers. Maybe you broke the cart completely while fixing that "Add to cart" button. Perhaps you accidentally forbade selling the product in all countries. You will see it in your metrics. The A/B test acts as a feature switch: you can quickly turn the feature off, fix it, and retry.

But in many of the cases above, it would be the wrong decision to wait weeks or months for a large sample size, compare conversion - and turn the feature off if the conversion growth is not large enough.

Some features deserve to live regardless of the conversion increase they bring.


Discussion


Nice post Elena, I'd add this one:

  • Feature success is too hard to actually measure.

I see a lot of time wasted on trying to bend over backwards to measure something when you really just need to trust your gut, or trust the expertise involved. You're only testing because it will deliver good return on investment in the first place. If your investment to try and measure the thing is too great, you might want to take a different approach.

Depending on your implementation, split testing can also have pretty terrible consequences in terms of performance and software complexity. In general I think we need to proceed with humility in this area.

 

Hi Ben, thank you, you raise a great point.
Actually, even two points: hard to measure, and complexity. I should've thought of it myself; I've seen the consequences of both!
Thank you again!

 

I don't get it. Why shouldn't I use A/B testing when releasing a bugfix, for example? The success metric in that case would be the error rate, which I want to drive down. A/B testing doesn't have to be coupled to conversion - you can use any success metric.

 

Thanks a lot for the article and for listening to the Vox populi.