A/B testing explained

Published May 10, 2024 | 8 minute read | Last updated Jun 11, 2024
Contributors
  • Written by Sean Dougherty

    A copywriter at Funnel, Sean has more than 15 years of experience working in branding and advertising (both agency and client side). He's also a professional voice actor.

Whether you love it or find it suspect, A/B testing is a concept that every marketer should be familiar with. So, what is it exactly? 

Well, A/B testing refers to a specific form of experimentation that marketers often use to determine the effectiveness of various efforts. And while some work with it every day, others wonder what the difference between A and B actually is. 

Join us as we explain what A/B testing is, walk through some real-world examples and explore the limitations of the methodology. 

What is A/B testing? 

At its core, A/B testing determines which of a set of variables has the most significant impact on a given metric. In marketing, A/B testing can be used to identify which call-to-action on a web page generates more conversions or which copy or image in an ad is most effective.

A real-world example

To give this definition further context, we can use an example from our own homepage. We wanted to test which media type drove more conversions: an image of a happy customer or a video explaining our product.

The image of the happy customer as option A, and the video as option B.

Specifically, we wanted to see if either option had a greater influence on a visitor clicking the “Book a Demo” button than the other. To do so, we decided that we would show 50% of visitors option A and 50% of visitors option B. The choice was made at random. 

Note: You don’t need to split your test evenly. Depending on the variables, length of the testing period and other factors, you may decide to weight the test differently. 
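
To make the mechanics concrete, here's a minimal Python sketch of how a visitor might be randomly assigned to a variant. The variant names, traffic shares and function are illustrative only; this isn't how our homepage test was actually built.

```python
import random

def assign_variant(weights=None):
    """Randomly assign a visitor to a test variant.

    weights: optional dict mapping variant name to traffic share,
    e.g. {"A": 0.5, "B": 0.5}. The names and shares here are
    illustrative defaults, not a recommendation.
    """
    weights = weights or {"A": 0.5, "B": 0.5}
    variants = list(weights.keys())
    shares = list(weights.values())
    # random.choices picks one variant with probability proportional to its share
    return random.choices(variants, weights=shares, k=1)[0]

# Even 50/50 split
print(assign_variant())
# Weighted split, e.g. only 20% of traffic sees the new option B
print(assign_variant({"A": 0.8, "B": 0.2}))
```

In practice, you'd usually key the assignment to something like a hashed visitor ID so a returning visitor always sees the same variant.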

Why is A/B testing important for marketers?

A/B testing offers marketers a simple and straightforward way to determine the effectiveness of certain design and messaging choices by gathering real-world data. Without testing, well, you’re just guessing. 

Let’s revisit our homepage example. Let’s say this whole project started off with a meeting about how to drive more “Book a Demo” conversions from the homepage. One person may raise their hand and say we need to put a compelling video at the top. Another person may then say that it’s best practice to show a happy customer. 

How do you decide which option to go with? Both people are steadfast in their opinions. Do you take the democratic route and simply vote? The elected option may turn out to be less effective, and that choice may even cause conversions to go down. 

Additionally, if you do happen to notice an uptick in conversions, you won’t necessarily be able to claim that the higher performance is due to the selection you made. Rather, it could be due to a spike in ad performance, a celebrity mentioning your brand, seasonality or a host of other reasons.

Instead, by testing the options with A/B methodology, you can gain a clearer picture of which choice is right. 

What’s needed for A/B testing? 

So, you understand the importance of A/B testing at this point and you now want to start testing for yourself. Let’s quickly run through the core components of a good test. 

Control group

A/B testing isn’t just about comparing two new variables. You need some sort of baseline (or norm) to test against. This is called your control group. 

You might remember this concept from science class or the pharmaceutical industry. It measures what happens when no change is applied at all. It’s this control group that helps you account for those unrelated spikes in conversions like seasonality, etc. 

In the case of our homepage, the control group would see the current version of the homepage without option A or B. 

Hypothesis

Any good experiment is trying to prove or disprove something. Otherwise, you are just doing stuff for no real reason at all. 

This means that you should have some sort of claim in your mind ahead of time. In our example, one person claims that a customer image will drive more conversions, and another person claims a video will drive more conversions. 

Those hypotheses give us the variable to test (image or video) as well as the means by which to measure it (homepage conversions). 

Additionally, the colleague favoring the video may have an even better hypothesis: using a video on the homepage will lead to a 30% increase in conversions. This kind of detail in your hypothesis is great for later on when you need to think about test length and volume required so you can see a statistically significant signal in the data. 

A good format to use for your hypotheses goes: If_____, Then_____, Because______. For example, if I add a video to our homepage, our conversions will increase by 30%, because the video will effectively demonstrate the value of our product. 

Metrics

You know we love a good metric, and they are critical to setting up your test effectively. In our example, we will want to primarily view conversions. However, another test may require you to examine engagement, time on page, click-throughs, acquisition cost and more. 

Your choice of metrics should be driven by your detailed hypothesis. Additionally, you need to be sure that you can accurately measure that metric. For instance, trying to A/B test something like brand awareness between two options of an ad creative could be very difficult to accurately measure — and it could take a very long time to do so. 

Great A/B testing tools: SurveyMonkey and AB Tasty

More A/B testing examples

Let’s put our homepage test aside for a moment and look at other ways marketers can employ A/B testing. 

Email optimization

First, think of your email marketing program. If you're an online retailer, you may have a customer retention or loyalty program that reaches out to shoppers who’ve made a purchase from your store within the last six months. All of those emails will need a subject line, which provides an opportunity to test different options. 

Again, though, you’ll need a hypothesis to test. Are you trying to test the open rate? If so, how much of an increase do you envision? And why are you focusing on the open rate at all? 

Then, once a customer opens your email, there are a whole host of elements that can be A/B tested. If you’re a fashion brand, you may want to test highlighting new seasonal releases versus related products based on the customer’s past purchases. 

But beware: the plethora of options that can be tested requires a strong framework and even stronger data governance. 

Improving app revenue

Second, let’s imagine you have an app through which users can be monetized. You may want to test different interfaces, colors, button styles and more against the rate of purchase conversion. 

It’s important to note that A/B testing calls for running separate tests for each of these variables, with each test having a control group. If running each of those tests would take too long, you may want to think about multivariate testing — though that’s a subject for another day.

What’s the deal with A/B testing?

While we’ve spent a significant portion of this piece extolling the virtues of A/B testing, that’s not to say it’s without limitations. Welcome to our hot takes on this methodology.

Statistical significance

In order to gain usable data from any A/B test, you need to hit statistical significance. In other words, there can be a lot of random fluctuation in your data (called noise), and any “results” from your testing need to noticeably stick out from this noise. How far they need to stick out can be calculated mathematically, based on factors like sample size, your baseline conversion rate and the confidence level you choose. 

That can be a very hard thing to achieve for certain companies trying to run an A/B test. For instance, if we had a high degree of conversion irregularity (say, +/- 10% day to day), we would need to measure a large increase in conversion rate over that control group, perhaps by 30% or more, to make any determinations from the resulting test data. 

If your results fall within the range of day-to-day fluctuations, you probably can’t make a determination from the data. Let’s explain using the 10% conversion irregularity. 

Imagine we tested the video on our homepage. One day, we saw a 5% uptick in conversions. The next day, we saw a 5% decrease. And the following day, we saw a 12% increase in conversions. 

The first two days can be easily understood as falling within the “noise” of everyday operations. However, the third day could perhaps show a signal to us, though it’s not really that far off from our 10% daily fluctuations, and certainly not enough to draw any insights. 

However, imagine that, over the course of a year and with a representative amount of web traffic, that 12% increase in conversions became the daily average. We may be able to begin making inferences, though it may also signal that we need to test more. 
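
If you'd rather not eyeball daily swings, the usual way to ask "does this difference stand out from the noise?" is a significance test. Below is a rough Python sketch of a standard two-proportion z-test; the visitor and conversion counts are entirely made up.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a / n_a: conversions and visitors in the control group
    conv_b / n_b: conversions and visitors in the variant group
    Returns the z statistic and the two-sided p-value.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that there is no real difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: 10,000 visitors per variant,
# 500 control conversions (5.0%) vs. 580 variant conversions (5.8%)
z, p = two_proportion_z_test(500, 10_000, 580, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # a p-value below 0.05 is the usual bar for significance
```

The exact numbers matter less than the pattern: the verdict hinges on how many visitors and conversions you can feed into the test.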

That’s all to say that this sort of testing requires volume and time, meaning websites with low visitor volume will likely struggle to run these sorts of tests. 

And for those whose websites have insufficient visitor volume? Focus on other testable hypotheses that will have the largest impact on your business. 

A/B tests can slow you down

To be fair, taking the time to run any experiment can be a slower alternative to just following your gut instinct right away. For some companies and situations, though, A/B tests can be particularly cumbersome and may not result in any worthwhile insights. Let’s explain what we mean. 

Imagine you’re working on a brand identity refresh for your company. You’ve spent a lot of time, energy and money with your team and an external agency. You’ve gone through several concepts and iterations, and you are now down to your final two options. 

Perhaps you’d like to perform an A/B test with a sampling of current and/or prospective customers to see which identity refresh really moves the needle. 

Except… you’ve just run into a few fundamental problems. First, brand identities are highly subjective from person to person. After all, some people just hate blue. 

Second, and perhaps more importantly, you’re lacking a clear hypothesis or rational metric. In theory, you may want to know whether brand refresh A would lead to higher revenues than refresh B (and especially more than keeping your brand as it is). Except brands don’t really work like that, nor do you have a specific revenue increase goal in mind. 

Plus, there’s no easy way to launch both identities, and the control group, to the same audience subset as we did in our homepage example. That would cause incredible market confusion. 

Now, I know what you’re about to suggest: focus groups! Again, see above regarding subjective perspectives. And without objective data (rather than opinions), you’re sort of throwing your money and time away. 

Additionally, A/B testing can quickly seem like the best way to determine every single element of your website’s UI and UX. Yet, implementing all of those tests to all of those elements would take so long that your website would be out of date by the time you launch anything. In many instances, you need to rely on the expertise of your team to launch projects in a timely manner. 

There is another… test

Let’s say you have a solid hypothesis that can be tested through quantifiable means. That still doesn’t mean A/B testing is the right course of action. You may want to test multiple variables all at the same time. In that case, you’ll want to explore multi-variate testing. 

Or, perhaps you want your testing to inform and improve a specific product. In that case, it may make more sense to layer your tests and continually iterate – learning as you go. 

What we’re trying to say is that, while A/B testing can feel like it provides you with a black-and-white answer to the marketing world’s most pressing questions, there is always an alternative route you can follow. Oftentimes, these alternatives offer a better solution for your specific needs. 

That’s not to say that A/B testing isn’t valuable (it is!). It’s just not the panacea that many marketers make it out to be. 

What do you think? 

So, are A/B tests manna from heaven for marketers, are they a waste of time or are they somewhere in between? We fall into the third category. They can provide rich insights (when executed properly), but we also don’t want to be led around by test data alone. Sometimes, human instinct can figure out innovative solutions that lead to even happier customers. 

Let us know what you think. Drop us a line on LinkedIn, or get into the comment section on our YouTube channel. 



What is a multivariate test?

A multivariate test (MVT) is an experimental technique used in conversion rate optimization (CRO) and marketing to evaluate multiple variables simultaneously to determine the most effective combination. It extends the concept of A/B testing, which typically compares two versions of a single element, by testing multiple combinations of several elements at the same time.
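
To see why multivariate tests demand so much more traffic than A/B tests, here's a small Python sketch that enumerates the combinations for a page with three elements. The element names and options are hypothetical.

```python
from itertools import product

# Hypothetical page elements, each with two options
headlines = ["Benefit-led headline", "Question headline"]
hero_media = ["Customer photo", "Product video"]
cta_labels = ["Book a demo", "Get started"]

# A full-factorial multivariate test tries every combination of options
combinations = list(product(headlines, hero_media, cta_labels))
print(f"{len(combinations)} variants to test")  # 2 x 2 x 2 = 8
for combo in combinations:
    print(combo)
```

Eight variants need roughly four times the traffic of a simple A/B split before each one reaches a workable sample size, which is why multivariate testing tends to pay off only on high-volume pages.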

How long should an A/B test run?

An A/B test should run long enough to reach statistical significance, which typically means collecting data for at least one full business cycle (e.g., one week) to account for daily variations, and until a sample size is reached that provides confidence in the results. Using an online calculator can help estimate the necessary duration based on your traffic and expected effect size.
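
If you'd like to see what such a calculator does under the hood, the duration estimate usually starts from the standard two-proportion sample size formula. The baseline rate, expected lift and traffic figures below are made-up examples, not benchmarks.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect a relative lift.

    baseline: current conversion rate, e.g. 0.05 for 5%
    relative_lift: smallest improvement worth detecting, e.g. 0.10 for +10%
    Uses the standard two-proportion sample size formula at the given
    significance level (alpha) and statistical power.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(numerator / (p2 - p1) ** 2)

# Hypothetical inputs: 5% baseline conversion, detect a 10% relative lift
n = sample_size_per_variant(0.05, 0.10)
daily_visitors_per_variant = 1_000  # also hypothetical
print(f"~{n:,} visitors per variant, roughly {ceil(n / daily_visitors_per_variant)} days")
```

With those assumptions, you'd need roughly 31,000 visitors per variant, which is exactly why low-traffic sites struggle to reach significance in any reasonable time.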
