A/B Testing

Conversion & UX

Also: Split Testing · AB Test · Controlled Experiment

Lift = ((Variant B conversion rate - Control A conversion rate) ÷ Control A conversion rate) × 100
Lift formula: ((B minus A) divided by A) times 100
Most tests fail: Stop early and you amplify noise
Needs: Statistical significance before calling a winner
Runtime: Run for full business cycles

Quick definition

A/B testing is a controlled experiment where you split traffic between two versions of something: the original (A, the control) and a changed version (B, the variant). You measure which produces more of a desired outcome and declare a winner only when the result clears a statistical significance threshold.

Run the numbers

Relative lift shows the size of the change, not whether the change is real. Always confirm statistical significance separately before shipping the variant. A large lift on small traffic is often noise.

How it varies across Australia

Most A/B tests run by Australian businesses are stopped too early and on insufficient traffic volumes, which means the majority of declared winners are noise. The tests that actually move commercial outcomes tend to come from teams with clear hypotheses and the discipline to wait out a full business cycle before calling results.

What it actually means

An A/B test is a controlled comparison. You take the thing you want to improve, change one element, split your traffic evenly between the original and the changed version, and measure which produces more of the outcome you care about.

The key word is controlled. One change at a time. If you change the headline and the button colour and the hero image simultaneously, you can't know which change caused the lift. That's where multivariate testing begins, but that's a different conversation.

The maths behind A/B testing requires two things most teams skip. First, you need enough traffic in each variant for the result to be statistically meaningful. Second, you need to run the test long enough to capture the natural variation in your audience's behaviour across days of week, times of day and seasonal patterns.
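As a rough illustration of the first requirement, here is a minimal sketch of a sample size estimate for a two-proportion test. The function name `sample_size_per_variant` is hypothetical, and the 80% power default is an assumption that does not come from this entry. Only the 95% confidence figure matches the industry default used in this entry.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, expected_rate, confidence=0.95, power=0.80):
    """Approximate visitors needed in EACH variant (two-sided two-proportion test).

    Rates are fractions (0.024 means 2.4%). The 80% power default is an
    assumption for illustration, not a figure from this entry.
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95% confidence
    z_beta = normal.inv_cdf(power)                      # about 0.84 at 80% power
    variance = baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Detecting a move from 2.4% to 3.1% needs several thousand visitors per variant.
needed_big_change = sample_size_per_variant(0.024, 0.031)
# A smaller expected change (2.4% to 2.6%) needs far more traffic.
needed_small_change = sample_size_per_variant(0.024, 0.026)
print(needed_big_change, needed_small_change)
```

The smaller the change you hope to detect, the more traffic each variant needs, which is why micro-optimisations are expensive to test.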

Statistical significance is the threshold at which you can say the result is probably not a fluke. The industry default is 95% confidence, meaning that if the change had no real effect, you'd expect to see a difference this large by random chance fewer than one time in twenty. That sounds solid. The problem is that most teams call the test the moment one line crosses the other on a chart, which is usually long before 95% confidence is reached.
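One common way to check this is a two-proportion z-test. The sketch below is one possible approach, not necessarily what your significance calculator does. The function name `two_proportion_p_value` and the visitor counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate: what both variants would convert at if there were no real difference.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same rates (2.4% vs 3.1%), different traffic. Counts are illustrative only.
p_small_traffic = two_proportion_p_value(24, 1_000, 31, 1_000)
p_large_traffic = two_proportion_p_value(240, 10_000, 310, 10_000)

# 95% confidence corresponds to a p-value below 0.05.
print(p_small_traffic < 0.05, p_large_traffic < 0.05)
```

The same 2.4% versus 3.1% gap fails the 95% threshold on 1,000 visitors per variant and clears it on 10,000, which is the whole argument for waiting.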

A/B testing done badly produces a backlog of false winners, a team that trusts their instincts over their data, and a site that looks increasingly chaotic as changes compound on top of each other.

Most A/B test winners aren't winners. They're flukes that got called early.

How to calculate it

Lift = ((Variant B rate - Control A rate) ÷ Control A rate) × 100

Worked example. Control A converts at 2.4%. Variant B converts at 3.1%. Lift = ((3.1 - 2.4) ÷ 2.4) × 100 = 29.2% relative lift. Before shipping the variant, check that the test ran for at least two full business cycles and that your significance calculator shows 95% or higher confidence.
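The same calculation as a short sketch. The function names `relative_lift` and `absolute_difference` are hypothetical. Rates are passed as percentages, matching the worked example.

```python
def relative_lift(control_rate, variant_rate):
    """Relative lift in percent: ((B - A) / A) * 100."""
    if control_rate <= 0:
        raise ValueError("control rate must be greater than zero")
    return (variant_rate - control_rate) / control_rate * 100


def absolute_difference(control_rate, variant_rate):
    """Absolute change in percentage points (B - A)."""
    return variant_rate - control_rate


lift = relative_lift(2.4, 3.1)
points = absolute_difference(2.4, 3.1)
print(f"{lift:.1f}% relative lift, {points:.1f} percentage points absolute")
```

Reporting the percentage-point change next to the relative lift keeps the size of the result honest. Neither number says anything about significance.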

The Australian context

Australian traffic volumes are smaller than US equivalents for most businesses, which makes reaching statistical significance slower. A test that would hit 95% confidence in two weeks on a US site might take six weeks or longer on the equivalent Australian site. Teams that don't account for this either call tests too early or run tests on micro-segments where the result has no practical generalisability.

For Australian ecommerce and lead-gen sites with modest monthly traffic, the pragmatic path is often to run fewer, higher-stakes tests rather than a continuous testing programme. Testing the checkout and testing a secondary CTA on a low-traffic page are very different bets with your experimentation budget.

Where people get this wrong

Calling the test as soon as the variant looks better. Early peeking inflates false positive rates sharply. The result at day three of a two-week test is almost always noise. Set your runtime before you start and don't check results until it ends.
Testing too many changes in one variant. If the variant wins, you don't know why. If it loses, you don't know what to fix. Change one thing per test or use multivariate testing with the appropriate traffic volume to support it.
Using relative lift to report to stakeholders without absolute numbers. A 50% lift sounds transformational. But a lift from 0.2% to 0.3% conversion rate is also a 50% lift, and it is a rounding error in most businesses. Always show both relative lift and absolute rates.
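To see why early peeking matters, here is a small simulation sketch of A/A tests, where both versions are identical and any declared "winner" is a false positive. Every number in it (300 runs, 14 days, 200 visitors per variant per day, a 5% conversion rate) is an assumption chosen for illustration, and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_error == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(runs=300, days=14, daily_visitors=200, rate=0.05, seed=1):
    """Return (false positive rate with daily peeking, rate when checked once at the end)."""
    rng = random.Random(seed)
    peeking_wins = 0
    fixed_wins = 0
    for _ in range(runs):
        conv_a = conv_b = visitors = 0
        crossed_on_any_day = False
        for _day in range(days):
            visitors += daily_visitors
            # Both variants share the same true rate, so there is no real winner.
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            if p_value(conv_a, visitors, conv_b, visitors) < 0.05:
                crossed_on_any_day = True  # a peeker would have stopped here
        significant_at_end = p_value(conv_a, visitors, conv_b, visitors) < 0.05
        peeking_wins += crossed_on_any_day
        fixed_wins += significant_at_end
    return peeking_wins / runs, fixed_wins / runs


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"Declared a winner with daily peeking: {peeking_rate:.0%}")
print(f"Declared a winner at the planned end: {fixed_rate:.0%}")
```

Checking once at the planned end should land near the expected one-in-twenty false positive rate. Checking every day and stopping at the first significant reading can only match or exceed it, because every extra look is another chance for noise to cross the line.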

Common questions

How long should an A/B test run?

At minimum two full business cycles, usually two weeks. That is long enough to capture weekday versus weekend behaviour patterns, provided it also collects enough traffic in each variant for the comparison to reach statistical significance. Set the runtime before the test starts and don't cut it short because the result looks clear.
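A minimal sketch of how you might set that runtime up front. It assumes one business cycle is seven days, traffic is split evenly between two variants, and you already have a required sample size per variant. The function name `planned_runtime_days` and the traffic figures are hypothetical.

```python
from math import ceil


def planned_runtime_days(sample_per_variant, daily_visitors, cycle_days=7, min_cycles=2):
    """Days to run, rounded up to whole business cycles.

    Assumes daily_visitors is the total entering the test each day,
    split evenly across two variants.
    """
    days_for_traffic = ceil(2 * sample_per_variant / daily_visitors)
    cycles = max(min_cycles, ceil(days_for_traffic / cycle_days))
    return cycles * cycle_days


# Illustrative numbers only: 8,564 visitors needed per variant.
high_traffic_runtime = planned_runtime_days(8_564, daily_visitors=5_000)
low_traffic_runtime = planned_runtime_days(8_564, daily_visitors=1_000)
print(high_traffic_runtime, low_traffic_runtime)
```

Rounding up to whole cycles means the high-traffic site still runs the two-week minimum, while the lower-traffic site runs three full weeks rather than stopping mid-cycle.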

What is statistical significance in A/B testing?

It's the confidence threshold that tells you the result is probably not a fluke. The standard is 95% confidence, meaning that if the change you made had no real effect, a difference this large would show up through random variation less than one time in twenty. Below that threshold, you can't tell the result apart from noise.

How is A/B testing different from multivariate testing?

A/B testing changes one thing and compares two versions. Multivariate testing changes several elements at once and tests combinations. A/B testing is the right starting point for most sites. Multivariate testing requires much higher traffic volumes to produce reliable results.

Can I run A/B tests on a small Australian site?

Yes, but the runtime will be longer and you should test higher-stakes changes rather than micro-optimisations. A test on a checkout flow with meaningful conversion impact is worth the wait. A test on the colour of a secondary button on a low-traffic page probably isn't.

About New Rebellion

New Rebellion is a marketing intelligence consultancy. We build tools, score Australian businesses on how their marketing actually performs, and publish Debrief every day. This dictionary is part of how we work in the open.