Split Testing

Analytics

Also: A/B Testing · AB Testing · Bucket Testing · Controlled Experiment

Minimum sample size = (Z-score squared × p × (1-p)) ÷ margin of error squared

What it doesCompares two versions with real users

GoalIsolate which change caused the result

Most common mistakeStopping too early

RequiresEnough traffic to be reliable

Quick definition

Split testing is a method of comparing two or more versions of a webpage, email, ad or other marketing element by showing each version to a different group of real users. The version that performs better on a defined metric wins. It is also called A/B testing when exactly two versions are compared.

Run the numbers

Baseline conversion rate (%)

Minimum detectable lift (percentage points)

Minimum visitors per variant4,472

This is the sample needed per variant at 95% confidence. Double it for your total test traffic. If your page does not hit this volume within four weeks, the test will not produce a reliable result.

How it varies across Australia

Most Australian businesses running split tests do not have enough monthly traffic to reach statistical significance within a reasonable timeframe. Tests on low-traffic pages often run for months before reaching a reliable conclusion, which is why testing strategy matters as much as testing tools.

See conversion efficiency scores across Australian industries →

The types of split test

A/B test(A/B)

One variable changed between two versions. The cleanest test design.

Two variants only

Multivariate test(MVT)

Multiple variables changed simultaneously to find the best combination.

Requires much higher traffic

Split URL test

Different URLs serve each variant. Used when page structure changes significantly.

Different page paths

Bandit test

Traffic automatically shifts toward the winning variant during the test.

Reduces losses during test

What it actually means

Split testing is how you remove opinion from design decisions. Instead of arguing about whether the green button or the blue button converts better, you show each to a separate group of real users and measure what actually happens. The data decides.

The term is often used interchangeably with A/B testing. Strictly, a split test is the broader category. An A/B test is the most common form: one variable, two versions, one winner. A multivariate test (MVT) changes multiple variables at once and needs far more traffic to produce reliable results. Most teams calling themselves 'multivariate testers' are running underpowered tests and drawing false conclusions.

The hardest part of split testing is not the technology. It is the discipline of not looking at results until the test has reached statistical significance. Tests stopped early almost always show a false winner. The variant that happened to be winning on day three is not reliably the variant that would win at full sample.

Statistical significance is the threshold at which the difference between variants is unlikely to be random noise. The standard threshold is 95% confidence, meaning you would see a result this large by chance fewer than five times in a hundred. Below that threshold, you have not learned anything reliable.

A split test that ends when you see what you wanted to see is not a test. It's a confirmation.

How to calculate it

Minimum sample per variant = (Z² × p × (1 - p)) ÷ MDE² where Z = 1.96 for 95% confidence, p = baseline conversion rate, MDE = minimum detectable effect

Worked example. Your checkout page converts at 3% (p = 0.03). You want to detect a lift of 0.5 percentage points (MDE = 0.005). Using Z = 1.96 for 95% confidence: sample = (1.96² × 0.03 × 0.97) ÷ 0.005² = (3.84 × 0.0291) ÷ 0.000025 = 0.1118 ÷ 0.000025 = 4,473 visitors per variant. You need roughly 4,500 sessions on each variant before the result is reliable.

The Australian context

Australian ecommerce and lead-generation sites often run on thinner traffic than equivalent North American comparisons, which means fewer pages qualify for split testing within a reasonable window. The practical response is to consolidate testing onto the highest-traffic pages and accept that most pages on an Australian site will never generate enough volume to test reliably. Testing tools charge by traffic volume, not by test quality. Many Australian businesses are paying for testing infrastructure they cannot realistically use.

Where people get this wrong

Stopping the test as soon as one variant pulls ahead.Early leaders in split tests frequently reverse as more data accumulates. Stopping at the first signal of a winner is the most common source of false positives in any testing programme.

Running multivariate tests on low-traffic pages.A multivariate test with four variables and three options each requires exponentially more traffic than an A/B test. On a page with a few hundred monthly sessions, the test will never reach significance.

Testing the wrong thing first.Small copy tweaks on a low-converting page waste testing cycles. The highest-leverage tests are structural: page layout, offer framing, form length. Start with the thing most likely to move the metric that matters.

Common questions

How long should a split test run?

Until it reaches the minimum sample size per variant, and for at least one full business cycle (usually two weeks minimum to account for day-of-week variation). Do not end a test early because it looks like it is winning or losing.

What is the difference between split testing and multivariate testing?

A split test (A/B test) changes one variable between two versions. A multivariate test changes multiple variables simultaneously to find the best combination. Multivariate testing requires substantially more traffic to produce reliable results.

What conversion rate lift is worth testing for?

Only lifts that would meaningfully change a business decision. Testing for a 0.1 percentage point improvement requires so much traffic that most pages will never produce a reliable result. Focus on changes where a real uplift would shift your acquisition economics.

Does split testing require a dedicated tool?

Not always. Google Optimize is no longer available, but alternatives like VWO, Optimizely and Convert handle the variant serving and significance calculations. For email, most platforms have built-in A/B testing. The tool matters less than the discipline around sample sizes and runtime.

Debrief

Get the next one

No spam. No fluff. Just the next article, straight to your inbox.

Keep exploring

About New Rebellion

New Rebellion is a marketing intelligence consultancy. We build tools, score Australian businesses on how their marketing actually performs, and publish Debrief every day. This dictionary is part of how we work in the open.

How we think →