A/B Test Sample Size Calculator

Enter your baseline conversion rate, the minimum improvement you want to detect, and your desired confidence level. The calculator tells you exactly how many visitors each variation needs before you can trust the results.

Formula

n = (z_α/2 + z_β)² × (p₁(1−p₁) + p₂(1−p₂)) / (p₂ − p₁)²

n is the required sample size per variation. z_α/2 is the critical value for your significance level (1.96 for 95%). z_β is the critical value for statistical power (0.84 for 80%). p₁ is the baseline conversion rate and p₂ is the expected rate after the MDE lift.
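The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the function name `sample_size_per_variation` and its parameters are assumptions made for this example, and it assumes a two-sided test with the MDE given as a relative lift.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_mde, significance=0.95, power=0.80):
    """Visitors needed per variation (hypothetical helper, two-sided test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # expected rate after the relative lift
    alpha = 1 - significance
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 at 95% significance
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 at 80% power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    # Round up: a fraction of a visitor still requires a whole visitor.
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2)


# Example inputs: 3.2% baseline, 20% relative MDE, 95% significance, 80% power.
print(sample_size_per_variation(0.032, 0.20))
```

The critical values come from the standard library's `NormalDist().inv_cdf`, so no lookup table is needed when the significance or power settings change.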

How to use the A/B Test Sample Size Calculator

  1. Enter your baseline conversion rate

    Your current conversion rate before the test.

  2. Enter your minimum detectable effect

    The smallest relative improvement worth detecting (e.g. 20% means detecting a lift from 3.2% to 3.84%).

  3. Enter your statistical significance

  4. Enter your statistical power

  5. Read your results instantly

    Results update in real time as you type.

Why sample size matters in A/B testing

Running an A/B test without calculating sample size first is one of the most common — and costly — mistakes in conversion optimization. Stop a test too early and you'll declare a winner based on random noise. Run it too long and you waste traffic on a losing variant.

The sample size formula answers: given how small an effect I care about, how much data do I need before I can be confident the result is real? The smaller the effect you're trying to detect, the more data you need. Trying to detect a 5% lift requires roughly 16× more data than detecting a 20% lift.
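The inverse-square relationship can be checked numerically. The sketch below assumes a 3.2% baseline (an illustrative value) and compares a 5% relative lift against a 20% one; `n_per_variation` is a hypothetical helper that applies the two-proportion formula. The ratio should land close to (20 / 5)² = 16, though not exactly on it, because the variance term also shifts slightly with the target rate.

```python
from statistics import NormalDist


def n_per_variation(p1, relative_lift, alpha=0.05, power=0.80):
    """Unrounded sample size per variation (hypothetical helper)."""
    p2 = p1 * (1 + relative_lift)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2


baseline = 0.032  # assumed baseline conversion rate for illustration
ratio = n_per_variation(baseline, 0.05) / n_per_variation(baseline, 0.20)
print(f"5% lift needs {ratio:.1f}x the data of a 20% lift")
```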

Minimum Detectable Effect (MDE): how to choose it

The MDE is the smallest improvement that would be worth acting on. It's a business decision, not a statistical one. If a 5% lift in conversion rate would generate $500/month in extra revenue, and implementing the change costs $10,000 in engineering time, a 5% MDE doesn't justify the test: breakeven is 20 months away ($10,000 / $500 per month).

Most teams use 10–20% relative MDE as a starting point. Lower MDEs (5%) are appropriate for high-traffic, mature products where even tiny gains compound significantly. Higher MDEs (30–50%) make sense for early-stage products where you're looking for large directional signals, not fine-tuned optimization.

Statistical significance vs. statistical power

These are two separate error rates that are often confused. Statistical significance (α) controls your false positive rate — the probability of declaring a winner when there's actually no difference. At 95% significance, you accept a 5% chance of a false positive.

Statistical power (1 − β) controls your true positive rate: the probability of detecting a real effect when one exists. At 80% power, you'll correctly detect a real improvement of at least the MDE 80% of the time, and miss it 20% of the time. Most teams use 95% significance + 80% power as a baseline. Bumping power to 90% increases sample size by roughly a third (about 34% at 95% significance) but cuts your miss rate in half.
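The cost of extra power can be read straight off the (z_α/2 + z_β)² term, since the rest of the formula does not depend on power. A minimal sketch follows; the helper name `z_multiplier` is an assumption made for this example.

```python
from statistics import NormalDist


def z_multiplier(significance, power):
    """The (z_alpha/2 + z_beta)^2 factor of the sample size formula."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2


# Relative increase in sample size when moving from 80% to 90% power.
increase = z_multiplier(0.95, 0.90) / z_multiplier(0.95, 0.80) - 1
print(f"Sample size increase: {increase:.1%}")
```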

Tips & Insights

Never peek at results early

Checking significance before you hit your sample size inflates false positive rates dramatically. Commit to the target before starting the test.

One test at a time per page

Running multiple overlapping tests on the same page can cause interaction effects that make results uninterpretable.

Account for novelty effect

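Run tests for at least one full business cycle (usually 1–2 weeks) even if you hit sample size sooner. This averages out day-of-week traffic variation and gives any initial novelty bump, where visitors react to a change simply because it is new, time to fade.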
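The first tip, on peeking, can be illustrated with a small A/A simulation: both arms share the same true conversion rate, so every "significant" result is a false positive. The sketch below is an illustration under assumed settings (5% conversion rate, 10 evenly spaced looks, 400 simulated tests); all names are hypothetical. It compares stopping at the first significant look against testing once at the end.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 95% threshold


def is_significant(conv_a, conv_b, n):
    """Pooled two-proportion z-test with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False
    return abs(conv_a - conv_b) / n / se > Z_CRIT


def simulate(runs=400, looks=10, batch=200, rate=0.05, seed=1):
    """Return (false positive rate with peeking, rate with one final test)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged_at_any_look = False
        significant = False
        for look in range(1, looks + 1):
            # Add one batch of visitors to each arm, then test.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            significant = is_significant(conv_a, conv_b, look * batch)
            flagged_at_any_look = flagged_at_any_look or significant
        peeking_hits += flagged_at_any_look
        final_hits += significant  # result at the planned sample size only
    return peeking_hits / runs, final_hits / runs


peeking_rate, final_rate = simulate()
print(f"False positives with peeking: {peeking_rate:.1%}")
print(f"False positives at final look only: {final_rate:.1%}")
```

Any run that is significant at the final look is also counted under peeking, so the peeking rate can never be lower than the single-test rate; the gap between the two is the inflation caused by the extra looks.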

Worked Examples

E-commerce checkout button

Baseline conversion rate: 3.2%, MDE: 20%, Significance: 95%, Power: 80%

You need ~13,000 visitors per variation (~26,000 total). At 500 daily visitors split evenly, the test runs about 52 days.

SaaS trial-to-paid

Baseline conversion rate: 12%, MDE: 10%, Significance: 95%, Power: 90%

You need ~16,100 visitors per variation (~32,100 total). Lower MDEs and higher power requirements mean more data, so size the test runway from your actual trial volume rather than a fixed number of weeks.

Frequently Asked Questions

What happens if I don't reach the required sample size?

Your results are unreliable. Even if p < 0.05, a significant result from an under-powered test is more likely to be a false positive, and the measured lift tends to overstate the true effect. Under-powered tests are a common cause of 'winning' variants that fail to replicate when rolled out fully.

Should I use one-tailed or two-tailed tests?

Two-tailed (the default in this calculator) tests for any difference — improvement or degradation. One-tailed tests only look for improvement and require smaller samples, but are only appropriate when you're certain the variant can't hurt. Most practitioners use two-tailed for safety.
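The size of the one-tailed saving can be estimated by swapping z_α/2 for z_α in the sample size formula. The sketch below assumes 95% significance and 80% power; `z_multiplier` is a hypothetical helper, not part of the calculator.

```python
from statistics import NormalDist


def z_multiplier(alpha, power, two_tailed=True):
    """(z_alpha + z_beta)^2, splitting alpha across both tails if two-tailed."""
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail_alpha)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2


one_tailed = z_multiplier(0.05, 0.80, two_tailed=False)
two_tailed = z_multiplier(0.05, 0.80, two_tailed=True)
saving = 1 - one_tailed / two_tailed
print(f"One-tailed test needs {saving:.0%} fewer visitors per variation")
```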

What's a good baseline conversion rate to use?

Use your actual measured conversion rate from the past 30–90 days, filtered to the same audience segment you'll be testing. Avoid using benchmark rates — your traffic quality, product, and funnel are unique.

Can I test more than two variations?

Yes, but each additional variation needs its own full sample, and you need to apply a multiple comparison correction (Bonferroni or similar), which makes the per-test significance threshold stricter and so raises the sample size each arm needs. Three variations compared against a control at 95% overall significance require correcting to ~98.3% per test (α = 0.05 / 3 ≈ 0.0167).
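A Bonferroni correction simply divides the overall α by the number of comparisons. The sketch below assumes three comparisons against a control at 80% power and also estimates how much the stricter threshold inflates the per-arm sample size; the helper names are assumptions made for this example.

```python
from statistics import NormalDist


def bonferroni_alpha(overall_alpha, comparisons):
    """Per-test alpha after a Bonferroni correction."""
    return overall_alpha / comparisons


def z_multiplier(alpha, power):
    """The (z_alpha/2 + z_beta)^2 factor for a two-sided test."""
    return (NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)) ** 2


per_test_alpha = bonferroni_alpha(0.05, 3)
per_test_confidence = 1 - per_test_alpha
# Ratio of required sample size per arm: corrected vs. uncorrected threshold.
inflation = z_multiplier(per_test_alpha, 0.80) / z_multiplier(0.05, 0.80)
print(f"Per-test confidence: {per_test_confidence:.1%}")
print(f"Sample size per arm grows by {inflation - 1:.0%}")
```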
