AB Test Calculator

Free AB test calculator to measure statistical significance, calculate sample size, and determine whether your A/B test results are reliable enough for confident decision-making.


What is AB Testing?

AB testing (also called split testing) is a method for comparing two versions of a webpage, email, or app to see which one performs better. You show version A to half your visitors and version B to the other half, then measure which drives more conversions. It's used by marketers, product managers, UX designers, and data analysts to make data-driven decisions instead of guessing what works.

When to Use AB Testing

Website Optimization

Test headlines, CTAs, layouts, or images to boost conversions

Email Marketing

Compare subject lines, send times, or content formats

Product Features

Test new features before full rollout to avoid costly mistakes

Pricing Changes

Determine if price adjustments improve revenue or hurt sales

The key to successful AB testing is statistical significance. You can't just run a test for a day and declare a winner. You need enough traffic and conversions to know if the difference is real or just random chance. That's where this AB test calculator comes in. It tells you if your results are statistically significant and helps you determine the right sample size for tests that'll give you reliable answers.

AB Test Result Categories

Result Type | P-Value | What It Means
Highly Significant | < 0.01 | Very strong evidence. 99%+ confidence the difference is real.
Significant | 0.01 - 0.05 | Strong evidence. 95-99% confidence. Safe to implement.
Marginally Significant | 0.05 - 0.10 | Weak evidence. Consider running longer or increasing traffic.
Not Significant | > 0.10 | No evidence of difference. Don't make changes based on this.

How to Use the AB Test Calculator

This AB test significance calculator has two modes. Use the Test Results tab to check if your completed test is statistically significant. Use the Sample Size tab to plan how long you'll need to run your test before starting.

Test Results Mode (Post-Test Analysis)

  1. Enter Control Data - Input your control (version A) visitors and conversions. Visitors are everyone who saw the page. Conversions are people who completed your goal (signup, purchase, click, etc.).
  2. Enter Variant Data - Input your variant (version B) visitors and conversions. Make sure you're comparing the same time period for both versions.
  3. Set Confidence Level - Most marketers use 95% confidence (which means a 5% risk of a false positive). Use 90% for faster results or 99% for critical decisions.
  4. Choose Test Type - Use two-sided if you want to detect any difference (better or worse). Use one-sided if you only care about improvement.
  5. Check Results - The calculator shows whether your test is statistically significant, the conversion rates, relative lift percentage, p-value, and confidence interval (see the code sketch after this list).
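
If you'd rather script this check than click through the calculator, here's a minimal Python sketch of the same unpooled two-proportion z-test. The function name and the use of scipy for the normal CDF are illustrative assumptions, not the calculator's actual code.

```python
import math

from scipy.stats import norm  # standard normal CDF for the p-value

def ab_test(visitors_a, conv_a, visitors_b, conv_b, alpha=0.05, two_sided=True):
    """Unpooled two-proportion z-test, mirroring steps 1-5 above."""
    cr_a = conv_a / visitors_a                          # conversion rates
    cr_b = conv_b / visitors_b
    lift = (cr_b - cr_a) / cr_a * 100                   # relative lift, %
    se_a = math.sqrt(cr_a * (1 - cr_a) / visitors_a)    # per-variant SEs
    se_b = math.sqrt(cr_b * (1 - cr_b) / visitors_b)
    z = (cr_b - cr_a) / math.sqrt(se_a**2 + se_b**2)
    p = 2 * (1 - norm.cdf(abs(z))) if two_sided else 1 - norm.cdf(z)
    return {"cr_a": cr_a, "cr_b": cr_b, "lift_pct": lift,
            "z": z, "p_value": p, "significant": p < alpha}
```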

Sample Size Mode (Pre-Test Planning)

  1. Enter Baseline Rate - Your current conversion rate as a percentage. If 5 out of 100 visitors convert, enter 5.
  2. Set Minimum Effect - The smallest improvement you care about detecting. A 10% relative increase means going from a 5% to a 5.5% conversion rate.
  3. Choose Effect Type - Relative (%) shows percent improvement (20% better). Absolute (pp) shows the percentage-point difference (1pp means 5% to 6%).
  4. Enter Weekly Traffic - How many visitors you get per week. The AB test calculator uses this to estimate how long you'll need to run the test.
  5. Review Duration - The calculator shows the required sample size per variant and estimated test duration in weeks (a code sketch of this formula follows the list).
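
The math behind this mode is the standard two-proportion sample-size formula. Here's a sketch of it; the function names are illustrative, and it assumes a two-sided test with 80% power by default.

```python
import math

from scipy.stats import norm

def sample_size_per_variant(baseline, mde_relative, power=0.80, alpha=0.05):
    """Visitors per variant needed to detect a relative lift `mde_relative`
    over `baseline` with the given power (two-sided test at level alpha)."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 at 95% confidence
    z_beta = norm.ppf(power)            # 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

def weeks_to_run(n_per_variant, weekly_visitors, variants=2):
    """Estimated duration: total required sample divided by weekly traffic."""
    return math.ceil(n_per_variant * variants / weekly_visitors)

n = sample_size_per_variant(0.05, 0.10)   # 5% baseline, 10% relative MDE
weeks_to_run(n, weekly_visitors=20_000)   # ~31,000 per variant, ~4 weeks
```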

Pro Tips for Accurate AB Test Results

  • Don't peek early - Testing multiple times during your test inflates false positives. Wait until you reach your planned sample size.
  • Split traffic 50/50 - Don't test 80/20 splits. Equal traffic gives you the most statistical power for your sample size.
  • Run full weeks - Monday traffic behaves differently than Saturday traffic. Run tests in complete-week increments to avoid day-of-week bias.
  • Test one thing - Change headline OR button color, not both. Multiple changes make it impossible to know what caused the difference.
  • Check implementation - Verify both versions load correctly before collecting data. Broken variants waste time and skew results.

Understanding the Statistical Formula

The AB test calculator uses a two-proportion z-test to determine if the difference between your variants is statistically significant. Here's how it works under the hood.

Key Formulas

Conversion Rate:

CR = Conversions ÷ Visitors

Relative Lift:

Lift = ((CR_B - CR_A) ÷ CR_A) × 100%

Standard Error:

SE = √(CR × (1 - CR) ÷ n)

Z-Score:

Z = (CR_B - CR_A) ÷ √(SE_A² + SE_B²)

Why These Formulas Work

The z-score measures how many standard deviations apart your two conversion rates are. If they're far apart (high z-score), the difference is probably real. If they're close together (low z-score), it's likely random noise. The AB test calculator converts the z-score to a p-value, which tells you the probability of seeing this difference by pure chance if there's actually no real effect.
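
That z-to-p conversion is a one-liner; a sketch assuming scipy's normal CDF:

```python
from scipy.stats import norm

def p_from_z(z, two_sided=True):
    """Tail area of the standard normal, doubled for a two-sided test."""
    tail = 1 - norm.cdf(abs(z))
    return 2 * tail if two_sided else tail

p_from_z(1.96)  # ~0.05, the 95%-confidence threshold
p_from_z(2.58)  # ~0.01, the 99%-confidence threshold
```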

Example 1: Simple E-commerce Test

Scenario: Testing a new checkout button

  • Control (A): 10,000 visitors, 500 conversions
  • Variant (B): 10,000 visitors, 580 conversions

Step 1 - Calculate Conversion Rates:

CR_A = 500 ÷ 10,000 = 0.05 (5.00%)

CR_B = 580 ÷ 10,000 = 0.058 (5.80%)

Step 2 - Calculate Relative Lift:

Lift = ((0.058 - 0.05) ÷ 0.05) × 100% = 16%

Step 3 - Calculate Standard Errors:

SE_A = √(0.05 × 0.95 ÷ 10,000) = 0.00218

SE_B = √(0.058 × 0.942 ÷ 10,000) = 0.00234

Step 4 - Calculate Z-Score:

SE_diff = √(0.00218² + 0.00234²) = 0.00320

Z = (0.058 - 0.05) ÷ 0.00320 = 2.50

Result:

P-value = 0.012 (1.2%), which is < 0.05. This test is statistically significant at 95% confidence. The 16% lift is likely real, not random chance.
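
Feeding this example into the ab_test sketch from earlier reproduces the numbers above:

```python
result = ab_test(10_000, 500, 10_000, 580)
# result["lift_pct"]    -> ~16.0
# result["z"]           -> ~2.50
# result["p_value"]     -> ~0.012
# result["significant"] -> True at alpha = 0.05
```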

Example 2: High-Traffic Landing Page

Scenario: Testing headline copy

  • Control (A): 50,000 visitors, 3,250 signups (6.5% CR)
  • Variant (B): 50,000 visitors, 3,550 signups (7.1% CR)

Quick Calculation:

Relative Lift: (7.1% - 6.5%) ÷ 6.5% = 9.2%

Z-Score: 3.77

P-Value: 0.0002

Result: Highly significant. With 100,000 total visitors, even small lifts like 9.2% become statistically significant. You can confidently implement the new headline.

Example 3: Edge Case - Small Sample Size

Scenario: Testing on low-traffic page

  • Control (A): 500 visitors, 25 conversions (5.0% CR)
  • Variant (B): 500 visitors, 35 conversions (7.0% CR)

Quick Calculation:

Relative Lift: (7.0% - 5.0%) ÷ 5.0% = 40%

Z-Score: 1.33

P-Value: 0.18

Result: Not significant (p > 0.05). Even with a 40% lift, the small sample size means you can't trust this result. You'd need roughly 2,200 visitors per variant (at 80% power and 95% confidence) to reliably detect this effect size.
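
The sample_size_per_variant sketch from earlier gives the same ballpark for this scenario, again assuming 80% power and 95% confidence:

```python
sample_size_per_variant(0.05, 0.40)  # -> ~2,210 visitors per variant
```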

Common Calculation Mistakes to Avoid

  • Confusing relative vs absolute lift - Going from 5% to 6% is a 20% relative lift (1pp ÷ 5% baseline) but only a 1-percentage-point absolute lift.
  • Using wrong test duration - Don't compare control data from last month to variant data from this month. Run them simultaneously.
  • Including bot traffic - Filter out bots, crawlers, and internal traffic before entering data into the AB test calculator.

Interpreting Your AB Test Results

Getting results from the AB test significance calculator is just the first step. You need to understand what the numbers mean and what action to take based on your results.

Understanding Your Results

P-Value < 0.05 (Significant Result)

You've got a winner! The difference between your variants is statistically significant at 95% confidence. There's less than a 5% chance this is random noise. You can confidently implement the winning variant. If you got a p-value of 0.02, it means there's only a 2% probability of seeing a difference this large (or larger) by random chance if both variants were actually the same.

P-Value 0.05 - 0.10 (Marginally Significant)

You're in the gray zone. The difference might be real, but you don't have strong enough evidence yet. Consider running the test longer to collect more data. If you can't wait, proceed with caution and monitor results closely after implementation.

P-Value > 0.10 (Not Significant)

No winner detected. The variants perform similarly enough that you can't tell them apart with confidence. Don't make changes based on this test. Either keep the control or try a different, more dramatic variation that might show a bigger effect.

What Factors Affect Your Test Results?

Sample Size

More visitors = more reliable results. A test with 100,000 visitors can detect small 3% lifts, while a test with 1,000 visitors might miss even 20% lifts. Use our confidence interval calculator to understand the precision of your measurements.

Effect Size

Bigger differences are easier to detect. A 50% lift needs way fewer visitors to prove significance than a 5% lift. If your variants are too similar, you might never reach significance even with tons of traffic.

Baseline Conversion Rate

Lower baseline rates need more visitors. Going from 1% to 1.2% (a 20% relative lift) requires roughly 10x more visitors than going from 10% to 12% (the same 20% lift). Low-converting pages take longer to test.
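
A quick back-of-envelope check shows why: for a fixed relative lift r, required sample size scales roughly with (1 - p) ÷ (p × r²).

At p = 1%: 0.99 ÷ (0.01 × 0.2²) ≈ 2,475

At p = 10%: 0.90 ÷ (0.10 × 0.2²) = 225

The ratio is about 11, which is where the roughly 10x figure comes from.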

Confidence Level

Higher confidence = longer tests. Using 99% confidence instead of 95% requires roughly 50% more visitors (at 80% power). Most teams use 95% as the sweet spot between rigor and speed.
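
The 50% figure falls out of the z-score multipliers, since required sample size scales with (z_confidence + z_power)²:

At 95% confidence, 80% power: (1.96 + 0.84)² ≈ 7.85

At 99% confidence, 80% power: (2.58 + 0.84)² ≈ 11.68

11.68 ÷ 7.85 ≈ 1.49, i.e. roughly 50% more visitors.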

Traffic Consistency

Inconsistent traffic patterns can skew results. If weekday traffic behaves differently than weekend traffic, run your test for complete weeks. Same goes for seasonal businesses during peak vs off-peak periods.

External Events

Holidays, promotions, outages, or news events can temporarily boost or hurt conversions. If something unusual happens during your test, consider restarting once things stabilize.

What to Do Based on Your Results

If You Have a Significant Winner (P < 0.05):

  • Implement the winning variant across your entire site or app.
  • Monitor performance post-launch for 2-4 weeks to confirm the lift holds up.
  • Document the test - record what you changed, why, and what lift you achieved.
  • Plan your next test - can you make the winner even better? Test a new element.

If Results Are Inconclusive (P > 0.05):

  • Check your observed power - if it's below 70%, you probably need more traffic.
  • Consider extending the test - run another full week or two to collect more data.
  • Try a bigger change - subtle tweaks are hard to detect. Test something more dramatic.
  • Keep the control - if variants perform the same, stick with what you have and test elsewhere.

Limitations of AB Testing

  • Can't test everything - Some changes (like complete redesigns) are too big for standard AB testing. Consider gradual rollouts instead.
  • Doesn't tell you WHY - The AB test calculator shows which variant won, but not why users preferred it. Combine with qualitative research for deeper insights.
  • Needs sufficient traffic - Low-traffic sites (under 1,000 weekly visitors) struggle to reach significance. You might need to test for months or accept less rigorous evidence.
  • Short-term only - AB tests measure immediate impact. A variant might win now but cause problems later (like damaging brand perception or hurting repeat purchases).

When to Consult a Statistical Expert

The AB test significance calculator handles standard scenarios well, but consider getting expert help for:

  • Tests with more than 2 variants (multivariate testing requires different analysis)
  • Sequential testing where you're checking results multiple times
  • Very low conversion rates (under 0.5%) where normal approximations break down
  • High-stakes decisions (like pricing changes) where you need Bayesian analysis too

Related Testing Methods and When to Use Them

AB testing isn't the only way to optimize your site. Different situations call for different approaches. Here's when to use each method.

Method | Best For | Key Difference
Standard AB Test | Testing single changes (button color, headline, image) | Simple two-version comparison. Easy to analyze with this AB test calculator.
Multivariate Test (MVT) | Testing multiple elements simultaneously (headline + image + CTA) | Tests all combinations. Needs 10x more traffic than AB tests.
Multi-Armed Bandit | Ongoing optimization with continuous traffic allocation | Automatically sends more traffic to winners. Better for long-term campaigns.
Sequential Testing | When you need to check results frequently | Accounts for peeking. More complex math than a standard AB test.
Bayesian AB Test | When you want probability statements (90% chance B beats A) | Different statistical approach. Easier to explain to stakeholders.

Multivariate Testing (MVT)

MVT tests multiple elements at once. Instead of just headline A vs headline B, you test headline (A/B) + image (1/2) + button (X/Y) all together. This creates 8 combinations (2×2×2).

Use when: You have 50,000+ weekly visitors and want to find the best combination of elements.

Multi-Armed Bandit

Bandit algorithms dynamically shift traffic to better-performing variants during the test. If variant B is winning, it gets more traffic. This maximizes conversions during testing.

Use when: Running long-term campaigns where learning continues indefinitely.
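
Thompson sampling is a common way to implement this dynamic allocation. Here's a minimal sketch under flat Beta(1, 1) priors; the function name and example numbers are illustrative, not a production bandit.

```python
import numpy as np

rng = np.random.default_rng()

def thompson_choose(successes, failures):
    """Pick the next variant to show: draw each arm's conversion rate from
    its Beta posterior and play the arm with the highest draw."""
    draws = [rng.beta(s + 1, f + 1) for s, f in zip(successes, failures)]
    return int(np.argmax(draws))

# After 10,000 impressions each, B converts better, so B wins most draws
# and automatically receives more of the incoming traffic.
thompson_choose(successes=[500, 580], failures=[9_500, 9_420])
```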

Sequential Testing

Sequential tests let you peek at results multiple times without inflating false positives. The standard AB test calculator assumes you test once, but sequential methods adjust for continuous monitoring.

Use when: You need flexibility to stop tests early if you see a clear winner.

Bayesian AB Testing

Bayesian tests give you probability statements like "There's an 87% chance variant B beats control." Many find this easier to understand than p-values and confidence intervals.

Use when: Stakeholders struggle with frequentist statistics or you want probabilistic answers.
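
That probability is typically computed via Monte Carlo draws from each variant's Beta posterior. A minimal sketch, assuming flat Beta(1, 1) priors (the function name is illustrative):

```python
import numpy as np

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=200_000, seed=0):
    """Monte Carlo estimate of P(CR_B > CR_A) under flat Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    a = rng.beta(conv_a + 1, n_a - conv_a + 1, draws)
    b = rng.beta(conv_b + 1, n_b - conv_b + 1, draws)
    return float((b > a).mean())

prob_b_beats_a(500, 10_000, 580, 10_000)  # ~0.99 on Example 1's data
```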

Which Method Should You Choose?

For 95% of situations, standard AB testing with this AB test significance calculator is your best bet. It's simple, reliable, and doesn't need massive traffic.

Only consider alternatives if you have specific needs. MVT for very high traffic sites. Bandit for ongoing personalization. Sequential for flexibility. Bayesian for easier stakeholder communication. Start simple, add complexity only when needed.

Frequently Asked Questions

How long should I run my AB test?

Run your test until you reach your planned sample size, which you can calculate using the Sample Size tab. Most tests need 2-4 weeks minimum to account for weekly traffic patterns. Don't stop early just because you see significance. The AB test calculator helps you determine required duration based on your traffic and expected effect size.

What's a good AB test conversion rate improvement?

There's no universal "good" number. A 5-15% relative lift is typical for successful tests. Some tests show 50%+ lifts (rare but possible with major changes). Others show 2-3% lifts that are still valuable at scale. The AB test significance calculator tells you if your lift is statistically significant, not whether it's "good enough" for your business.

Can I test more than 2 variants at once?

Yes, but each additional variant requires more traffic. Testing A vs B vs C splits your traffic three ways, so you'll need roughly 50% more total visitors to reach significance. This AB test calculator works for 2-variant tests. For 3+ variants, you'll need multivariate testing tools and adjusted statistical methods.

Why is my result different from other AB test calculators?

Small differences (0.001-0.01) are normal due to rounding or different formulas for edge cases. Bigger differences might mean one calculator is using one-sided vs two-sided tests, different confidence levels, or Bayesian vs frequentist methods. This AB test significance calculator uses the standard frequentist two-proportion z-test, which matches most industry tools.

What does statistical power mean in AB testing?

Statistical power is your probability of detecting a real effect if one exists. Most teams target 80% power. This means if your variant truly performs better, you have an 80% chance of detecting it. The Sample Size calculator shows how many visitors you need to achieve your desired power level.
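
To see how power, sample size, and effect size interact, here's a sketch of the approximate power of a two-sided two-proportion z-test (function name illustrative; the example numbers reuse Example 3's setup):

```python
import math

from scipy.stats import norm

def power_of_test(baseline, mde_relative, n_per_variant, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline * (1 + mde_relative)
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variant)
    z_alpha = norm.ppf(1 - alpha / 2)
    return float(1 - norm.cdf(z_alpha - abs(p2 - p1) / se))

power_of_test(0.05, 0.40, 500)    # ~0.27: Example 3 was underpowered
power_of_test(0.05, 0.40, 2_210)  # ~0.80: the usual target
```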

Should I use 90%, 95%, or 99% confidence level?

95% confidence is the industry standard for most AB tests. Use 90% if you need faster results and can tolerate more risk (good for iterative testing). Use 99% for high-stakes decisions like major pricing changes or checkout flows where mistakes are costly. Higher confidence requires more traffic and longer tests.

What's the minimum traffic needed for AB testing?

You need at least 350-400 conversions per variant to get reliable results with typical 5-10% lifts. If your conversion rate is 2%, that's 17,500-20,000 visitors per variant, or 35,000-40,000 total. Sites with less traffic should focus on bigger changes that produce larger effects, or accept longer test durations.

Can I stop an AB test early if I see a clear winner?

Not with this AB test calculator. Standard frequentist tests require you to commit to a sample size upfront and test until you reach it. Stopping early (even if results look significant) increases false positives. If you need early stopping flexibility, use sequential testing methods instead, which account for multiple looks at the data.