A/B testing is great for iteratively improving web applications. I have had loads of conversations with startups that are using it to test changes to their products. However, they all seem to be making one fatal mistake: their sample sizes are too small.
An A/B test that uses too small a sample size is worse than no test at all. The figures just aren’t statistically significant, and if you had run the test longer the results might have reversed. This means that not only might you have the wrong result, but you also feel that you have proof of that wrong result. You then end up defending your position later on despite contradictory evidence.
So how big does your sample size need to be? Unfortunately the answer is ‘it depends’. Luckily it doesn’t get very complicated. The only thing it depends on is the gap between the winner and the loser. In general, the smaller the gap, the bigger the sample size you need.
Specifically, divide the gap between your winner and loser in half and then square the result. That number needs to be bigger than your sample size for the test to be statistically significant.
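As a rough sketch of that check in Python: the function name `is_significant` is made up for illustration, and measuring both the gap and the sample size as raw counts (rather than percentages) is an assumption, not something the rule above spells out.

```python
def is_significant(winner_count, loser_count):
    """Rule of thumb: (gap / 2) squared must be bigger than the sample size.

    Assumption: the sample size is the winner's and loser's counts added
    together, and the gap is the difference between those two counts.
    """
    sample_size = winner_count + loser_count
    half_gap = (winner_count - loser_count) / 2
    return half_gap ** 2 > sample_size


# 70 vs 30: half the gap is 20, squared is 400, and the sample size is 100.
print(is_significant(70, 30))
# 55 vs 45: half the gap is 5, squared is 25, and the sample size is 100.
print(is_significant(55, 45))
```

The first call passes the check and the second does not, even though both use the same sample size of 100: the smaller gap simply needs more data.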
Take a worked example where there is a 10% difference between A and B. With a gap that size, you need a sample size of about 1,500 for the result to be statistically significant.
Now let’s see how this works with a 2% difference between A and B. Here you need a sample size that is around 25 times bigger.
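One way to arrive at figures of this size is to solve the rule for the sample size. The sketch below is only one possible reading: it assumes an "X% difference" means the loser's count is X% lower than the winner's, and that the sample size is the two counts added together. `threshold_sample_size` is a hypothetical helper name.

```python
from fractions import Fraction


def threshold_sample_size(relative_gap):
    """Sample size at which (gap / 2) squared equals the sample size.

    Assumptions (not stated in the article): with winner count W and
    relative gap g, the loser count is W * (1 - g), so the sample size is
    N = W * (2 - g) and half the gap is W * g / 2. Setting
    (W * g / 2) ** 2 == N gives W = 4 * (2 - g) / g ** 2, and therefore
    N = 4 * (2 - g) ** 2 / g ** 2.

    Pass the gap as a string such as "0.10" so the arithmetic stays exact.
    """
    g = Fraction(relative_gap)
    return 4 * (2 - g) ** 2 / g ** 2


ten_percent = threshold_sample_size("0.10")
two_percent = threshold_sample_size("0.02")

print(ten_percent)                       # threshold for a 10% gap
print(two_percent)                       # threshold for a 2% gap
print(float(two_percent / ten_percent))  # how many times bigger
```

Under those assumptions the thresholds come out at 1,444 for a 10% gap and 39,204 for a 2% gap, a ratio of roughly 27. That is in the same ballpark as the "about 1500" and "around 25 times" figures quoted here.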
Tip: If you find your test needs a very high sample size (above about 20,000), then your users probably don’t care much about your changes. Give up and test something more significant.







