I’m often asked how long a landing page optimization test should run. The answer depends on these factors:
- The data rate (number of conversions per day).
- Size of improvements found (percentage improvement).
- Size of your test (number of alternative designs or “recipes”).
- The confidence in your answer (how sure you need to be).
Today, we’ll cover the first three factors.
Data Rate
The data rate describes how quickly you collect data during your test. The volume of traffic for landing page optimization tests is best measured in the number of conversion actions per day (and not the number of unique visitors). Web sites with low conversion rates require more visitors to reach valid statistical conclusions.
A significant portion of your testing bandwidth (typically 15 to 50 percent, depending on your circumstances) should be directed to the original, or control, version of your Web site. This lets you compare the performance of alternative recipes against a known baseline, even if that baseline shifts because of seasonal factors.
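To make the split concrete, here is a minimal sketch of how a day's conversions might divide between the control and the alternative recipes. The function name, the even split among the alternatives, and the sample figures are all assumptions for illustration. The sketch also assumes every recipe converts at about the same rate, which is precisely what a real test sets out to check.

```python
def split_daily_conversions(daily_conversions, control_share, num_alternatives):
    """Estimate conversions per day for the control and for each alternative.

    Assumes the bandwidth not sent to the control is divided evenly among
    the alternatives (an illustrative simplification).
    """
    if not 0 < control_share < 1:
        raise ValueError("control_share must be between 0 and 1")
    if num_alternatives < 1:
        raise ValueError("need at least one alternative recipe")
    control = daily_conversions * control_share
    per_alternative = (daily_conversions - control) / num_alternatives
    return control, per_alternative


# Hypothetical site: 40 conversions per day, 25 percent of bandwidth
# reserved for the control, three alternative recipes.
control, per_alt = split_daily_conversions(40, 0.25, 3)
print(f"control: {control:.1f}/day, each alternative: {per_alt:.1f}/day")
```

Raising the control share or the number of alternatives lowers the data each alternative collects per day, which is the trade-off the rest of this column deals with.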
So what can you do if your data rate is too low and there are no additional traffic sources available? You can decrease the size of your test. At the extreme, you may have to run a simple head-to-head test of your original page against one alternative version.
Another strategy is to measure different conversion actions. Sometimes, more plentiful measurable actions occur upstream of your current one. Because there are more of them, these intermediate actions can be used to bulk up your data rate.
For example, suppose your e-commerce catalog generates too few sales and your shopping cart abandonment rate is 90 percent. That means you have 10 times as many shopping cart “puts” as sales, so you can tune the main catalog experience up to the point where a visitor puts an item in the cart.
If you assume that the shopping cart abandonment rate doesn’t change, you can assign 10 percent of the average sale value on your site to each shopping cart put. You can then run your test and count the more numerous puts as the conversion action.
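The arithmetic behind that assignment can be sketched in a few lines. The function name and the average sale value of 80.00 are hypothetical, and the calculation holds only under the stated assumption that the abandonment rate stays constant across recipes.

```python
def value_per_put(average_sale_value, abandonment_rate):
    """Value to credit to each shopping cart put.

    Assumes the abandonment rate is the same for every recipe, so each
    put is worth the sale value times the share of carts that complete.
    """
    if not 0 <= abandonment_rate < 1:
        raise ValueError("abandonment_rate must be in [0, 1)")
    completion_rate = 1.0 - abandonment_rate
    return average_sale_value * completion_rate


average_sale = 80.00  # hypothetical average sale value
put_value = value_per_put(average_sale, abandonment_rate=0.90)
print(f"value per put: {put_value:.2f}")
```

If a recipe changes the abandonment rate itself, puts stop being a reliable proxy for sales, so it is worth spot-checking that rate per recipe during the test.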
Size of Improvements Found
If you managed to uncover a clearly superior version of your landing page, the performance improvement would quickly become apparent. Often, an initial round of changes will fix some of the obvious problems and improve performance significantly. This will leave you with more subtle improvements in subsequent tuning tests.
The cumulative impact of several small improvements (between 1 and 5 percent) can still be very significant. However, it can take much longer to validate these smaller effects to the desired confidence level. Because you don’t know the size of the possible improvements ahead of time, the length of time required for the tuning test may vary significantly.
Of course, the amount of data collected also influences whether the difference found is considered significant. The list below shows the size of effects that can be reliably identified in a head-to-head test (to a 95 percent confidence level) at various sample sizes.
- 100 conversions — 20 percent effect
- 1,000 conversions — 6.3 percent effect
- 10,000 conversions — 2 percent effect
- 100,000 conversions — 0.63 percent effect
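The figures in this list follow an inverse-square-root pattern: each tenfold increase in conversions shrinks the detectable effect by a factor of about 3.16. The sketch below reproduces them with the rule of thumb effect ≈ 2 / sqrt(conversions). That formula is inferred from the listed numbers, not taken from a stated derivation, so treat it as an approximation, and the function name as hypothetical.

```python
import math


def detectable_effect(conversions):
    """Approximate smallest relative effect resolvable at 95 percent confidence.

    Rule of thumb fitted to the list above: effect ~ 2 / sqrt(conversions).
    Returns a fraction (0.20 means a 20 percent effect).
    """
    if conversions <= 0:
        raise ValueError("conversions must be positive")
    return 2.0 / math.sqrt(conversions)


for n in (100, 1_000, 10_000, 100_000):
    effect = detectable_effect(n)
    print(f"{n:>7,} conversions: {effect * 100:.2f} percent effect")
```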
Resolving small effects requires a lot more data. You typically know your available data rate, and need to decide on an acceptable length of data collection for your test.
For example, let’s assume that you have about 500 conversions per month and are willing to spend two months on data collection, for roughly 1,000 conversions in total. As a rough guide based on the list above, you’ll be able to identify improvements of about 6.3 percent or larger in your head-to-head test. Any improvements smaller than that will be deemed statistically inconclusive.
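The same reasoning can be run in reverse: pick the smallest effect you care about and estimate how long data collection would take. This sketch inverts the approximate relationship effect ≈ 2 / sqrt(conversions), which is fitted to the figures listed earlier and is not an exact statistical formula. The function names are hypothetical.

```python
import math


def conversions_needed(target_effect):
    """Conversions required to resolve target_effect (given as a fraction).

    Inverts the rule of thumb effect ~ 2 / sqrt(conversions), an
    approximation fitted to the figures in this column.
    """
    if target_effect <= 0:
        raise ValueError("target_effect must be positive")
    return math.ceil((2.0 / target_effect) ** 2)


def months_needed(target_effect, conversions_per_month):
    """Months of data collection at a steady monthly conversion rate."""
    return conversions_needed(target_effect) / conversions_per_month


# The site from the example above: about 500 conversions per month.
for effect in (0.10, 0.063, 0.02):
    months = months_needed(effect, conversions_per_month=500)
    print(f"{effect * 100:.1f} percent effect: about {months:.1f} months")
```

At 500 conversions per month, resolving a 2 percent effect calls for about 10,000 conversions, or roughly 20 months of data, which is why small improvements are often out of reach for low-volume sites.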
Size of Your Test
The size of your test can be measured by the size of the search space that you’re considering. The search space is the whole universe of alternative designs possible in your test. A simple head-to-head test has a search space size of two (the original, plus the alternative landing page version that you’re testing).
If you’re testing multiple elements on the page, you need to multiply together the number of alternative versions for each one. For example, if you’re testing three headlines, four offers, and six button colors, then there are 72 possible versions (3 times 4 times 6 equals 72) in your test. As you increase the total number of elements and the number of alternatives for each one, the possible number of versions grows very quickly.
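A short sketch of that multiplication, using placeholder labels for the three headlines, four offers, and six button colors from the example. The dictionary layout and variable names are assumptions for illustration.

```python
import itertools
import math

# Placeholder alternatives for each page element being tested.
elements = {
    "headline": ["headline A", "headline B", "headline C"],
    "offer": ["offer A", "offer B", "offer C", "offer D"],
    "button_color": ["red", "orange", "yellow", "green", "blue", "gray"],
}

# Search space size: the product of the number of alternatives per element.
search_space_size = math.prod(len(options) for options in elements.values())

# Enumerating every recipe confirms the count.
recipes = list(itertools.product(*elements.values()))
assert len(recipes) == search_space_size

print(search_space_size)  # 72
```

Adding one more element with just two alternatives would double the count to 144, which shows how quickly the search space outgrows a modest data rate.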
The amount of data required to reach conclusions scales with the size of your search space. Many testing approaches can’t practically be used in tests beyond a few dozen total recipes because they would require too much time to reach a reasonable confidence level.
Because you control the size of your search space, you can scale it to your data rate and your chosen tuning method so that the test completes in the allotted time while still detecting effects of a reasonable size.