How Valid Are Your Landing Page Test Conclusions?

Many landing page testers are surprised when the results of their initial test don't hold up when the test is rerun.

Internet marketing produces a detailed and quantifiable view of your online campaign activities. Most of the numbers produced fall under the general category of descriptive statistics. Descriptive statistics produces summaries and graphs of your data that can be used for making decisions. The descriptive information has to do with the value of a particular quantity as well as its variability (how scattered it is).

Unfortunately, most people focus only on the measured average value and completely ignore the variability. This major problem persists because people confuse the precision of the observed effects (the ability to measure conversions during the test) with the precision of describing the underlying system (the ability to draw conclusions and make predictions about your landing page visitor population as a whole).

Generally, you shouldn't quote the observed improvement as a certainty. Even though you've observed an exactly computable conversion rate improvement percentage, you don't know what it really is for your visitor population as a whole. Exact measurement of observable effects doesn't imply that you know anything about the underlying process.

By itself, the mean of an observed value can be misleading, especially at small sample sizes. The situation gets even murkier if you're trying to model two separate means (each with its own variance and noise). The situation gets downright ugly if you're trying to compute a ratio of such numbers. Yet, this is exactly what is required to estimate a percentage improvement between two landing page versions.

I recently ran across a public case study of a landing page head-to-head test. Based on a sample size of 36 conversions out of 2,478 impressions for page A and 65 conversions out of 2,384 impressions for page B, you're told to conclude that the conversion rate improvement is 88 percent. I'd probably be happy with such a result, but let's take a closer look.
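As a quick check on the arithmetic, the observed conversion rates and the 88 percent figure can be reproduced in a few lines of Python. This is only a sketch; the function name is my own and not part of the case study.

```python
def conversion_rate(conversions, impressions):
    """Observed conversion rate as a fraction of impressions."""
    return conversions / impressions

# Figures quoted in the case study
rate_a = conversion_rate(36, 2478)
rate_b = conversion_rate(65, 2384)

# Relative improvement of page B over page A
lift = rate_b / rate_a - 1

print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  improvement: {lift:.0%}")
```

The rates come out at roughly 1.45 percent and 2.73 percent, and their ratio gives the quoted improvement of about 88 percent.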

Let's assume that you want 95 percent confidence in your answer. This corresponds to a statistical Z-score of about 2 (1.96, to be precise), meaning that the true value should fall within roughly two standard deviations of the observed one. If you compute the 95 percent confidence intervals on the number of conversions for both landing pages (taking the standard deviation of a count to be roughly its square root), you'll find the following:

• Page A: 36 ± 12 (the interval from 24 to 48)

• Page B: 65 ± 16 (the interval from 49 to 81)
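The two intervals above are consistent with treating each conversion count as roughly Poisson, so that its standard deviation is about the square root of the count, and then going two standard deviations either side. That is my reading of the numbers rather than something the case study states. A minimal sketch under that assumption (the function name is hypothetical):

```python
import math

Z = 2  # roughly 95 percent confidence (the exact normal value is about 1.96)

def count_interval(conversions, z=Z):
    """Approximate interval for a conversion count.

    Assumes the count behaves like a Poisson variable, so its standard
    deviation is about sqrt(count). Returns (half_width, low, high).
    """
    half_width = z * math.sqrt(conversions)
    return half_width, conversions - half_width, conversions + half_width

for page, conversions in (("A", 36), ("B", 65)):
    half, low, high = count_interval(conversions)
    print(f"Page {page}: {conversions} ± {half:.0f} ({low:.0f} to {high:.0f})")
```

Rounded to whole conversions, this gives 24 to 48 for page A and 49 to 81 for page B.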

Let's take a look at the best case scenario within our confidence range:

• Conversions: A = 24, B = 81

• Conversion rates: A = 0.97 percent, B = 3.40 percent

• Conversion rate improvement: 251 percent

Now let's take a look at the worst case scenario:

• Conversions: A = 48, B = 49

• Conversion rates: A = 1.94 percent, B = 2.06 percent

• Conversion rate improvement: 6.2 percent
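Both scenarios are easy to recompute. The sketch below pairs the opposite ends of the two intervals and works out the improvement each time; the constant and function names are illustrative choices of mine.

```python
IMPRESSIONS_A, IMPRESSIONS_B = 2478, 2384

def improvement(conv_a, conv_b):
    """Relative conversion rate improvement of page B over page A."""
    rate_a = conv_a / IMPRESSIONS_A
    rate_b = conv_b / IMPRESSIONS_B
    return rate_b / rate_a - 1

best = improvement(24, 81)      # A at the low end, B at the high end
worst = improvement(48, 49)     # A at the high end, B at the low end
observed = improvement(36, 65)  # the counts actually measured

print(f"best {best:.1%}, observed {observed:.1%}, worst {worst:.1%}")
```

Working from unrounded rates, the worst case comes to about 6.1 percent; the 6.2 percent figure comes from dividing the rounded rates (2.06 by 1.94). Either way, the conclusion is the same.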

There is some rationale for reporting the conversion rate improvement based on the ratio of the means. Since more of the mass of the normal distributions lies close to the mean, the actual numbers are more likely to be near it. However, this shouldn't be used as a reason to abandon the use of error bars or confidence intervals. Both the 6.2 percent and 251 percent conversion rate improvements are within the realm of possibility at the confidence level you selected. There's a huge range of possible outcomes simply because the sample size is so small.
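To get a feel for how much the measured improvement can move from one rerun to the next, here is a rough simulation. It assumes, purely for illustration, that the true conversion rates equal the observed ones (36 out of 2,478 and 65 out of 2,384), which nobody actually knows. The seed, the number of runs, and the function names are arbitrary choices of mine.

```python
import random

def simulate_conversions(rng, impressions, true_rate):
    """Count conversions when each impression converts with probability true_rate."""
    return sum(rng.random() < true_rate for _ in range(impressions))

def simulate_improvements(runs=500, seed=1):
    """Rerun the same head-to-head test many times; collect the measured improvement."""
    rng = random.Random(seed)
    rate_a, rate_b = 36 / 2478, 65 / 2384  # assumed true rates (illustration only)
    results = []
    for _ in range(runs):
        conv_a = simulate_conversions(rng, 2478, rate_a)
        conv_b = simulate_conversions(rng, 2384, rate_b)
        if conv_a == 0:
            continue  # improvement is undefined with zero conversions on page A
        results.append((conv_b / 2384) / (conv_a / 2478) - 1)
    return sorted(results)

improvements = simulate_improvements()
# Empirical 2.5th and 97.5th percentiles of the simulated improvements
low = improvements[int(0.025 * len(improvements))]
high = improvements[int(0.975 * len(improvements)) - 1]
print(f"middle 95 percent of reruns: {low:.0%} to {high:.0%}")
```

The best and worst cases above pair both extremes at once, so I'd expect the simulated spread to be somewhat narrower than that, yet still far too wide to quote a single improvement number as a certainty.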

All online marketing educators are walking a fine line (myself included). We're trying to get at least a basic level of mathematical literacy across to our audiences. However, if the going gets too rough, many online marketers will just tune out and give up on the math altogether. I'm somewhat torn. Sure, "half a loaf is better than none," and it's good to use some kind of statistical benchmarks. Still, "a little knowledge is a dangerous thing," and it can be easily misapplied during landing page optimization.

My company has been guilty of oversimplifying. We often report public case study results as a simple percentage improvement. In our defense, the amount of data collected in a typical test is very high, and the consequent error bars are narrow. We also provide detailed statistical reporting and analyses of the results with error bars to our clients.

Bottom line: take the time and care to properly collect and analyze your data. When faced with uncertain measurements (basically all of the time), display them with error bars or confidence ranges.
