A/B Testing is Not as Easy as Learning Your ABCs

One of the most difficult things to explain to upper management is just how problematic A/B testing of SEO strategies can be on large sites. There are so many moving parts that the test can easily be contaminated along the way.

The most commonly used strategy for A/B testing in large businesses is to take an entire category, or just a sample of pages within the site, and make a few changes. The idea is that you can change a small part of the site with little to no harm and measure the result. This type of test will fail in every case, due to what I call the bleeding factor: the effect of a change leaks from the test pages into pages outside the test.

Pages that sit four or five clicks from the home page often perform differently from pages that are one or two clicks away, so you can never see the true results of your changes from a sample. The bleeding effect becomes most apparent when you change a group of pages in a positive way and neighboring pages, connected through links on similar attributes, also rise in ranking.

The effects of bleeding can be extremely difficult to understand, since everything depends on how relevant the bleed was; this also varies with the scale and size of the site. For example, if you make a change to a category such as digital cameras, a few brands, such as Sony, will also appear in other categories. Sony is a cross theme with televisions, CD players and DVD players, to name a few. Nikon, by contrast, will not have the same effect on televisions and CD players. Thus, the increase shows up in the related Sony categories, but is not shared with all other digital camera brands.

Another example of bleeding that is very difficult to measure is the impact of link building. Consider this scenario: if you negotiate a special link with a digital camera manufacturer and point it to your main digital camera category page, what will be the bleeding effect of that one link? Let’s say the link comes from Nikon. Not only will it help the Nikon sub-category, it will also help all of your other digital camera sub-categories. The test results for that link are now contaminated.

An alternative would be to have Nikon link directly to the Nikon sub-category. You would expect the uplift to be felt only within the Nikon sub-category, but in fact any decent site links back to the top category from the sub-category page, pushing link weight back to the top. The link weight passed to the primary category page then flows on to the other brand sub-categories. Thus, your bleeding effect is felt in other categories as well.
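To make the bleed concrete, here is a minimal sketch of link weight flowing through a tiny, hypothetical category tree. It is a simplified PageRank-style model, not how any search engine actually scores pages; the page names, the damping value of 0.85 and the weight units are all assumptions for illustration.

```python
def propagate(links, external, damping=0.85, iterations=100):
    """Spread externally earned link weight through internal links.

    links:    page -> list of pages it links to (every target must be a key)
    external: page -> weight earned from off-site links (hypothetical units)
    """
    score = {page: external.get(page, 0.0) for page in links}
    for _ in range(iterations):
        nxt = {page: external.get(page, 0.0) for page in links}
        for source, targets in links.items():
            if not targets:
                continue
            # Each page passes a damped, equal share to every page it links to.
            share = damping * score[source] / len(targets)
            for target in targets:
                nxt[target] += share
        score = nxt
    return score


# Hypothetical site: the category links down to each brand and up to home,
# and each brand sub-category links back up to the category.
links = {
    "home": ["cameras"],
    "cameras": ["home", "nikon", "sony", "canon"],
    "nikon": ["cameras"],
    "sony": ["cameras"],
    "canon": ["cameras"],
}

flat = {page: 1.0 for page in links}       # same off-site weight everywhere
with_nikon_link = {**flat, "nikon": 2.0}   # one extra link pointing at Nikon

baseline = propagate(links, flat)
treated = propagate(links, with_nikon_link)
uplift = {page: treated[page] - baseline[page] for page in links}

for page, gain in sorted(uplift.items(), key=lambda item: -item[1]):
    print(f"{page:8s} +{gain:.2f}")
```

In this toy model the Nikon page gains the most, but the category page and the untouched Sony and Canon pages gain too, which is exactly the contamination described above.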

Some companies have multiple brands or sites whose offerings are either similar or different altogether. If the offerings are similar, there is a slight chance that accurate A/B testing can occur. However, if the sites are completely different, this type of testing will not be possible. A simple test would be to change a category on one site and leave it unchanged on the other. Measure the impact and try to understand the effect.
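One way to read such a two-site test is to subtract the untouched site's change from the changed site's change, so that seasonality or an algorithm update hitting both sites is not mistaken for the effect. This is commonly called a difference-in-differences comparison; the sketch below is an assumption about how you might set it up, and the traffic figures are made up purely for illustration.

```python
from statistics import mean


def relative_change(before, after):
    """Fractional change in mean daily organic visits between two periods."""
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline period has no traffic")
    return (mean(after) - baseline) / baseline


def net_effect(changed_site, untouched_site):
    """Change on the edited site minus the change on the untouched site."""
    return relative_change(*changed_site) - relative_change(*untouched_site)


# Illustrative daily visits only, not real measurements: (before, after).
changed = ([200, 210, 190], [240, 250, 230])
untouched = ([100, 105, 95], [110, 115, 105])

effect = net_effect(changed, untouched)
print(f"net effect of the change: {effect:+.1%}")
```

The comparison only means something if the two sites would otherwise have moved together, which is why it falls apart when the offerings are completely different.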

On the other hand, if there is a lot of duplicate content involved, this test could be considered a failure, since one set of pages may simply drop out of the index. If you have a business intelligence department on board, they will want a third site as a way to double check the results. This can be an effective way to measure whether the search engine moved one site up and took another down in the process. Inevitably, we come back to the simple conclusion that A/B testing of SEO on large sites will likely drive you insane.

Do note that A/B testing based on groups of users can cause issues with the search engines, since a page may flip between versions every time it gets spidered. Make sure that the software package you use can identify a search engine and does not show it different data each time the spider makes a request.
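As a rough sketch of that safeguard, the snippet below pins recognized spiders to one version and gives each visitor a stable version based on a hash of their ID. The function names, the token list and the choice to serve spiders the first variant are assumptions for illustration; matching on the user-agent string alone can be spoofed, so a real package would typically verify crawlers more carefully.

```python
import hashlib

# Substrings found in the user-agent strings of some well-known crawlers.
# This list is illustrative, not exhaustive.
CRAWLER_TOKENS = (
    "googlebot", "bingbot", "slurp", "duckduckbot", "baiduspider", "yandexbot",
)


def is_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string looks like a known spider."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_variant(user_agent: str, visitor_id: str, variants=("A", "B")) -> str:
    """Pick a page version: fixed for spiders, stable per visitor otherwise."""
    if is_crawler(user_agent):
        # Spiders always get the same version, so the page never flips.
        return variants[0]
    # Hashing the visitor ID gives the same bucket on every request.
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


spider = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

print(choose_variant(spider, "any-id"))
print(choose_variant(browser, "visitor-42"))
```

The point of the design is that neither a spider nor a returning visitor ever sees the page change between requests for the duration of the test.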

The best recommendation that I can provide is this: make small changes to entire page types globally, all at the same time, and measure the impact. If your site has at least 1 million pages and at least five page types, the impact will not be that bad if it goes south. This is the most effective way of determining what types of major changes will be most beneficial for your site.
