Split testing (in the context of digital marketing) is a powerful way to trial two or more options for an element of your campaign and be reasonably confident that the results are accurate (whether they are helpful is another matter).
Specifically, split testing involves grouping users into two (or more, but let's stick with two for now) buckets. Let's call them A and B.
When a user is exposed to your campaign, the split testing system randomly chooses whether that user should be in bucket A or bucket B, and shows them your campaign with the settings you chose for that bucket.
The result is that, so long as the choice of which version they saw was random enough, you have eliminated every other variable that might lead to a systematic difference in behavior between the two groups. Of course, you haven't eliminated those variables for each individual user.
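As a rough illustration of the random assignment described above, here is a minimal Python sketch. The function name `assign_bucket` and the 50/50 split are assumptions made for the example; a real split testing system will have its own mechanism.

```python
import random


def assign_bucket(rng: random.Random) -> str:
    """Put a user in bucket A or B with equal probability."""
    return "A" if rng.random() < 0.5 else "B"


# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(42)
counts = {"A": 0, "B": 0}
for _ in range(10_000):
    counts[assign_bucket(rng)] += 1

# The two buckets should end up close to, but not exactly, equal in size.
print(counts)
```

Nothing about the user feeds into the choice, which is the whole point: only chance decides who sees which version.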
There are thousands of ways users can be different from each other. So you need a lot of results before you can be sure. You'll run your test over thousands of users.
If there are definitely no systematic biases (i.e., the choice is truly random and nothing other than chance decides which user enters which bucket), then you can be sure you've averaged away those differences. It's a great principle, but it can be harder to implement than we'd hope.
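To see why the number of users matters, here is a small simulation sketch. It assumes, purely for illustration, that every user has their own click propensity somewhere between 1% and 20%, and it shows both buckets the same thing (an A/A test), so any gap between the buckets is noise from user differences and chance.

```python
import random


def aa_gap(n_users: int, rng: random.Random) -> float:
    """Absolute gap in click rate between two buckets shown the SAME ad."""
    clicks = {"A": 0, "B": 0}
    users = {"A": 0, "B": 0}
    for _ in range(n_users):
        # Hypothetical: each user has a different underlying click propensity.
        propensity = rng.uniform(0.01, 0.20)
        bucket = "A" if rng.random() < 0.5 else "B"
        users[bucket] += 1
        if rng.random() < propensity:
            clicks[bucket] += 1
    rate_a = clicks["A"] / max(users["A"], 1)
    rate_b = clicks["B"] / max(users["B"], 1)
    return abs(rate_a - rate_b)


def mean_gap(n_users: int, repeats: int = 50, seed: int = 1) -> float:
    """Average the A/A gap over several repeated tests of the same size."""
    rng = random.Random(seed)
    return sum(aa_gap(n_users, rng) for _ in range(repeats)) / repeats


small = mean_gap(100)
large = mean_gap(5_000)
print(f"average gap with 100 users:   {small:.4f}")
print(f"average gap with 5,000 users: {large:.4f}")
```

With only a hundred users the two identical buckets can look quite different; with thousands, the differences between users largely average out.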
If we don't split test, we can't be sure of the results. It's common to run a time-series test. Essentially we run State A for a week then State B for a week and see which gave us better results.
I hope you're recoiling in horror at the suggestion.
What else changed between the first week and the second week? We don't know.
All kinds of things can impact how your campaigns behave: warm weather, Olympic games, royal weddings, news stories, payday, government rulings; all may impact your campaign in ways you can't predict or account for. So your test is bunk. You can't say for sure "my new bidding strategy improved my account" if other things might have improved it too.
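A toy calculation makes the problem concrete. Every number below is made up for illustration: the two variants are deliberately identical, and week 2 gets a hypothetical 20% boost from something external (payday, say).

```python
BASE_RATE = 0.05                       # hypothetical baseline conversion rate
WEEK_MULTIPLIER = {1: 1.0, 2: 1.2}     # hypothetical: week 2 has an external boost
VARIANT_EFFECT = {"A": 1.0, "B": 1.0}  # the two variants are truly identical


def expected_rate(variant: str, week: int) -> float:
    """Expected conversion rate for a variant in a given week."""
    return BASE_RATE * VARIANT_EFFECT[variant] * WEEK_MULTIPLIER[week]


# Time-series test: A runs in week 1, B runs in week 2.
seq_a = expected_rate("A", 1)
seq_b = expected_rate("B", 2)

# Split test: both variants run side by side in both weeks.
split_a = (expected_rate("A", 1) + expected_rate("A", 2)) / 2
split_b = (expected_rate("B", 1) + expected_rate("B", 2)) / 2

print(f"time-series 'lift' of B over A: {seq_b / seq_a - 1:.0%}")
print(f"split test 'lift' of B over A:  {split_b / split_a - 1:.0%}")
```

The time-series test credits B with an improvement it did nothing to earn. In the split test both variants get the same boost, so it cancels out.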
So We're Back to Split Testing?
Yes, but it's still not plain sailing. Every test must start with a hypothesis.
You deliberately check the data to see if it disproves your hypothesis. If it doesn't, you live to test another day.
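One common way to check the data against a hypothesis like "ad B has the same click-through rate as ad A" is a two-proportion z-test. The sketch below is one option among several, not something AdWords prescribes, and the click and impression figures are invented for the example.

```python
import math


def two_proportion_z_test(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in rates B minus A."""
    rate_a = clicks_a / n_a
    rate_b = clicks_b / n_b
    # Pooled rate under the hypothesis that both ads perform the same.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: 200 clicks from 5,000 impressions vs. 260 from 5,000.
z, p_value = two_proportion_z_test(200, 5_000, 260, 5_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value means the data would be surprising if the two ads really performed the same, so the "no difference" hypothesis looks shaky. A large one means you haven't disproved it yet.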
Common split tests in AdWords:
- Ad 1 vs. Ad 2
- Landing Page A vs. Landing Page B
- Short Contact Form vs. Long Contact Form
Setting these up relies on the Ad Rotation feature of AdWords. This feature allows you to choose whether Google performs ad split tests by evenly splitting traffic or using a multi-armed bandit model.
A few months ago Google decided to remove the even rotation option (or rather, they limited it to 30 days). This was met with much outcry - a fair reaction in my opinion. Multi-armed bandit (or "optimize for clicks" in AdWords vernacular) is a good principle, but can't (or at least shouldn't) be applied to every test.
Luckily for us, Google brought back even rotation. It might not be the best choice for advertisers on the whole, but it's certainly the better choice when we want to use it.
What I'm going to describe looks incredibly simple, but your account needs to be built with these principles in mind right from the start. The basic idea: optimize for clicks works best. Use it liberally.
Even use optimize for conversions if you have tons of conversion volume. (No, I'm not going to give you a number. Use your judgment. If your conversions through your main ads are greater than *insert large amount here* per month, then you can use it.)
Optimize for clicks isn't suitable for some of your ad tests. If you have a new ad style (e.g., theme, call to action, promotion) you can't rely on optimize for clicks to get it the traffic you need to be confident of your test.
A multi-armed bandit model like optimize for clicks will always prefer ads with history. The ads that have been running for a while will take the lion's share of the traffic. It might take you weeks or even months before your new ad has traction.
Solution: Have a dedicated campaign named "Split Testing" set to rotate ads evenly. Choose a high traffic ad group and copy it into here in its entirety.
During a split test, enable your ad group in this campaign and turn it off in your main campaign. The rest of your account continues to run on optimize for clicks (so you don't lose most of your juicy traffic, CTR and QS history) but your test area sends an even share of traffic to each ad variant.
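Some back-of-the-envelope arithmetic shows why the dedicated campaign is worth the effort. The figures are assumptions chosen for illustration only: an ad group with 1,000 impressions a day, a target of 5,000 impressions for the new ad, and a bandit that hands an ad without history just 5% of the traffic. Google doesn't publish how much traffic a new ad really gets.

```python
def days_to_reach(target_impressions: int, daily_impressions: int,
                  share_percent: int) -> int:
    """Whole days for one ad to collect target_impressions at a given traffic share."""
    # Integer ceiling division avoids floating-point rounding surprises.
    per_day_times_100 = daily_impressions * share_percent
    return -(-target_impressions * 100 // per_day_times_100)


TARGET = 5_000  # hypothetical impressions wanted for the new ad
DAILY = 1_000   # hypothetical daily impressions for the ad group

even_rotation_days = days_to_reach(TARGET, DAILY, 50)  # two ads, rotated evenly
bandit_days = days_to_reach(TARGET, DAILY, 5)          # assumed 5% share for a new ad

print(f"even rotation: {even_rotation_days} days")  # 10 days
print(f"bandit (5%):   {bandit_days} days")         # 100 days
```

Under these assumptions the same test takes about a week and a half with even rotation and over three months without it.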
The campaign experiments tool in AdWords is a good start toward making proper testing easier, but at the moment it's still clunky to use and inappropriate for the most common tests we want to perform. It will test your results for statistical significance, so use it where you can.
If you can apply campaign experiments to the tests you were going to perform anyway, that's great. But don't try to test new ads in your main campaigns; under optimize for clicks they simply won't attract enough traffic.
Image Credit: angusleonard/Flickr