How to Understand Your Google “Not Provided” Traffic

If you didn't notice the Search Engine Watch article from October 19, 2011 discussing Google's decision to encrypt the referrer information (including the query string) for logged-in users, you've most likely noticed the result: the ominous “not provided” most of us now see as our top referring search query.

We've all stared at it in frustration. This article won't discuss whether the move is good and fair or not. Rather, let's focus on how we can better understand these “not provided” users.

Google Personalized Visitors

Before I get to the main crux of this article, I want to cover a perspective my good friend David Harry brought up in a discussion on the subject. If you're ever wondering what percentage of your users are having their SERPs adjusted to personalized results, just look at what percentage of your traffic is “not provided”. These are logged in users and while Google can personalize non-logged in users, those signed in will be the most impacted and now you know how many there are.

The Problem With “Not Provided”

What bothered me for some time after the “not provided” data started appearing was knowing there was a way to break it down but not being able to quite put my finger on how. Fortunately that bothers me no more, and hopefully it won't bother you either after you're done reading.

By using the process outlined below, we can get a very good understanding of what types of queries the “not provided” visitors are entering. Not exact, but good enough to make some solid extrapolations.

The assumption we need to make when working out a method for determining the breakdown of our “not provided” data is that logged in users are not the same as those not logged in. If 15.4 percent of our tracked users enter with a given phrase, that doesn't mean that the same percentage will apply to our “not provided” users. They're a different breed.

Understand What You Can Measure

The larger the data pool you have, the more specific you can be about what you can pull from your “not provided” data with any accuracy. As with any data, the larger the pool, the more accurate your conclusions will be (which is why political pollsters don't call 5 or 10 people from one city and assume the whole country will vote that way).

One of the most common breakdowns people look for in their data is their branded versus non-branded traffic. Let's start there. How do you determine what percentage of your “not provided” visitors have entered after searching branded versus non-branded phrases?

Determining Your Visitor Patterns

The first thing we need to do is figure out general differences in how our branded versus non-branded traffic behaves. To do this, simply follow these steps:

  • Enter Google Analytics for the domain you wish to monitor
  • Under “Traffic Sources” select “Organic” (under Search)
  • In the bar above the list of query phrases click “advanced”
  • It should default to “Include” > “Keyword” > “Containing” > “_____”. Enter your brand (in my case, "beanstalk") and click “Apply”
  • Record the following information: visits, pages/visit, time on site

So what we have is a basic breakdown of our branded traffic. Now let's get our non-branded information:

  • Where the word “advanced” once was, you'll now see “edit” (since we already have a filter on). Click it and change the “Include” to “Exclude” so instead of getting data for all your branded traffic, you're excluding it.
  • In this instance we also have to exclude all the “not provided” traffic. To do this, once you've changed the “Include” to “Exclude” click “Add a dimension or metric”.
  • Under “dimensions” select “Keyword”
  • Change “Include” to “Exclude”
  • In the text box enter “not provided”
  • Record the following information: visits, pages/visit, time on site
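If you'd rather work from exported keyword data than click through the filters, the same three-way split can be sketched in a few lines of Python. This is only an illustration: the `KeywordRow` fields, the `summarize` function and the sample rows are all hypothetical and don't correspond to any particular Google Analytics export format.

```python
from dataclasses import dataclass


@dataclass
class KeywordRow:
    """One keyword line from a hypothetical export (field names are assumptions)."""
    keyword: str
    visits: int
    pageviews: int
    seconds_on_site: int  # total seconds across all visits for this keyword


def summarize(rows, brand):
    """Return {bucket: (visits, pages_per_visit, avg_seconds_on_site)}."""
    totals = {
        "branded": [0, 0, 0],
        "non-branded": [0, 0, 0],
        "not provided": [0, 0, 0],
    }
    for row in rows:
        keyword = row.keyword.lower()
        if "not provided" in keyword:
            bucket = "not provided"   # excluded from both known buckets
        elif brand.lower() in keyword:
            bucket = "branded"        # the "Include > Containing" filter
        else:
            bucket = "non-branded"    # the two "Exclude" filters combined
        totals[bucket][0] += row.visits
        totals[bucket][1] += row.pageviews
        totals[bucket][2] += row.seconds_on_site

    summary = {}
    for bucket, (visits, pageviews, seconds) in totals.items():
        if visits == 0:
            summary[bucket] = (0, 0.0, 0.0)
        else:
            summary[bucket] = (visits, pageviews / visits, seconds / visits)
    return summary


# Made-up sample rows, for illustration only.
sample_rows = [
    KeywordRow("beanstalk seo", 10, 30, 2500),
    KeywordRow("(not provided)", 50, 100, 5000),
    KeywordRow("seo services", 20, 30, 1600),
]
sample_summary = summarize(sample_rows, "beanstalk")
```

Each bucket gives you the same three numbers the steps above ask you to record: visits, pages/visit and time on site.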

Let's compare a sample of data from a site I work on:

Branded:

  • pages/visit – 3.22
  • time on site – 4:11

Non-branded:

  • pages/visit – 1.61
  • time on site – 1:20

If we assumed that all traffic was equal we'd assume that 3.674 percent of their traffic was branded (determined by establishing what percentage of their traffic was branded when we remove the “not provided” data from the known queries). In fact, their logged-in traffic has little in common with their not-logged-in traffic.

If you're curious, the formula for finding your assumed branded percentage if all traffic behaved the same would be:

(branded traffic times 100) divided by (all traffic minus “not provided” traffic)
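As a quick sketch, here is that formula as a small Python function. The function name and the visit counts in the example are made up for illustration; they are not the figures from the site discussed here.

```python
def assumed_branded_percent(branded_visits, total_visits, not_provided_visits):
    """Branded share of the traffic whose keywords are known."""
    known_visits = total_visits - not_provided_visits
    if known_visits <= 0:
        raise ValueError("no visits with a known keyword")
    return branded_visits * 100 / known_visits


# Hypothetical numbers: 50 branded visits, 1,500 organic visits in total,
# 500 of which are "not provided".
example_percent = assumed_branded_percent(50, 1500, 500)
print(example_percent)  # 5.0
```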

We can see above that the branded versus non-branded traffic segments behave very differently. While we know that some of the branded traffic may well stay for less than a minute and some non-branded traffic may view many pages – overall the traffic patterns are very different.

We'll now use this information to segment our “not provided” data. To do this first go to the “not provided” keyword data by clicking it in the keyword list after removing all your filters. Now follow these steps:

  • In the top Analytics navigation under “Standard Reporting” click “Advanced Segments”
  • Click “new custom segment”
  • Leave “Include” and change the second drop down to “Pageviews”
  • Change the third drop down to “Greater than” and enter 2.41 into the text box (the number 2.41 is the mid-point between the branded and non-branded pageviews)
  • Click “Add 'AND' statement”
  • Leave “Include” and change the second drop down to “Time on site”
  • The third drop down should be “Greater than”. Now enter 2.45 into the text box (the number 2.45, i.e. 2:45, is the mid-point between the branded and non-branded time on site)
  • Now save the segment as “branded”
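The two thresholds used in these steps are just mid-points between the branded and non-branded averages recorded earlier. A minimal sketch of that arithmetic follows; the helper names are my own, and the time values are converted to seconds so the mid-point can be calculated properly.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '4:11' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def midpoint(a, b):
    """Half-way point between two averages."""
    return (a + b) / 2


def format_mmss(total_seconds):
    """Format seconds as 'm:ss', dropping any fraction of a second."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"


# Averages from the sample site: branded first, non-branded second.
pages_threshold = midpoint(3.22, 1.61)  # roughly 2.415 pages/visit
seconds_threshold = midpoint(to_seconds("4:11"), to_seconds("1:20"))  # 165.5

print(format_mmss(seconds_threshold))  # 2:45
```

Swap in your own averages to get the thresholds for your site.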

Now we need to create another segment that's exactly the opposite using the following steps:

  • In the top Analytics navigation under “Standard Reporting” click “Advanced Segments”
  • Click “new custom segment”
  • Leave “Include” and change the second drop down to “Pageviews”
  • Change the third drop down to “Less than” and enter 2.41 into the text box
  • Click “Add 'AND' statement”
  • Leave “Include” and change the second drop down to “Time on site”
  • The third drop down should be “Less than”. Now enter 2.45 into the text box
  • Now save the segment as “non-branded”
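To make the logic of the two segments concrete, here is a rough Python sketch of how a single visit would be sorted. The function is hypothetical, and expressing the time threshold as 165 seconds (2:45) is my assumption; check which unit your Analytics interface expects. Visits that satisfy neither pair of conditions land in neither segment.

```python
def classify_visit(pageviews, seconds_on_site,
                   pages_threshold=2.41, seconds_threshold=165):
    """Mirror the two custom segments; both conditions must hold (AND)."""
    if pageviews > pages_threshold and seconds_on_site > seconds_threshold:
        return "branded"
    if pageviews < pages_threshold and seconds_on_site < seconds_threshold:
        return "non-branded"
    # Mixed signals, e.g. one page viewed for seven minutes.
    return "unclassified"


print(classify_visit(4, 300))  # branded
print(classify_visit(1, 30))   # non-branded
print(classify_visit(1, 420))  # unclassified
```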

You should now be viewing a graph and information with a comparison of what our likely breakdown of branded versus non-branded traffic looks like. If you're not seeing this, simply click the “Advanced Segments” tab and when the segments are listed (yours to the right side) simply check the two segments you've just created.

One thing you'll notice is that if you add the non-branded and branded traffic percentages they don't total 100 percent. This is because we haven't just included one metric but rather two. Traffic that visited only 1 page but stayed for 7 minutes isn't included as it doesn't fit either segment. This gives us a better statistical sample but one more step to go through.

You'll have to multiply the branded visitors by 100 and then divide by the total number of visitors tracked in the two segments. If our data showed 100 branded visitors and 500 non-branded I would multiply 100 by 100 and then divide by 600 (100 + 500) to yield 16.67 percent.
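The same calculation as a short Python sketch, using the 100 and 500 visitor example from above (the function name is just for illustration):

```python
def likely_branded_percent(branded_visits, non_branded_visits):
    """Branded share of the visits that fell into either segment."""
    classified_visits = branded_visits + non_branded_visits
    if classified_visits == 0:
        raise ValueError("no visits in either segment")
    return branded_visits * 100 / classified_visits


example_share = round(likely_branded_percent(100, 500), 2)
print(example_share)  # 16.67
```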

In the case of the site I've been referencing – this math shows that “not provided” data is likely to be 19.07 percent branded. This is significantly above what one would expect looking at just the general stats and assuming they translate directly.

In this case it actually makes good sense. The site is related to Internet marketing and it makes sense that people searching for it by brand are more likely to be logged in than the average. When the same methods were applied to a wide range of sites the results varied but were consistent with what logic and additional data points (conversions for example) provide.

More Than Just Brands

Perhaps more importantly, the same basic methods can be applied to other phrases by replacing your brand with another keyword segment. In my case, “seo” is a logical option, so I can determine what percentage of our traffic is from “seo”-based phrases. You may be inclined to over-analyze your data. However, the closer the core metrics (pageviews for example) are, the less accurate your conclusions will be.

I tend to group data to allow for larger datasets to be collected. In the case of the Beanstalk site I analyze my “services” and “firms” and “company” data collectively as it gives me a larger pool of information and tends to act similarly. You may need to group your data in various configurations to find all the different patterns you can measure. But I guarantee you'll find some interesting things out about your visitors as you do so.
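As a rough illustration of that kind of grouping, the sketch below pools any keyword containing one of several related terms. The term list comes from my example above; the function name and the sample keywords are made up.

```python
def matches_group(keyword, terms):
    """True if the keyword contains any term in the group."""
    lowered = keyword.lower()
    return any(term in lowered for term in terms)


# Terms I analyze collectively because they tend to act similarly.
SERVICE_TERMS = ("services", "firms", "company")

# Hypothetical keywords, for illustration only.
sample_keywords = [
    "seo services vancouver",
    "top seo firms",
    "seo company reviews",
    "link building tips",
]
service_group = [k for k in sample_keywords if matches_group(k, SERVICE_TERMS)]
```

Here `service_group` holds the first three keywords, which can then be measured as one larger pool.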

There is also more to “Advanced Segments” than I covered above. You can segment by virtually every metric in Google Analytics.

“Playing” around with these different data points will give you more filtering capabilities. Even if you don't use it for segmenting your logged in versus logged out traffic, you'll undoubtedly gain valuable knowledge about your visitors from it. And isn't that what this is all about?

About the author

Dave Davies is the CEO of Beanstalk SEO Services, an organic SEO firm out of Victoria, BC, Canada. He writes with over a decade of experience in SEO and Internet Marketing.

He is an industry writer, reporter and speaker who wrote the second edition of SitePoint's SEM Kit, hosts a weekly radio show on Webmaster Radio and has spoken at a number of Search Engine Strategies conferences on topics ranging from ranking on all three major engines to Google patents and Net Neutrality.

Dave got his start in Internet Marketing in 1999 working for a Canadian web hosting company. As with many industry professionals, it didn't take him long to connect the dots and figure out that it's easier to convert a client who comes to you than to find them yourself, and what better way to do that than the organic results? Dave went from optimizing a single site to working as an affiliate marketer to then becoming the Marketing Manager for another successful SEO firm. From there it didn't take long for him to launch Beanstalk with his wife Mary.