Ever wondered about all the content you have floating around Google's organic search index? We're often so consumed with analyzing the traffic we receive, and hoping for the traffic we don't have yet, that we forget we are likely getting some impressions just outside the realm of mass search volume.
More importantly, do you have only your search-critical pages showing in search engine results? Or might you be exposing duplicate page content across multiple URLs, internal search result pages, and other site content that isn't fit for organic user consumption?
Why is this important?
- You want search engines to show the most accurate content you have.
- If you understand what content is showing up in search results, you can identify low-value content that wastes crawl budget when bots crawl your site.
- A thorough analysis reveals content that is receiving some impressions a little further down the results and could warrant extra attention so you reap its full referral value.
Google Indexation Analysis
From your Google Webmaster Tools account you will want to navigate to the Search Queries section. Next, choose the Top Pages tab.
I like to look at a full view of three months instead of one month as we will likely see more pages. By doing this my URL list rose from 2,000 to 3,000 URLs.
Next, download this list into Excel.
Reading Between the Lines
Now that you have the list exported to Excel, sort by URL. Sorting alphabetically groups pages by folder and naming convention, so you can see what is actually being indexed and showing in search results. Then ask a few questions:
- Do you have a lot of low-impression content that you haven't seen in analytics because it hasn't been driving traffic? Is this an area that should be a focus?
- Do you have content that shouldn't be seen by searchers but hasn't been excluded yet via robots.txt or a meta robots tag? This can include what Google might deem "thin" content or pages that searchers don't need to see, such as internal search result pages. Do you want a web searcher landing on this page? Do you want this to serve as the first impression of your site?
- Are you finding the same page naming conventions across separate folders signaling duplicate content?
- Do you have PDF or DOC content ranking prominently that you can't see in analytics as organic search traffic, because you can't place analytics tracking on these file types? Should these pages be converted to HTML versions?
- Do you have www and non-www versions showing up in search results?
- Are there pages/folder structures showing up in this list that you know are old page versions? If so, these may have been incorrectly redirected to new page versions via a 302 redirect instead of a 301 redirect. With a 302 redirect, the historical URL can continue to rank in Google. It will pass the user on to the new content, but as an SEO practice URL redirection should be made via 301 permanent redirects.
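If your export runs to thousands of URLs, a few of the checks above can be roughed out in a script rather than by eye. The following is a minimal sketch, not a definitive audit: it assumes you have already pulled the URL column out of the export into a plain list, and the function names, the `/search` path, and the `q`/`s` query parameters are hypothetical stand-ins for however your own site marks internal search pages.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def flag_url(url):
    """Return a list of review flags for one URL from the export."""
    parts = urlsplit(url)
    path = parts.path.lower()
    flags = []
    # PDF and DOC files cannot carry analytics tracking code.
    if path.endswith((".pdf", ".doc", ".docx")):
        flags.append("document")
    # Assumption: internal search lives under /search or uses ?q= / ?s=.
    params = [p.split("=")[0] for p in parts.query.split("&") if p]
    if path.startswith("/search") or any(p in ("q", "s") for p in params):
        flags.append("internal-search")
    return flags


def host_variants(urls):
    """Find pages listed under both the www and non-www host."""
    seen = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        host = parts.netloc.lower()
        bare = host[4:] if host.startswith("www.") else host
        seen[(bare, parts.path, parts.query)].add(host)
    return {key: hosts for key, hosts in seen.items() if len(hosts) > 1}


def duplicate_page_names(urls):
    """Find page names that appear in more than one folder."""
    folders = defaultdict(set)
    for url in urls:
        folder, _, name = urlsplit(url).path.rpartition("/")
        if name:
            folders[name.lower()].add(folder or "/")
    return {name: sorted(f) for name, f in folders.items() if len(f) > 1}
```

A page name shared across folders is only a hint of duplicate content, so treat the output as a shortlist to review by hand rather than a verdict.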
Bing Indexation Analysis
We have taken a look at Google indexation but cross-engine analysis can be important. Sometimes Bing will tell a different story from that of Google. Within Bing Webmaster Tools you will want to navigate to the Page Traffic section.
Essentially, we're doing the same thing here, but it has a little different feel.
We can actually take a six-month look at impression data vs. Google's three-month look (I would suggest a shorter review as there may be many 301ed pages in the six-month review time frame). We also satisfy the "two birds, one stone" feeling since this is the Bing index and also what is presented in Yahoo.
Once you've chosen your desired time period, export to Excel. Why do this again? Because you're likely seeing two different page counts from Google to Bing.
After you've exported both versions you can cross-reference them for similarities or unexpected gaps. The ability to export this data also gives you the ability to look at the indexation data from different perspectives such as pivot tables of indexation by folder.
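If you'd rather script the comparison than build it in Excel, the cross-reference and the by-folder pivot might look something like the sketch below. It assumes each export has already been reduced to a list of URLs (the column names in the exports vary, so loading is left out), and the function names are hypothetical. It also assumes that ignoring the host and any trailing slash is an acceptable way to match pages between the two engines, which may not hold for every site.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to its path (plus query) so the two exports compare."""
    parts = urlsplit(url.strip())
    # Assumption: /page and /page/ are the same page on this site.
    path = parts.path.rstrip("/") or "/"
    return path + ("?" + parts.query if parts.query else "")


def cross_reference(google_urls, bing_urls):
    """Split pages into those seen by both engines or by only one."""
    g = {normalize(u) for u in google_urls}
    b = {normalize(u) for u in bing_urls}
    return {"both": g & b, "google_only": g - b, "bing_only": b - g}


def count_by_folder(urls):
    """Count URLs per top-level folder, like a simple pivot table."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        # A URL with one segment or none sits at the site root.
        folder = "/" + segments[0] + "/" if len(segments) > 1 else "/"
        counts[folder] += 1
    return counts
```

Pages that land in `google_only` or `bing_only` are the unexpected gaps worth a closer look, and a folder with a surprisingly high count is often where duplicate or thin content is hiding.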
Long ago we looked at indexation via a search operator of "site:" and perused page after page analyzing what Google and Bing showed as indexed content. Over time these engines have given us the ability to dive into indexation data with greater precision.
In the end, Google and Bing are helping us out, so let's help guide them to the site content you should be showing, creating better search results and a better first impression for all of your site visitors.