In Part 1 of this three-part series, which focused on SEO from the perspective of the search engine crawling experience, we explored the first phase of the crawl, index, rank methodology. This series focuses on these three fundamental elements of SEO.
Before we continue, a short refresher on the assumptions put forth in Part 1:
Search engines crawl, index, and rank web pages. SEOs should base their tactics on these primary activities. The conclusions for SEOs working at any level boil down to these essential facts:
- It’s all about the crawl
- It’s all about the index
- It’s all about the SERP
But, of course, Google and every other search engine exist for one reason: to build a business around pleasing their users. Therefore, we must constantly remember:
- It’s really all about the users
And also, because SEO is by nature a competitive exercise, and earning top rankings is at the expense of your competitors, we must note:
- It’s all about the competition
Knowing this, we can initiate several tactics based on each phase of activity, which together help build an overall SEO strategy. This method ensures we’re pursuing SEO objectives according to the processes search engines currently employ.
Time to get your hands dirty (as if they didn’t get dirty enough already combing through log files).
Indexation is the next priority in this discipline, and duplicate content is far and away the largest issue to be addressed. It’s probably not an exaggeration to state that all large sites have some sort of duplication, either intentional or otherwise.
Ecommerce sites, with multiple paths to the same content, faceted navigations, and multiple verticals selling the same products, can be bad offenders. We experience that often with our work at large retailers, such as Zappos and Charming Shoppes.
Even more problematic are huge media sites, such as newspapers and major publishers. Marshall Simmonds and the team at Define direct SEO for The New York Times and other major publishers, and they struggle with extreme examples of duplicate content daily; it’s an unavoidable part of that type of SEO.
Having duplicate content will never cause a penalty, at least not an intentional, outward penalization of a site. Rather, a filter is applied when multiple documents with nearly identical or slightly varied content exist. It’s a fundamental problem of SEO.
Duplication will also impact the crawl experience, so this needs to be tightened up from that perspective as well. Having multiple versions of a document in the search indexes isn’t a favorable result.
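To get a rough sense of how much “nearly identical or slightly varied” content a site carries, you can compare page text yourself. The sketch below uses word shingles and Jaccard similarity, a common textbook approach; it is not a description of how any search engine filters duplicates. The function names, the `pages` mapping of URL to body text, and the 0.8 threshold are all assumptions made for illustration.

```python
import re


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles across both pages."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(
    pages: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return URL pairs whose text similarity meets the (assumed) threshold."""
    urls = sorted(pages)
    sets = {url: shingles(pages[url]) for url in urls}
    pairs = []
    for i, first in enumerate(urls):
        for second in urls[i + 1:]:
            score = jaccard(sets[first], sets[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs
```

Comparing every pair of pages is fine for a sample of a few hundred URLs. For a full enterprise crawl you would likely need a hashing scheme to avoid the pairwise comparison, which is outside the scope of this sketch.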
Matt Cutts, in his recent dialogue with Eric Enge, confirmed the well-known existence of the “crawl cap” that will be placed on a site based upon its PageRank (internal Google scoring, not toolbar PageRank), and how duplicate content can present problems here:
“Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We’ll drop two out of the three pages and keep only one, and that’s why it looks like it has less good content… if you have a certain amount of PageRank, we are only willing to crawl so much from that site. But some of those pages might get discarded, which would sort of be a waste…”
The full interview with Cutts contains essential information for every serious SEO in the field. While much of it won’t be a surprise, it’s powerful to have confirmation on many of the issues we face daily.
Deep links, especially from a fundamentally strong site architecture, and from relevant and high-quality external sites, are the best way to improve both a site’s indexation and its crawl penetration.
Determining Indexation Issues
Determining a site’s crawl rate and crawl cap, then determining the amount of duplicate content cluttering up that budget, will improve the crawl experience, the indexation penetration, and the crawl efficiency. But how do you determine indexation quality on your site? There are a number of excellent ways to do this, including:
- Analyzing log files and/or analytics to track traffic levels by URL. Grouping these by vertical (such as Clothing, Bags, or however your site architecture and business is organized) will help show which sections of a site aren’t getting the love from search engines. This will reveal a problem in indexation.
- Analyzing the internal linking of a site. Which sections of the site have a minimal number of internal links? Which sections are more than six or seven hops from the home page?
- Using searches such as site:jcrew.com inurl:72977 to find duplicates of product pages. For e-commerce sites, product-level duplicate pages are normally the worst offenders. Make use of
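The first two checks in the list above lend themselves to a quick script. This is a minimal sketch, assuming you have already extracted a list of URLs requested by search engine crawlers from your logs and a map of each page’s internal links from a site crawl. The function names, the use of the first path segment as the “vertical”, and the six-hop default are illustrative assumptions, not a prescribed method.

```python
from collections import Counter, deque
from urllib.parse import urlparse


def hits_by_section(crawled_urls: list[str]) -> Counter:
    """Count crawler hits per top-level path segment (e.g. /clothing/...)."""
    counts = Counter()
    for url in crawled_urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Assumption: the first path segment identifies the vertical.
        counts[segments[0] if segments else "(home)"] += 1
    return counts


def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search: fewest hops from the home page to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def deep_pages(
    links: dict[str, list[str]], home: str = "/", max_hops: int = 6
) -> list[str]:
    """List reachable pages that sit more than max_hops from the home page."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_hops)
```

Sections with few crawler hits relative to their page count, or with many pages returned by `deep_pages`, are reasonable places to start looking for indexation problems. Pages that never appear in the `click_depths` result at all are unreachable through internal links from the home page and deserve a separate look.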
Pagination & Search Results
The problem of SEO and pagination is a complex topic. But briefly, it should be noted that many pagination issues can be resolved with a combination of “rel=canonical” and a default View All page that serves as the canonical version. This folds all page versions of a product line into the View All page; this practice was recommended by Maile Ohye this year.
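As a rough illustration of the View All approach, the sketch below builds the canonical tag that each paginated URL in a series would carry. The `page` and `view` query parameter names, the `view=all` convention, and the choice to preserve other parameters (such as filters) are assumptions; substitute whatever URL scheme your platform actually uses.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def view_all_url(
    page_url: str, page_param: str = "page", view_param: str = "view"
) -> str:
    """Map any paginated URL in a series to its (assumed) View All URL."""
    parts = urlsplit(page_url)
    # Drop the pagination parameter and any existing view parameter,
    # keeping every other parameter in its original order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in (page_param, view_param)
    ]
    query.append((view_param, "all"))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), "")
    )


def canonical_tag(page_url: str) -> str:
    """Return the <link> element to place in the <head> of a paginated page."""
    href = escape(view_all_url(page_url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

Every page in the series, including the View All page itself, would emit the same tag, so the search engine receives one consistent signal about which version to index.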
Search results are another unique situation. There are many ways to handle search results.
One elegant way search results can be handled is by canonicalizing them to the default search page, which is then fortified to be a quality page with contextual, helpful links and small amounts of content. This isn’t always the best route, however, and is best used for “pure search” pages that don’t serve as a site’s navigation foundation, obviously.
Indexation Reveals URL Weaknesses
During investigation into indexation issues, any weaknesses in a site’s URL structure will come to light. This is especially true at the enterprise level, where you’ll find all kinds of unexpected ugliness in the search indexes. This is further exacerbated by the frequent (and random) occurrence of new offenders: sometimes entire new duplicates of a site will surface without warning.
This is a side effect of having many different types of stakeholders and business interests working within a company. Although we too often make the mistake of assuming otherwise, SEO isn’t the end-all, be-all to others that it is to those of us working actively in the field.
Indexation is a critical component of the crawl, index, rank equation, and normally gets a great deal of attention from SEOs. Clean up your index and provide consistent hints and directives to the search engines, and you’ll enjoy efficient crawling, prioritized indexation, and excellent SEO performance.
Continue reading: “Crawl, Index, Rank, Repeat: A Tactical SEO Framework (Part 3)”, the final installment of this series, which covers ranking.