There isn't an SEO in existence that doesn't love crawling a site. There's something undeniably powerful in clicking a button and having all-important SEO elements brought to you. Makes the job easier.
What's hard is quantifying and prioritizing that crawl data, then applying it to the site in a way that makes sense to the client.
What follows is a workflow idea that begins not just with a site crawl, but with what most clients already know intimately: the types of pages on their site. Focusing on benchmarking non-ranking URLs by page type, we'll use Google site search to provide an actionable starting point and show why crawling isn't necessarily the best first step when auditing a site.
Non-Ranking URL Page Type Workflow
We're all familiar with identifying page types, benchmarking them, then tracking over time the URLs we want to see in search results. How about applying that same concept to URLs we don't want to see in search results? For the purpose of describing the image below, ranking URLs are the ones we want in search results and non-ranking URLs are the ones we don't.
Consider the typical technical workflow:
- Crawl site for SEO elements and improvements.
- Try to quantify the scale of the issue (which page types it applies to) or provide a couple of examples.
- Insert findings into client deliverable organized by common SEO issues.
With the typical technical workflow, the site crawl is limited to what internal linking exposes, which means some URLs may not be found. Identifying the scale of each finding, and which page types it applies to, happens only as a secondary step. Finally, the findings are organized in a deliverable structure that is unfamiliar to non-SEOs.
Now consider the page type workflow:
- Identify and create comprehensive page type lists.
- Crawl lists for SEO elements and improvements.
- Insert into client deliverable organized by areas of the site the client is familiar with.
By first identifying and crawling comprehensive page type lists, scale is immediately apparent (number of paginated pages for example), existing SEO elements have been identified and benchmarked, a recommendation is given, and everything falls under the umbrella of an area of the site any client can easily understand.
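As a rough illustration of how a page type list makes scale visible, the sketch below buckets a list of URLs by page type and counts each bucket. The rule names and URL patterns are hypothetical placeholders, not taken from any real site, and would need to be replaced with patterns that match the site being audited.

```python
import re
from collections import Counter

# Hypothetical page type rules: (label, pattern). These patterns are
# assumptions for illustration; adjust them to the site's real URL structure.
PAGE_TYPE_RULES = [
    ("paginated pages", re.compile(r"[?&]page=\d+", re.IGNORECASE)),
    ("internal search results", re.compile(r"/search\?", re.IGNORECASE)),
]


def classify(url):
    """Return the label of the first rule that matches the URL."""
    for label, pattern in PAGE_TYPE_RULES:
        if pattern.search(url):
            return label
    return "unclassified"


def count_page_types(urls):
    """Count how many URLs fall into each page type."""
    return Counter(classify(url) for url in urls)
```

Calling count_page_types on a URL list returns a Counter, so most_common() gives the page types ordered by scale, which maps directly onto deliverable sections.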
Using Site Search to Identify Non-Ranking Page Types
There are many ways to begin identifying all page types on a site, but probably the easiest, most widely accessible way is the ol' trusty Google site search. This is less comprehensive than using analytics (non-organic content views work great), but it provides valuable indexation metrics along the way.
Costco Sort URLs Example
Quickly clicking through the Costco site, it's obvious that they use the sortBy parameter to invoke the sorting functionality for product category pages.
A quick site search shows just over 8,500 URLs in Google's index.
Using the page type workflow, we've identified a page type, and because we're using site search we know these URLs are indexed. By changing the search settings to show 100 results per page and using an SEO browser extension like SEOQuake, the results can be exported 100 at a time.
This list can then be run through an SEO crawler to identify the SEO elements on the pages and whether they carry any directives or annotations. Based on these findings, a recommendation can be created and inserted into the deliverable under a client-friendly section titled Sort URLs or URLs Generated by Sorting.
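Before handing the exported list to a crawler, it can help to de-duplicate it and confirm that every URL really belongs to the sort page type. The following is a minimal sketch using only the Python standard library; the function names are made up for this example, and it assumes the export is a plain list of URL strings.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def sort_urls(urls, param="sortBy"):
    """Return unique URLs whose query string contains the given parameter."""
    seen = set()
    result = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        # keep_blank_values=True so that "?sortBy=" still counts as a sort URL
        query = parse_qs(urlsplit(url).query, keep_blank_values=True)
        if param in query:
            seen.add(url)
            result.append(url)
    return result


def sort_values(urls, param="sortBy"):
    """Tally the values used with the parameter across the URL list."""
    counts = Counter()
    for url in urls:
        query = parse_qs(urlsplit(url).query, keep_blank_values=True)
        for value in query.get(param, []):
            counts[value] += 1
    return counts
```

The tally from sort_values shows which sort options generate the most indexed URLs, which is useful context for the recommendation.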
Google Drive Subdomain Example
Rather than identify page types by clicking manually through the site, we can use advanced site search to identify non-ranking URL page types.
Take a look at what might be considered the longest advanced Google search ever:
site:google.com/drive/ -site:docs.google.com -site:picasaweb.google.com -site:developers.google.com -site:support.google.com -site:documents.google.com -site:drive.google.com -inurl:ad_s
The query also contains clues to what Google may need to clean up from an SEO standpoint for their Google Drive pages.
For this example we started with the URL structure of the pages to be optimized:
site:google.com/drive/
Next, we removed subdomains by adding a minus sign. These may all need to be cleaned up or leveraged for Google Drive landing pages.
-site:docs.google.com -site:picasaweb.google.com -site:developers.google.com -site:support.google.com -site:documents.google.com -site:drive.google.com
URL with Tracking Parameter:
-inurl:ad_s
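Long exclusion queries like the one above are easy to mistype, so it can be convenient to assemble them from lists. The sketch below is one way to do that; the function names are invented for this example, and the search URL it builds simply places the query in Google's standard q parameter.

```python
from urllib.parse import urlencode


def build_site_search(target, excluded_sites=(), excluded_inurl=()):
    """Assemble a site: query with -site: and -inurl: exclusions."""
    parts = [f"site:{target}"]
    parts += [f"-site:{host}" for host in excluded_sites]
    parts += [f"-inurl:{fragment}" for fragment in excluded_inurl]
    return " ".join(parts)


def search_url(query):
    """Return a Google search URL for the query, to be opened in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# The subdomains and the ad_s fragment come from the example query above.
DRIVE_QUERY = build_site_search(
    "google.com/drive/",
    excluded_sites=[
        "docs.google.com",
        "picasaweb.google.com",
        "developers.google.com",
        "support.google.com",
        "documents.google.com",
        "drive.google.com",
    ],
    excluded_inurl=["ad_s"],
)
```

Keeping the exclusions in lists also makes it simple to search each excluded subdomain on its own later, for instance with build_site_search("drive.google.com").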
Each of these can be searched separately using similar advanced operator techniques to estimate how many pages Google has indexed. That estimate serves as a benchmark and can be checked again after the recommendation has been implemented to track its effect. For example, drive.google.com should ideally require sign-in to access content, yet Google has almost one million unique URLs indexed for the subdomain. A good recommendation might be to remove this subdomain from search engine indices, with this advanced search serving as the benchmark.
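One simple way to keep these benchmarks comparable over time is to record each check alongside the exact query used. The sketch below assumes the indexed counts are read manually from the results page and typed in; the class and function names are hypothetical, and the counts Google reports are estimates, so small movements should not be over-interpreted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Benchmark:
    query: str             # the exact site search used
    checked_on: date       # when the estimate was read
    indexed_estimate: int  # figure read manually from the results page


def change(before, after):
    """Return (absolute change, percent change) between two checks."""
    if before.query != after.query:
        raise ValueError("benchmarks must use the same query")
    delta = after.indexed_estimate - before.indexed_estimate
    if before.indexed_estimate == 0:
        return delta, None  # percent change is undefined from a zero baseline
    return delta, delta * 100 / before.indexed_estimate
```

Refusing to compare benchmarks taken with different queries guards against a common mistake: tweaking the operators between checks and then attributing the difference to the recommendation.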