Search Engine Visibility and Site Crawlability, Part 1

Huge databases that generate Web site content on the fly can be the bane of search engine spiders' existence. They can't find pages; they can't see URLs. So they can't index pages.

At the recent SES Chicago conference, Laura Thieme and Matt Bailey, both experienced presenters, showed how webmasters can tame the dynamic Web site beast and improve search engine visibility and site crawlability.

This is the first of two articles outlining the SEO problems they identified with dynamic Web sites, along with solutions to those problems. I’ll expand on both.

At the very highest level, dealing with large dynamic Web sites requires IT and marketing to collaborate. The person responsible for technical search engine optimization is an important member of the team. Without the cooperation of technical SEO support staff, your Web site can end up poorly optimized for search engines.

Keyword Research and Deployment

Do your keyword research up front. You need to understand the language potential customers use to find the products or services you offer. You also want to understand related topics that interest these potential customers.

You then want to match this understanding of the marketplace with the content you already have, or are willing to develop, and design a logical, clean site hierarchy around it.

Laura said embedding keywords into your page names (i.e., the URI) is a powerful SEO tactic. In fact, she recommends that existing sites embed the most important keywords for each page into its URI (and then 301-redirect the old page location to the new one).
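To make the redirect half of that concrete, here is a minimal sketch assuming a Python/Flask-based site; the URLs and the keyword slug are hypothetical examples, not anything from the presentation.

# A minimal sketch, assuming a Flask-based site. The old dynamic URL
# and the new keyword-rich URL are hypothetical placeholders.
from flask import Flask, redirect

app = Flask(__name__)

# New keyword-rich location for the page.
@app.route("/widgets/blue-ceramic-garden-widgets")
def blue_ceramic_widgets():
    return "Product page content goes here."

# Old dynamic location (e.g. /product?id=123 on the old site); send
# visitors and crawlers to the new URL with a permanent 301 redirect.
@app.route("/product")
def old_product_page():
    return redirect("/widgets/blue-ceramic-garden-widgets", code=301)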

Don’t use the same titles and headers for all the pages of your site. That’s another reason to do keyword research: the long tail of keywords enables you to create a different title and header for every page of the site.

Large sites also have issues with pages being found and indexed, and unique, descriptive titles and headers help search engines treat each page as distinct content. Make sure the most important keywords for your site appear in the title and header of the home page.
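As a rough illustration (the keyword phrases and page names here are made up), keyword research can drive a simple mapping that gives every page its own title and header instead of one boilerplate string:

# A minimal sketch with made-up keyword data: map each page to its own
# researched keyword phrase and build a unique title and header from it.
page_keywords = {
    "home": "custom garden widgets",
    "blue-ceramic": "blue ceramic garden widgets",
    "care-guide": "how to clean ceramic garden widgets",
}

def title_and_header(page_id, site_name="Example Widget Co."):
    # Build a unique <title> and <h1> from the page's researched phrase.
    phrase = page_keywords[page_id]
    return f"{phrase.title()} | {site_name}", phrase.capitalize()

print(title_and_header("blue-ceramic"))
# ('Blue Ceramic Garden Widgets | Example Widget Co.', 'Blue ceramic garden widgets')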

Robots.txt Files

Matt pointed out that it’s surprising how many sites have problems with their robots.txt file. He provided a tutorial on how robots.txt files work. You can also find good tutorials on the Web.

I’ve seen many sites that have problems with robots.txt. While it’s a powerful tool for directing the way search engine crawlers see your site, it’s easy to make a mistake, and a single mistake can have catastrophic consequences. So do use robots.txt, but use the file with great care.
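For comparison, a more typical use of the file blocks only the specific directories you don’t want crawled; the paths below are hypothetical examples:

User-agent: *
Disallow: /cgi-bin/
Disallow: /checkout/
Disallow: /internal-search/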

To illustrate how easy it is to make a mistake: a while back we worked with a client that implemented new versions of its site on a staging server separate from the one used for the live site. This allowed them to review the new site in a live environment before launching it.

They didn’t want this duplicate site indexed by the search engines, so they implemented a robots.txt as follows:

User-agent: *
Disallow: /

Experienced readers are already cringing because they know where this is going. Unfortunately, one day during a site update they copied the whole site from the staging server to the live server, including the robots.txt file.

It took three weeks before anyone noticed, and even then only because traffic to the site was already crashing. Ouch. Again, use it, but be careful.
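One way to catch that kind of accident quickly (my suggestion, not something covered in the session) is an automated post-deployment check using Python’s standard-library robots.txt parser; the domain and page below are placeholders for your own key URLs:

# A minimal sketch: fetch the live robots.txt after each deployment and
# confirm an important page is still crawlable by a major search engine bot.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("http://www.example.com/robots.txt")
parser.read()

if not parser.can_fetch("Googlebot", "http://www.example.com/products/"):
    print("WARNING: robots.txt is blocking a key page from Googlebot!")
else:
    print("robots.txt allows crawling of the key page.")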

Site Maps

The Sitemap is an XML file that lists URLs for a site, along with when each URL was last updated, how often it usually changes, and its importance relative to other URLs on the site. The protocol was designed to enable search engines to crawl the site more intelligently. In principle, this should help the search engine find all of the pages on your site more easily than it would by crawling naturally.
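For reference, a minimal Sitemap file looks like the following; the URL and values are placeholders, and the tags come from the sitemaps.org protocol:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/products/blue-widgets</loc>
    <lastmod>2006-12-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>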

Matt had a great point when he told the audience you’re better off letting your site be found naturally by the crawler. He referred to Sitemaps as a “fast track to the supplemental index.”

I couldn’t agree more. We don’t use Sitemaps on our clients’ sites.

For the crawlability problem (lots of pages not being indexed), the following two steps are the best solution:

  1. Build an efficient and clean global navigation system on your site. There is no better way to help a crawler find the pages of your site than a logical and simple navigation structure. If you do this, the crawler will find its way around.
  2. Get third party sites to link to your site. No on-site search engine optimization strategy will work unless you get these links.

Summary

We’ve outlined three key problem areas with sites that have dynamically generated content: information architecture and keyword research; robots.txt files; and the use of Sitemaps.

When search engines don’t index pages on your site, invest your time and energy in site navigation and inbound link development, rather than in the Sitemaps protocol.

In Part 2, we’ll look at tools for analyzing your Web site structure and traffic, as well as more solutions to SEO problems with dynamically generated sites.

Eric Enge is the president of Stone Temple Consulting, an SEO consultancy outside of Boston. Eric is also co-founder of Moving Traffic Inc., the publisher of City Town Info and Custom Search Guide.

