This is the second of two articles about key search engine visibility and site crawlability issues for dynamically generated Web sites. The series integrates the presentations given by Laura and Matt and adds some insights of my own. In part 1, we looked at three key problem areas with sites that have dynamically generated content: information architecture and keyword research; robots.txt files; and the use of Sitemaps. In part 2, we'll look at common technical problems and at site analysis and Web analytics.
There are many potential technical problems you can end up with on your site. Here's a summary of the most common ones:
1. Complex URL Structure. If your URL looks like this, you're headed for trouble:
The key problem in the above URL is the number of parameters, delimited by ampersand characters. Complex URLs with a lot of parameters can cause crawlers to ignore the page altogether. Probably not the result you're looking for. These types of problems must be cleaned up.
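One rough way to spot this kind of URL is to count its query parameters. The sketch below is only an illustration: the sample URL is hypothetical (it is not the example from this article), and the threshold of two parameters is an assumption, not a limit published by any search engine.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed threshold for illustration only; search engines publish no fixed limit.
MAX_PARAMS = 2


def count_query_params(url):
    """Return the number of parameters in the URL's query string."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def is_complex_url(url, max_params=MAX_PARAMS):
    """Flag URLs that carry more query parameters than the chosen threshold."""
    return count_query_params(url) > max_params


# Hypothetical URL, invented for this sketch.
sample = "http://www.yourdomain.com/product.php?cat=12&item=345&sort=price&sid=abc123"
print(count_query_params(sample), is_complex_url(sample))
```

Running a check like this over a list of your site's URLs gives you a quick shortlist of pages to clean up first.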
2. Sometimes complex dynamic sites use redirects as a tool in managing site structure. Unfortunately, they often default to using 302 redirects. Search engine crawlers view the 302 redirect as temporary. That means that they don't pass on link credit from the old page to the new page. That's bad.
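If you already have a list of URLs and the HTTP status codes they return (from a crawl report, for example), a few lines of code can pick out the temporary redirects. This is a minimal sketch: the function names and the sample data are hypothetical, and it assumes you have collected the status codes some other way.

```python
# Standard HTTP redirect status codes, grouped by meaning.
PERMANENT_REDIRECTS = {301, 308}
TEMPORARY_REDIRECTS = {302, 303, 307}


def classify_redirect(status):
    """Describe a status code as a permanent redirect, a temporary one, or neither."""
    if status in PERMANENT_REDIRECTS:
        return "permanent"
    if status in TEMPORARY_REDIRECTS:
        return "temporary"
    return "not a redirect"


def find_temporary_redirects(crawl_results):
    """Return the (url, status) pairs whose redirect is temporary."""
    return [
        (url, status)
        for url, status in crawl_results
        if classify_redirect(status) == "temporary"
    ]


# Hypothetical crawl output: (URL, status code) pairs.
results = [
    ("http://www.yourdomain.com/old-page", 302),
    ("http://www.yourdomain.com/moved", 301),
    ("http://www.yourdomain.com/", 200),
]
print(find_temporary_redirects(results))
```

Any URL this turns up is a candidate for switching to a 301 redirect, provided the move really is permanent.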
4. Duplicate content is also a huge issue on dynamic Web sites. Fundamentally, you want only one URL to reference a given document. Many CMS systems have big problems with this.
Worse still, the CMS systems sometimes actually reference pages on a site using more than one URL. This sometimes happens on a large scale resulting in large amounts of duplicate content on a site. Not good.
One of the more common duplicate content problems occurs when a CMS refers to the home page of the site using the default document name (e.g. www.yourdomain.com/index.html), and uses that form of the URL on the internal links to the home page.
Not only does this create duplicate content, it divides up the "link juice" of your site in a really bad way. This issue is important enough that I wrote about it in more detail in a recent SEW Experts column, "SEO Hell, a CMS Production."
5. Most CMS systems do not handle the problem of canonicalization very well. This problem occurs when every page on your site can be accessed in both http://www.yourdomain.com format and in http://yourdomain.com format.
Search engines treat this as duplicate content. What makes this particularly bad is that most of your inbound links will go to http://www.yourdomain.com, but some will go to http://yourdomain.com.
The search engines are going to pick one version of your page, and the links to the other version of your page are simply wasted.
Fixing this is usually relatively easy. On an Apache Web server you can fix this in your .htaccess file using rewrite rules, which are handled by Apache's mod_rewrite module.
As with the robots.txt file, there's a strong potential for screwing up your site if you misuse these rewrite rules, so use them with great care. Make sure you have an experienced programmer doing this for you.
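To make the idea of a single canonical URL concrete, here is a small sketch of the logic in Python. It is not the mod_rewrite approach described above, just an illustration of the same rule: send the non-www host and the default document name to one preferred URL with a 301 redirect. The host name, the list of default document names, and the function names are all assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed values for illustration; substitute your own domain and defaults.
BARE_HOST = "yourdomain.com"
CANONICAL_HOST = "www.yourdomain.com"
DEFAULT_DOCUMENTS = ("index.html", "index.htm", "index.php")


def canonical_url(url):
    """Return the preferred form of a URL: www host, no default document name."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            # Drop the document name but keep the trailing slash.
            path = path[: -len(doc)]
            break
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def redirect_for(url):
    """Return (301, target) if the URL is not canonical, otherwise None."""
    target = canonical_url(url)
    return None if target == url else (301, target)


print(redirect_for("http://yourdomain.com/index.html"))
print(redirect_for("http://www.yourdomain.com/"))
```

The first call returns a 301 redirect to http://www.yourdomain.com/, and the second returns None because that URL is already in canonical form. Using a 301 rather than a 302 matters here for the reason given earlier: it is the permanent redirect that passes link credit to the target page.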
Site Analysis and Web Site Analytics
You're going to want to have insight into what's going on with your site. Here are a few key points to consider:
1. Web Analytics. Of course, you're going to have a Web analytics package in place. Google Analytics is sufficient for many sites, and will even work well for some dynamic sites.
Webmasters of complex sites often find they need something more powerful. There are many Web analytics packages out there. Some really good ones are:
Laura recommends tracking spider activity. She particularly likes NetTracker, a technology acquired by Unica and rolled into the NetInsight family of products.
2. Keyword Ranking. Laura likes to use Web Position Gold, a tool that scans the search engine results to determine where specific keywords rank in the engines.
Use tools like Web Position Gold with care. The search engines don't like automated rank checking programs using up their bandwidth. Laura recommends you check no more than twice per month.
3. Link Checking. It's smart to check out your site structure with an automated tool. This allows you to detect broken links on the site, and can also help you locate instances of more than one URL referring to the same page. The tool I like to use for this is Xenu's Link Sleuth. It's free and can provide a ton of information about your site.
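If you want to see what "more than one URL referring to the same page" looks like in practice, the sketch below groups URLs by a hash of their page content. It is not how Xenu's Link Sleuth works, only a simple illustration. It assumes you have already fetched the pages into a dictionary of URL to HTML text, and the sample data is made up.

```python
import hashlib
from collections import defaultdict


def find_duplicate_urls(pages):
    """Group URLs whose page bodies are byte-for-byte identical.

    `pages` maps each URL to the text of the page it returned.
    Returns a list of URL groups, one group per duplicated body.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl data for illustration.
pages = {
    "http://www.yourdomain.com/": "<html>Home</html>",
    "http://www.yourdomain.com/index.html": "<html>Home</html>",
    "http://www.yourdomain.com/about": "<html>About</html>",
}
print(find_duplicate_urls(pages))
```

An exact hash only catches identical pages. Pages that differ by a timestamp or a session ID would need a fuzzier comparison, which is beyond this sketch.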
There are still other SEO problems with dynamic Web sites, but the ones pointed out by Laura and Matt, summarized and expanded upon in this series, are common ones I've seen over and over again.
Large dynamic Web sites can be a big headache. Tackle the problems outlined in these two articles, and you're likely to have little to worry about.