
Dynamic Pages And Search Engines


Crawler-based search engines often have problems indexing web pages that are delivered dynamically, such as via database programs. That can cause some sites to be essentially invisible to them. This article covers key points to be aware of if you have this type of content.

Jump to:
URL Problems - Testing Your Site - SSI & Extensions - Virtual Servers
Solutions - Apache - ASP - Cold Fusion - Lotus Domino - Miva Merchant
Paid Inclusion - Static Pages - Related Articles

URL Problems

Often, the problem is not that the page is dynamically delivered. Rather, the search engine may not like the URL used to retrieve the document. Many dynamic delivery mechanisms use the ? symbol. For example, a page may be found this way:

http://www.website.com/cgi-bin/getpage.cgi?name=sitemap

Some crawlers may not read past the ? in that URL. It acts as a stop sign to them. Instead, they will try to retrieve this:

http://www.website.com/cgi-bin/getpage.cgi

That won't retrieve a valid page, so nothing gets listed.
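To make the "stop sign" behavior concrete, here is a minimal Python sketch of what such a crawler effectively does: it discards everything from the ? onward and requests the bare script. This is only an illustration of the failure mode described above, not code from any actual search engine; the function name `naive_crawler_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def naive_crawler_url(url: str) -> str:
    """Mimic a crawler that won't read past '?': drop the query string
    (and any #fragment) and keep only scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


original = "http://www.website.com/cgi-bin/getpage.cgi?name=sitemap"
print(naive_crawler_url(original))
# http://www.website.com/cgi-bin/getpage.cgi
```

The name=sitemap parameter, which is what actually selects the page, never reaches the server.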

Other symbols that are often used in dynamic delivery systems include &, %, + and $. If your database program uses these, look for a workaround that makes your content accessible. In general, the fewer parameters you pass, the less trouble a search engine will have with your URLs.
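If you want a quick way to audit your own URLs against this advice, a small script can flag the symbols listed above and count the query parameters. The sketch below is an assumption of how you might do it in Python; it only reports what it finds and does not claim any search engine uses a particular cutoff.

```python
from urllib.parse import parse_qsl, urlsplit

RISKY_CHARS = set("?&%+$")  # the symbols named in the paragraph above


def url_report(url: str) -> dict:
    """List which risky symbols appear and how many parameters are passed."""
    found = sorted(c for c in RISKY_CHARS if c in url)
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return {"risky_chars": found, "param_count": len(params)}


print(url_report("http://www.website.com/cgi-bin/getpage.cgi?name=sitemap&sort=price%20asc"))
# {'risky_chars': ['%', '&', '?'], 'param_count': 2}
```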

Some search engines may also avoid indexing URLs that reference the cgi-bin directory, such as:

http://www.website.com/cgi-bin/page1.htm
http://www.website.com/cgi/page1.htm

Search engines steer clear of these directories, and of URLs that pass database parameters via the ? symbol, to stay out of "spider traps." In some cases, a database or CGI process can feed a spider an endless series of URLs. The spider keeps crawling, tying up both its own resources and the host server.
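The sketch below shows, in Python, the kind of defensive policy a crawler might adopt: skip script directories such as cgi-bin, and cap the number of pages fetched so an endless supply of new URLs cannot trap it. The calendar page, the `SCRIPT_DIRS` list and the page cap are all hypothetical; real crawlers use their own, unpublished rules.

```python
from collections import deque
from urllib.parse import parse_qs, urlsplit

SCRIPT_DIRS = {"cgi-bin", "cgi"}  # assumed list of directories to avoid


def in_script_directory(url: str) -> bool:
    """True if any path segment is a script directory such as cgi-bin."""
    segments = urlsplit(url).path.lower().split("/")
    return any(seg in SCRIPT_DIRS for seg in segments)


def calendar_links(url: str) -> list:
    """Hypothetical dynamic page: each day links to the next, forever."""
    day = int(parse_qs(urlsplit(url).query).get("day", ["0"])[0])
    return [f"http://www.website.com/calendar.asp?day={day + 1}"]


def cautious_crawl(start: str, get_links, max_pages: int = 50) -> list:
    """Breadth-first crawl that skips script directories and stops
    after max_pages fetches."""
    seen = {start}
    queue = deque([start])
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        if in_script_directory(url):
            continue  # never fetch cgi-bin style URLs
        fetched.append(url)
        for link in get_links(url):
            # A visited set alone can't stop this trap: every link is new.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


pages = cautious_crawl("http://www.website.com/calendar.asp?day=0", calendar_links, max_pages=10)
print(len(pages), pages[-1])  # 10 http://www.website.com/calendar.asp?day=9
print(cautious_crawl("http://www.website.com/cgi-bin/page1.htm", calendar_links))  # []
```

Note how the page cap, not the visited set, is what ends the crawl: the trap never repeats a URL, which is exactly why such sites make spiders nervous.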

Testing Your Site

Uncertain if your content will be indexed? One way to get an answer is to submit some test pages to the search engines and see if they show up within about a month. If not, you may have a problem.

Another test is to find a big, popular site that you know is dynamically generated in the same way as your pages are. Find a subpage not too many levels down, then check to see if you can find this page listed in the various search engines. If not, you may have problems.

Server Side Includes (SSI) & File Extensions

The major search engines have no problems indexing pages that are built in part with SSI content. It also makes no difference if the pages end in the .shtml extension that some people use. However, there may be a problem if the pages use the cgi-bin path in their URLs.

As for file extensions, search engines generally don't mind how your files end. Even if your pages don't end in .html or .htm, they'll probably still get indexed, assuming you've solved the ? symbol problem.

Virtual Servers

Server farms have offered virtual servers for some time. One machine hosts 50 or 60 different servers, each running independently of the others. You pay a low fee, and you get a site with your own domain name and IP address.

For example, my consulting site is at http://calafia.com, which can also be reached at the IP address http://128.121.115.121. Numerous other sites run on the same machine but are reached under different names. Search engines have no problems with sites like these.

Under HTTP/1.1, you can go a step further. I could have my IP address play host to several different domain names, such as both http://calafia.com and http://searchenginewatch.com, if I wanted. Thus, I could have several web sites, yet pay only one hosting fee.

The only problem is that the browser or search engine spider must be HTTP/1.1 compliant. If not, only one of the virtual servers will be found. Most major crawlers do appear to be compliant. When I've investigated trouble reports, the cause has usually been a web server that wasn't configured correctly.
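What makes shared-IP hosting work is the Host header, which HTTP/1.1 requires clients to send: the server reads it to decide which site to return. The Python sketch below imitates that dispatch with raw request text. The site table and fallback rule are illustrative assumptions, not how any particular web server is configured, but they show why a client that omits the header only ever sees the default site.

```python
from typing import Optional

# Hypothetical sites sharing one IP address; names taken from the example above.
SITES = {
    "calafia.com": "Calafia Consulting home page",
    "searchenginewatch.com": "Search Engine Watch home page",
}
DEFAULT_HOST = "calafia.com"  # the site served when no Host header is sent


def parse_host(raw_request: str) -> Optional[str]:
    """Return the lowercased Host header value (without port), or None."""
    header_lines = raw_request.split("\r\n")[1:]  # skip the request line
    for line in header_lines:
        if not line:  # a blank line marks the end of the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().split(":")[0].lower()
    return None


def serve(raw_request: str) -> str:
    """Pick a site by Host header, falling back to the default site."""
    host = parse_host(raw_request) or DEFAULT_HOST
    return SITES.get(host, SITES[DEFAULT_HOST])


modern = "GET / HTTP/1.1\r\nHost: searchenginewatch.com\r\n\r\n"
legacy = "GET / HTTP/1.0\r\n\r\n"
print(serve(modern))  # Search Engine Watch home page
print(serve(legacy))  # Calafia Consulting home page
```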

The article below also touches on some virtual server issues:

Server Issues which could affect SE rankings
Searchengineposition, April 12, 2002

Tips on moving IP addresses, avoiding banned IP addresses, shared IP addresses and other server issues that might impact search engines.

Dynamic Page Solutions

There are workarounds that will let you create search engine-friendly URLs and still take advantage of a dynamically generated site. Look for these.

One site I know of made this simple change and gained over 600 visitors a day from a particular search engine, simply because its content was now able to be listed.

In contrast, I have seen many major web sites without a single page listed, because none of the search engines could crawl them. It's worth the time to see if your dynamic delivery problem has any easy solutions.

Below is a guide to several major dynamic delivery systems and possible products and solutions that may help, if you have search engine troubles with these.

Apache

Apache is popular web server software which has a special "rewrite" module that will allow you to translate URLs containing ? symbols into search engine friendly addresses.

The rewrite module (mod_rewrite) is not compiled into the software by default, but many hosting companies add it anyway. If yours doesn't, adding the module to the server is a straightforward upgrade.

Once installed, the rewrite module can take a URL like this:

http://www.shopguide.co.uk/guide.html?cat=Books

And make it also available in this format:

http://www.shopguide.co.uk/Books/index.html

Thanks to webmaster Martin Reeves of ShopGuide for this tip! The Apache documentation for mod_rewrite explains how to use the module to solve your URL woes.
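In Apache itself, this mapping would be written as a mod_rewrite RewriteRule. The Python sketch below only models the pattern-to-substitution logic so you can see what happens: the crawler requests the friendly path, and the server quietly serves the query-string version. The regular expression is an assumption modeled on the ShopGuide example, not ShopGuide's actual configuration, and `friendly_url` is a hypothetical helper for building the links your pages should publish.

```python
import re

# Assumed pattern: one directory level named after the category,
# followed by index.html (e.g. /Books/index.html).
FRIENDLY = re.compile(r"^/([^/]+)/index\.html$")


def rewrite(path: str) -> str:
    """Map a friendly path to the internal query-string URL, if it matches."""
    match = FRIENDLY.match(path)
    if match:
        return f"/guide.html?cat={match.group(1)}"
    return path  # leave every other path untouched


def friendly_url(category: str) -> str:
    """Build the crawler-friendly link the site should publish."""
    return f"/{category}/index.html"


print(rewrite("/Books/index.html"))  # /guide.html?cat=Books
print(rewrite(friendly_url("Music")))  # /guide.html?cat=Music
```

The rewrite only helps if your own navigation links use the friendly form; otherwise spiders will still find the ? versions.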

Active Server Pages (ASP)

These are pages delivered by Microsoft's server product. They usually end in .asp. Most of the major search engines will index these pages. Just avoid using the ? symbol, and you should be fine. Can't eliminate the ? symbol? Then look for third-party products specifically designed to make ASP sites search engine friendly.

Cold Fusion

These are pages delivered using Cold Fusion software. They usually end in .cfm. Normally, the database will use a ? symbol to retrieve pages. There are workarounds to this that will make your pages accessible.

A typical URL looks like this:

http://www.website.com/page.cfm?ID=A103

Cold Fusion can be reconfigured to use URLs like this:

http://www.website.com/page.cfm/A103

The page will load fine for both your users and the search engines. To learn more, look for threads in the Cold Fusion support forums that discuss workarounds for eliminating the ? symbol from URLs and how Cold Fusion interacts with search engines.
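The second URL style works because web servers generally pass any extra path after the script name to the script as "path info" (the CGI PATH_INFO variable). As a rough illustration, this Python sketch accepts either URL form and pulls out the same page ID. It is not Cold Fusion code; the function name and the fixed `/page.cfm` script path are assumptions taken from the example URLs above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def page_id_from_url(url: str, script: str = "/page.cfm") -> Optional[str]:
    """Return the page ID from either URL style, or None if absent."""
    parts = urlsplit(url)
    # Path-info style: /page.cfm/A103 -> "A103"
    if parts.path.startswith(script + "/"):
        return parts.path[len(script) + 1:] or None
    # Query-string style: /page.cfm?ID=A103 -> "A103"
    if parts.path == script:
        values = parse_qs(parts.query).get("ID")
        return values[0] if values else None
    return None


print(page_id_from_url("http://www.website.com/page.cfm?ID=A103"))  # A103
print(page_id_from_url("http://www.website.com/page.cfm/A103"))  # A103
```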

Lotus Domino

If you're running Lotus Domino and find search engines ignoring your pages, the people at dotNSF say they can help.

Miva Merchant

Do you use Miva Merchant as your shopping cart? The newsletter below promises to help you learn how to properly optimize your system for search engines and get your category and product pages listed.

Search Engine Help For Miva Merchant
http://www.jmhonline.net/searchmiva

Paid Inclusion & Direct Submit

Directly submitting specific dynamic URLs to Yahoo increases the chances that search engine will pick them up. Also, of the major search engines, Google began adding more dynamic URLs during its normal crawls toward the end of 2000.

Several search engines offer "paid inclusion" programs where, for a fee, you are guaranteed to have the pages you want listed. These programs can usually handle dynamic URLs with no problem. The downside, of course, is that you have to pay.

The Ultimate Solution: Static Pages

Do you really need to generate your pages dynamically? Often, the database is simply used as a page creation tool. If so, use it to create static pages, especially for sections of your site that don't change often. Alternatively, consider mirroring your dynamic content in static pages that the search engines can spider. A number of tools and companies offer products designed to make dynamic content friendly to crawlers.
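If your database is mainly a page creation tool, exporting it to plain files can be quite simple. The Python sketch below reads rows from a hypothetical `pages` table (an in-memory SQLite database stands in for your real one) and writes one static index.html per row. The table layout, and the assumption that each `slug` is safe to use as a directory name, are illustrative only.

```python
import html
import sqlite3
import tempfile
from pathlib import Path


def export_static_pages(conn: sqlite3.Connection, out_dir: Path) -> list:
    """Write one static index.html per database row; return the paths."""
    written = []
    rows = conn.execute("SELECT slug, title, body FROM pages ORDER BY slug")
    for slug, title, body in rows:
        page = out_dir / slug / "index.html"  # e.g. books/index.html
        page.parent.mkdir(parents=True, exist_ok=True)
        page.write_text(
            f"<html><head><title>{html.escape(title)}</title></head>"
            f"<body><h1>{html.escape(title)}</h1>"
            f"<p>{html.escape(body)}</p></body></html>",
            encoding="utf-8",
        )
        written.append(page)
    return written


# Hypothetical table standing in for your real content database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [("books", "Books", "Our book guide."), ("music", "Music", "Our music guide.")],
)

with tempfile.TemporaryDirectory() as tmp:
    files = export_static_pages(conn, Path(tmp))
    first_page = files[0].read_text(encoding="utf-8")
    print([p.relative_to(tmp).as_posix() for p in files])
    # ['books/index.html', 'music/index.html']
```

Rerunning the export whenever content changes keeps the static copy current, which suits sections of a site that are updated only occasionally.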

Related Articles

For more on dynamic web site issues, see the SEO: Dynamic Web Sites category of Search Topics in Search Engine Watch.

