Crawler-based search engines often have problems indexing web pages that are delivered dynamically, such as via database programs. That can cause some sites to be essentially invisible to them. This article covers key points to be aware of if you have this type of content.
Often, the problem is not that the page is dynamically delivered. Rather, the search engine may not like the URL used in order to retrieve the document. Many dynamic delivery mechanisms make use of the ? symbol. For example, a page may be found this way:
Some crawlers may not read past the ? in that URL. It acts as a stop sign to them. Instead, they will try to retrieve this:
That won't retrieve a valid page, so nothing gets listed.
Other symbols that are often used in dynamic delivery systems include &, %, + and $. If your database program uses these, seek a workaround in order to make your content accessible. In general, the fewer parameters you pass, the less trouble a search engine will have with your URLs.
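To make the "stop sign" behavior concrete, here is a minimal Python sketch of what a crawler that refuses to read past the ? would end up requesting, along with a quick count of how many parameters a URL passes. The function names and the example.com address are assumptions made up for illustration. They do not come from any particular search engine.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def truncate_at_query(url: str) -> str:
    """Return the URL as seen by a crawler that stops reading at '?'."""
    parts = urlsplit(url)
    # Drop the query string and fragment, keep scheme, host and path.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def count_parameters(url: str) -> int:
    """Count the name=value pairs passed in the query string."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


if __name__ == "__main__":
    # Hypothetical dynamic URL, for illustration only.
    url = "http://www.example.com/getpage.cgi?name=sitemap&lang=en"
    print(truncate_at_query(url))
    print(count_parameters(url))
```

Running a check like this over your own URLs is one rough way to spot the pages most likely to give crawlers trouble: the ones whose truncated form does not return a valid page, and the ones with the most parameters.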
Some search engines may also avoid indexing URLs that have a reference to the CGI bin directory, such as:
The reason search engines avoid these directories, and URLs that pass along database parameters via the ? symbol, is to steer clear of "spider traps." In some cases, a spider may stumble into a situation where the database or CGI process feeds it an infinite number of URLs. The spider keeps crawling, wasting its own resources and putting a heavy load on the host server.
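A spider trap is easier to picture with a small example. The Python sketch below is not how any real search engine works. It simply assumes a crawler that protects itself with two limits, a maximum number of pages and a maximum link depth, and runs it against a made-up calendar script that always links to a "next month" page. All names and URLs here are hypothetical.

```python
from collections import deque
from typing import Callable, List


def crawl(start: str,
          get_links: Callable[[str], List[str]],
          max_pages: int = 100,
          max_depth: int = 5) -> List[str]:
    """Breadth-first crawl that gives up after max_pages or max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # Do not follow links any deeper than this.
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


def endless_calendar(url: str) -> List[str]:
    """Hypothetical trap: every page links to one more brand-new URL."""
    month = int(url.rsplit("=", 1)[1])
    return [f"http://www.example.com/calendar.cgi?month={month + 1}"]


if __name__ == "__main__":
    pages = crawl("http://www.example.com/calendar.cgi?month=0", endless_calendar)
    print(len(pages), "pages fetched before the crawler gave up")
```

Without the two limits, the loop would never finish. Skipping ? URLs and cgi-bin paths altogether is a cruder version of the same defense.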
Uncertain if your content will be indexed? One way to get an answer is to submit some test pages to the search engines and see if they show up within about a month. If not, you may have a problem.
Another test is to find a big, popular site that you know is dynamically generated in the same way as your pages are. Find a subpage not too many levels down, then check to see if you can find this page listed in the various search engines. If not, you may have problems.
The major search engines have no problems indexing pages that are built in part with SSI content. It also makes no difference if the pages end in the .shtml extension that some people use. However, there may be a problem if the pages use the cgi-bin path in their URLs.
As for file extensions, search engines generally don't mind how your files end. In other words, even if your pages don't end in .html or .htm, they'll probably still get indexed, assuming you've solved the ? symbol problem.
Server farms have offered virtual servers for some time. One machine hosts 50 or 60 different servers, each running independently of the others. You pay a low fee, and you get a site with your own domain name and IP address.
For example, my consulting site is at http://calafia.com, or the IP address of http://18.104.22.168. Numerous other sites run on the same machine but are reached under different names. Search engines have no problems with sites like these.
Under HTTP/1.1, you can go a step further. I could have my IP address play host to several different domain names, such as both http://calafia.com and http://searchenginewatch.com, if I wanted. Thus, I could have several web sites, yet pay only one hosting fee.
The only problem is that the browser or search engine spider must be HTTP/1.1 compliant. If not, then only one of the virtual servers will be found. Most major crawlers do appear to be compliant. When I've investigated trouble reports, the cause has usually been that the web server itself isn't correctly configured.
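What makes name-based hosting work is the Host header that an HTTP/1.1 client sends with each request. The Python sketch below only builds and inspects request text; it does not open a network connection. The site table and function names are assumptions for illustration, and a real web server does this matching in its own configuration.

```python
from typing import Dict


def build_request(host: str, path: str = "/", http11: bool = True) -> str:
    """Build the text of a simple GET request."""
    if http11:
        # HTTP/1.1 requests name the site they want in the Host header.
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    else:
        # A bare HTTP/1.0 request carries no host name at all.
        lines = [f"GET {path} HTTP/1.0"]
    return "\r\n".join(lines) + "\r\n\r\n"


def pick_site(request: str, sites: Dict[str, str], default: str) -> str:
    """Choose which virtual server answers, based on the Host header."""
    for line in request.split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            hostname = value.strip().lower().split(":")[0]  # Ignore any port.
            return sites.get(hostname, default)
    return default  # No Host header: only the default site can be served.


if __name__ == "__main__":
    # Hypothetical mapping of two domains that share one IP address.
    sites = {"calafia.com": "consulting site",
             "searchenginewatch.com": "search engine site"}
    new = build_request("searchenginewatch.com")
    old = build_request("searchenginewatch.com", http11=False)
    print(pick_site(new, sites, "consulting site"))
    print(pick_site(old, sites, "consulting site"))
```

The second request falls through to the default site, which is why a spider that does not send a Host header sees only one of the sites sharing the address.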
The article below also touches on some virtual server issues:
Server Issues which could affect SE rankings
Searchengineposition, April 12, 2002
Tips on moving IP addresses, avoiding banned IP addresses, shared IP addresses and other server issues that might impact search engines.
There are workarounds that will let you create search engine-friendly URLs and still take advantage of a dynamically generated site. Look for these.
One site I know of made this simple change and gained over 600 visitors a day from a particular search engine, simply because its content was now able to be listed.
In contrast, I have seen many major web sites without a single page listed, because none of the search engines could crawl them. It's worth the time to see if your dynamic delivery problem has any easy solutions.
Below is a guide to several major dynamic delivery systems and possible products and solutions that may help, if you have search engine troubles with these.
Apache is popular web server software which has a special "rewrite" module that will allow you to translate URLs containing ? symbols into search engine friendly addresses.
The rewrite module (mod_rewrite) is not compiled into the software by default, but many hosting companies add it anyway. For those that don't, it is an easy task to upgrade the server with the module.
Once installed, the rewrite module can take a URL like this:
And make it also available in this format:
Thanks to webmaster Martin Reeves of ShopGuide for this tip! See the links below for more information on how to use the module to solve your URL woes.
- Apache Docs: mod_rewrite URL Rewriting Engine
- A Users Guide to URL Rewriting with the Apache Webserver
- Module mod_rewrite:Rewriting URLs With Query Strings
PromotionData.com, Nov. 26, 2002
How to use Apache's rewrite module to change URLs for SEO campaigns.
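The idea behind URL rewriting can be sketched in a few lines of Python. This is not Apache configuration and not the module's own syntax. It is only an illustration of the translation step, using a made-up catalog script and a made-up friendly path. The pattern, the script name and the parameter names are all assumptions.

```python
import re

# Hypothetical friendly form: /catalog/<category>/<item number>
FRIENDLY = re.compile(r"^/catalog/(?P<category>[A-Za-z0-9_-]+)/(?P<item>\d+)/?$")


def rewrite(path: str) -> str:
    """Translate a friendly path into the query-string URL behind it."""
    match = FRIENDLY.match(path)
    if match is None:
        return path  # Not a catalog URL, so leave it untouched.
    return (f"/cgi-bin/catalog.cgi?category={match['category']}"
            f"&item={match['item']}")


if __name__ == "__main__":
    print(rewrite("/catalog/books/42"))
    print(rewrite("/about.html"))
```

The crawler only ever sees the friendly path on the left side of the translation. The ? version stays inside the server, which is the whole point of the technique.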
Active Server Pages (ASP) are pages delivered by Microsoft's server product. They usually end in .asp. Most of the major search engines will index these pages. Just avoid using the ? symbol, and you should be fine. Can't eliminate the ? symbol? OK, try the products below, which are specifically designed to make ASP sites search engine friendly:
These are pages delivered using Cold Fusion software. They usually end in .cfm. Normally, the database will use a ? symbol to retrieve pages. There are workarounds to this that will make your pages accessible.
A typical URL looks like this:
Cold Fusion can be reconfigured to use URLs like this:
The page will load fine for both your users and for the search engines. To learn more, visit these threads (1 & 2) from the Cold Fusion support forums that discuss workarounds to eliminate the ? symbol from URLs and how Cold Fusion interacts with search engines.
If you're running Lotus Domino and find search engines ignoring your pages, the people at dotNSF say they can help.
Do you use Miva Merchant as your shopping cart? The newsletter below promises to help you learn how to properly optimize your system for search engines and get your category and product pages listed.
Search Engine Help For Miva Merchant
Directly submitting specific dynamic URLs to Yahoo increases the chances they will be picked up by that search engine. Also, of the major search engines, Google began including more dynamic URLs in its normal crawls toward the end of 2000.
Several search engines offer "paid inclusion" programs where, for a fee, you are guaranteed to have the pages you want listed. These programs can usually handle dynamic URLs with no problem. The downside, of course, is that you have to pay.
Do you really need to generate your pages dynamically? Often, the database is simply used as a page creation tool. If so, use it to create static pages, especially for sections of your site that don't change often. Alternatively, consider creating a mirror of your dynamic content in static pages that the search engines can spider. The tools and companies below may help. All have offerings designed to make dynamic content friendly to crawlers.
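If the database really is just a page creation tool, the export step can be quite small. The Python sketch below assumes the records have already been pulled out of the database as simple dictionaries with slug, title and body fields. Those field names, the HTML template and the output folder are all hypothetical, and a real site would use its own templates.

```python
import html
from pathlib import Path
from typing import Dict, Iterable, List


def render(record: Dict[str, str]) -> str:
    """Turn one database record into a complete static HTML page."""
    title = html.escape(record["title"])
    body = html.escape(record["body"])
    return (f"<html><head><title>{title}</title></head>"
            f"<body><h1>{title}</h1><p>{body}</p></body></html>\n")


def export_static(records: Iterable[Dict[str, str]], out_dir: str) -> List[Path]:
    """Write one .html file per record and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        target = out / f"{record['slug']}.html"
        target.write_text(render(record), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Stand-in for rows fetched from the database.
    rows = [{"slug": "widgets", "title": "Widgets", "body": "All about widgets."}]
    for path in export_static(rows, "static_pages"):
        print("wrote", path)
```

Run on a schedule, an export like this gives crawlers plain files with plain URLs, while the database remains the place where the content is maintained.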