Search Engines and Web Server Issues

A special report from the Search Engine Strategies 2003 Conference, August 18-21, San Jose, CA.

What is the best way to move a site from one server to another without affecting search engine visibility? What if your company decides to use a different domain name? After a site redesign, how can you communicate to the search engines that old URLs should be redirected to new URLs without affecting positioning? A panel of experts addressed these and other server-specific issues.

Giving Search Engines Access To Your Site

“It sounds really simple, but you must pay your ISP bill,” said Matt Cutts, software engineer at Google. “You would not believe some of the things I see that people do, or don’t do. If Googlebot cannot index your page at first, it will try back a few days later. If you have a server error, such as a 500-level error where your server is choking, Googlebot will wait and try again.”

Since search engine visibility is crucial for most online businesses, web site owners need to ensure that their servers are functional 24 hours a day, 7 days a week. A search engine spider can visit a site at any time. If a search engine crawler requests a page from a server and that server is down for any reason, that page might not appear in a search engine index until the crawler visits the site again.

Check your robots.txt file to ensure that you are allowing Google in to index your site, said Cutts. Check for typos, syntax errors, and missing spaces. “Everybody should have this file, even if it’s an empty file,” he said.
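For reference, a minimal robots.txt that allows every crawler in looks like this (the blocked directory in the second example is purely illustrative):

```
# Allow every crawler to index the entire site
User-agent: *
Disallow:

# To keep all crawlers out of a directory instead, you would write
# (the path is illustrative):
#
# User-agent: *
# Disallow: /private/
```

The file must live at the root of the site (e.g. example.com/robots.txt); an empty `Disallow:` line means nothing is blocked.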

Cutts also recommended checking the syntax of your meta tags. “There are a lot of different ways you can use them,” he explained. “NOINDEX means, do not index the content on this page. NOFOLLOW means do not follow the links on this page. And NOARCHIVE has a special meaning for Google: don’t show a cached copy of this page.”
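In a page’s HTML, these directives go in a robots meta tag inside the head section. The tags Cutts described look like this (values can be combined with commas):

```html
<!-- Keep this page out of the index and don't follow its links -->
<meta name="robots" content="noindex,nofollow">

<!-- Index the page, but don't show a cached copy (honored by Google) -->
<meta name="robots" content="noarchive">
```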

There are other ways to control how easily search engines can crawl your site. “A site map is something people overlook all the time,” said Cutts. “Likewise, see how many links down you need to surf to get to deep pages on your site. The farther you have to go down, the harder it will be for Google to find those pages as well.”

Password protection is another way of ensuring that search engines do not crawl sensitive content. “Sometimes we’ll get a letter from a bank or a university saying that they didn’t mean to leave private information up on the public Web where people and search engine spiders could find it,” said Cutts. “Password protection keeps that from happening.”

Static Vs. Dynamic Pages and URL Structure

Simplifying a site’s URL structure is another way to increase the likelihood that a search engine will crawl your entire site. Complex URLs that use a number of parameters (codes, typically following a question mark or other special symbol) pose particular challenges for search engines.

“If you have one parameter, you’re in pretty good shape,” said Bruce Clay, President of Bruce Clay LLC. “If you have two parameters, you’re fuzzy. If you have three or more, you’re not going to get indexed. I have yet to find any content with three or more parameters indexed by Google.”

“If your dynamic page has a fair number of parameters,” Clay further explained, “you’d probably be better off putting a NOINDEX tag on your dynamic content, and just indexing your static pages.”

Greg Boser, President of WebGuerilla, had a different viewpoint. “I would disagree just a little bit about whether or not you should rewrite the URLs,” said Boser. “Google does crawl dynamic stuff and they do it pretty well. But we’ve always found that they crawl URLs a lot better if they don’t know they are dynamic. It just makes a huge difference with how quickly your site is indexed. Bots tend to be a bit more timid with a dynamic site because of the possibility of a loop.”

Cutts explained the difference between a static URL and a dynamic URL. “When someone says dynamic URL or static URL,” he said, “they could really mean a couple different things. Typically when someone says dynamic URL, they mean a URL with a question mark in it. It could have one, two, three or more parameters. And the more there are, the more hesitant Google will be to crawl those pages.”

If a Web page is truly dynamic, it means that the page is created on the fly. Dynamic page URLs do not necessarily have a question mark in them. Likewise, static Web pages can have a question mark in the URL.

“The thing to be aware of is that Google and other search engine spiders are kind of alerted to that question mark,” said Cutts.

“Some spiders are more allergic than others, and some spiders will crawl anything with a question mark. If you can make it look as though it were a static page — i.e., not have question marks in those parameters — we don’t care if they’re generated on the fly or not. The important thing is that it doesn’t have that cruft on the end (the parameter-question mark string), and that will really help your site qualify.”
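On Apache, one common way to present dynamic content under static-looking URLs is a mod_rewrite rule that maps a clean path onto the dynamic script internally. This is a minimal sketch for a .htaccess file; the script name and parameter are illustrative:

```
# .htaccess — requires mod_rewrite to be enabled
RewriteEngine On

# Serve /products/42 from product.php?id=42 internally;
# visitors and spiders only ever see the clean URL
RewriteRule ^products/([0-9]+)$ product.php?id=$1 [L]
```

The rewrite happens server-side, so the question mark never appears in the URL the crawler requests.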

The file extension (.cfm, .asp, .php, etc.) makes no difference to ranking whatsoever. “As long as we get the pages,” said Cutts, “that’s all we need.”

Site Redesigns And Search Engine Indexing

If you have redesigned your site or updated it with new technology, there are ways of communicating to the search engines that the URLs have been modified. Webmasters can use a temporary redirect (an HTTP 302 status code) or a permanent redirect (a 301).

“The 301 and 302 refer to those little status messages that appear any time a person or spider tries to retrieve the page. It’s something to be aware of if you’ve moved from an old domain to a new domain,” Cutts explained. “You’ll want to put a 301 redirect on the old domain to the new domain. It will make sure your visitors get redirected correctly to your new site, and it will make sure you still keep credit for your links on the search engines.”

“A 302 temporary redirect tells Googlebot, ‘Okay, go here for now, but try again later because it may not be that way later,'” he further explained. “If it’s going to be that way for good and in the same location, then do a permanent one — a 301 redirect.”
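The two redirects differ only in the status code the server sends back. A throwaway local server makes this easy to see; this sketch (Python standard library only, with illustrative URLs) answers every request with a 301 and then fetches it without following the redirect:

```python
import http.server
import threading
import urllib.error
import urllib.request

class PermanentRedirect(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # 301 = moved permanently; sending 302 here instead
        # would signal a temporary move
        self.send_response(301)
        self.send_header("Location", "http://www.example.com/new-page")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep request logging quiet

# Port 0 lets the OS pick a free port
server = http.server.HTTPServer(("127.0.0.1", 0), PermanentRedirect)
threading.Thread(target=server.serve_forever, daemon=True).start()

class NoFollow(urllib.request.HTTPRedirectHandler):
    # Returning None stops urllib from following the redirect,
    # so the raw status code surfaces as an HTTPError
    def redirect_request(self, *args, **kwargs):
        return None

opener = urllib.request.build_opener(NoFollow)
try:
    opener.open(f"http://127.0.0.1:{server.server_address[1]}/old-page")
except urllib.error.HTTPError as e:
    status, location = e.code, e.headers["Location"]

print(status, location)  # → 301 http://www.example.com/new-page
server.shutdown()
```

In production the same status code would typically come from the web server configuration (for example an Apache `Redirect 301` directive) rather than application code.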

Sometimes, webmasters want to switch IP addresses. In order to do that effectively, make sure that both servers are serving up the content that the search engines ask for before completely making the switch. “At least for a day,” Cutts recommended.

Most search engines have a URL-removal tool that eliminates dead links from the search engine index. If you find a URL in a search engine index that is not supposed to be there, you can remove it yourself via a form, without having to speak to a customer service representative.

According to Cutts, competitors will not be able to sabotage your site by attempting to remove pages through the URL-removal tool. “What we do is we check several times over the course of two to three days to make sure the page is completely gone on the site,” he explained. “Plus, we have safeguards in place to authenticate that you really did request the removal.”

Virtual Servers and Other Hosting Issues

Virtual servers are a common way to provide low-cost Web hosting services. With virtual hosting, instead of having a separate computer for each server, multiple virtual servers can reside on the same computer. Sometimes, if too many virtual servers reside on the same computer, or if one virtual server uses too many resources, Web pages can be delivered more slowly to the search engines.

“If your server doesn’t respond, we will try to refresh,” said Cutts. “We sometimes deal with slow servers in foreign countries connected through a really small pipe across the Atlantic, but overall it doesn’t matter where your site is hosted. Static vs. virtual hosting – it doesn’t matter. Just make sure that Googlebot can get to the pages through your ISP.”

“Personally, I would never virtual host, but that’s just me,” said Greg Boser. “I know at some point we’re eventually going to have to do it, but for now as long as I can get individual IPs, I use them.”

“It’s not nearly as bad as it was in the old days (’98 – ’99), when the bot didn’t always handle HTTP 1.1 very well,” he further explained.

“When you are using a hosting company, you are really relying on them to make sure that the servers are set right. Sometimes, hosting companies can make mistakes at bad times of the month, when the bots are crawling them. It can cost you thousands of dollars. Your host goofs up, and you have to wait for Google to spider the next time around. So if you can afford to have separate IP addresses, do it.”

Some companies have a server with several Web sites directing to the same IP address. Would there be any advantage, right now, to separating those Web sites with their own IP addresses?

Clay recommended getting your own individual IP. “Currently some are available for free; otherwise the fee is normally $2 a month to get your own IP, as opposed to being on a virtual server,” he said.

“If you have access to additional IPs for your site, that’s fine,” said Alex Shillington, Chief Technologist at 360i. “But if you don’t (through your hosting company), you want to be careful about what is being hosted on the other IPs. As long as virtual hosting is set up correctly, you should be all right.”

Resources

Google Webmaster Information

Google Remove URL Form

Robots Exclusion Protocol

Shari Thurow is the Marketing Director at Grantastic Designs, Inc. and the author of the book Search Engine Visibility. She has been designing and promoting web sites since 1995 for businesses in a wide range of fields.
