Well, that didn’t take long. I wrote on Monday about how WebmasterWorld head
Brett Tabke decided to
ban all search
spiders including those from the major search engines in an effort to combat
bandwidth loss and server sluggishness due to rogue spiders. Brett figured he
had about 60 days until he’d see pages get dropped. It took two.
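For those who haven't seen it, a blanket spider ban of this sort is done with a disallow-all robots.txt file. This is a generic sketch of such a file, not necessarily WebmasterWorld's exact contents:

```
User-agent: *
Disallow: /
```

Any spider that honors the robots exclusion standard will stop crawling the site entirely. The catch, as we're now seeing, is what the engines then do with pages they've already indexed.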
As of this moment,
site:webmasterworld.com at Google shows NO pages being listed from the site.
Prior to the ban, about 2 million pages were listed. Oddly, Google’s not even
returning the site’s home page using the listing out of the Open Directory.
In other words, on Monday, as I recall, a search for the site brought up the
WebmasterWorld home page with a title and description along these lines:
Brett Tabke hosts professional webmaster and search engine promotion
A search today for the same thing doesn’t bring up the site at all. Yes,
WebmasterWorld banned Google from spidering it. However, that doesn’t prevent
Google from listing at least the home page by making use of the Open Directory
information, which doesn’t require spidering the WebmasterWorld web site.
Interestingly, checking the Google
Directory — which is powered by the Open Directory — there is no listing
for WebmasterWorld in the exact same
category as you'll see at the Open Directory. It suggests that the
robots.txt ban had the effect of pulling WebmasterWorld not only out of the
Google web search results but out of the Google Directory listings as well. That
would be something entirely new; I don't recall hearing of it happening before.
Checking with Dave Naylor, who's been
watching the situation, I hear he suspects this is indicative of Google
manually removing everything about the site from its results.
Over at MSN,
site:webmasterworld.com brings back one match, but since it lacks a title
and description, this looks to be a listing of the WebmasterWorld home page
based on the fact MSN sees links to it, rather than having crawled it. Google
can and does do a similar thing, calling these "partially indexed" URLs. It’s
not doing that for WebmasterWorld, however.
To understand more about the entire situation of how a page that bans spiders
might still appear, check out my
White House & Blocking Search Engines page.
site:webmasterworld.com shows 83,300 matches for me, which is steady from
what I saw earlier this week.
Should the pages have dropped so quickly? With Google, things might have been
helped along by the fact it has an
automatic page removal system.
Don’t panic! It only works if a site has specifically put up a robots.txt file
blocking Google. People can't just come along and remove your pages unless you
yourself have put up such a robots.txt file.
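To see why the removal tool can insist on this, here's a short sketch in Python of how a crawler (or an automated removal system) can check whether a robots.txt ban actually covers it, using the standard library's robots.txt parser. The disallow-all rules below are an illustration of a blanket ban, not necessarily WebmasterWorld's exact file:

```python
# Sketch: verifying that a robots.txt file blocks a given crawler,
# using Python's standard urllib.robotparser. The rules here mirror a
# blanket disallow-all ban; the exact contents are an assumption.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A disallow-all file blocks every user agent from every path.
print(parser.can_fetch("Googlebot", "https://www.webmasterworld.com/"))         # False
print(parser.can_fetch("Googlebot", "https://www.webmasterworld.com/forum1/"))  # False
```

With a blanket ban like this in place, a removal system can confirm the site owner really does want spiders out before honoring a request to drop the pages.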
Todd describes the system more in Blink
And It's Gone, and he's at least one person who submitted the new
WebmasterWorld robots.txt file to speed up the removal process. Todd’s also been
tracking page counts for the site in various search engines:
WebmasterWorld Index Watch 2 and
Even if this hadn't happened (submission to the automated Google page removal
tool), I was still way too optimistic in assuming a popular site like
WebmasterWorld would be allowed to retain pages after expressly banning
spiders. MSN certainly has no
automated page removal system, and it matched Google in dropping pages.
Want to discuss or comment? Members at WebmasterWorld are talking in this
thread, and we also
have discussion starting at our own Search Engine Watch Forums in
WebmasterWorld Off Of Google & Others Due To Banning Spiders.