Well, that didn't take long. I wrote on Monday about how WebmasterWorld head Brett Tabke decided to ban all search spiders, including those from the major search engines, in an effort to combat bandwidth loss and server sluggishness due to rogue spiders. Brett figured he had about 60 days until he'd see pages get dropped. It took two.
As of this moment, site:webmasterworld.com at Google shows NO pages being listed from the site. Prior to the ban, about 2 million pages were listed. Oddly, Google's not even returning the site's home page using the listing out of the Open Directory.
In other words, on Monday, as I recall, a search for webmasterworld brought up the WebmasterWorld home page with a title and description like this:
Brett Tabke hosts professional webmaster and search engine promotion discussions.
A search today for the same thing doesn't bring up the site at all. Yes, WebmasterWorld banned Google from spidering it. However, that doesn't prevent Google from listing at least the home page by making use of the Open Directory information, which doesn't require spidering the WebmasterWorld web site.
Interestingly, the Google Directory -- which is powered by the Open Directory -- has no listing for WebmasterWorld in the exact category where you'll see it at the Open Directory. That suggests the robots.txt ban pulled WebmasterWorld not only out of the Google web search results but out of the Google Directory listings as well. That would be entirely new; I don't recall hearing of it happening before.
I checked with Dave Naylor, who's been watching the situation, and he suspects this is a sign of Google manually pulling everything about the site from Google.
Over at MSN, site:webmasterworld.com brings back one match, but since it lacks a title and description, this looks to be a listing of the WebmasterWorld home page based on the fact MSN sees links to it, rather than having crawled it. Google can and does do a similar thing, calling these "partially indexed" URLs. It's not doing that for WebmasterWorld, however.
To understand more about the entire situation of how a page that bans spiders might still appear, check out my The US White House & Blocking Search Engines page.
At Yahoo, site:webmasterworld.com shows 83,300 matches for me, which is steady from what I saw earlier this week.
Should the pages have dropped so quickly? With Google, things might have been helped along by the fact it has an automatic page removal system. Don't panic! It only works if a site has specifically put up a robots.txt file blocking Google. People can't just come along and remove your pages unless you yourself have installed such a robots.txt file.
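For anyone who wants to see what "a robots.txt file blocking Google" means in practice, here's a rough sketch using Python's standard robots.txt parser. The rules below are the generic "ban everyone" form, which I'm assuming for illustration; they are not copied from the actual WebmasterWorld file. The is_blocked helper and the list of spider names are likewise just examples.

```python
from urllib.robotparser import RobotFileParser

# Assumption: a generic blanket ban, not the real WebmasterWorld robots.txt.
BLANKET_BAN = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example spider names; any well-behaved crawler is covered by "User-agent: *".
    for agent in ("Googlebot", "msnbot", "Slurp"):
        blocked = is_blocked(BLANKET_BAN, agent, "http://www.webmasterworld.com/")
        print(agent, "blocked" if blocked else "allowed")
```

A site with no such rules comes back as allowed for every spider, which is the point above: the removal system has nothing to act on unless the site owner has put the ban in place.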
Todd Friesen describes the system more in Blink And It's Gone, and he's at least one person who submitted the new WebmasterWorld robots.txt file to speed up the removal process. Todd's also been tracking page counts for the site in various search engines: WebmasterWorld Index Watch 3, WebmasterWorld Index Watch 2 and WebmasterWorld Index Watch.
Even if no one had submitted the file to the automated Google page removal tool, I still thought it was way overly optimistic to assume a popular site like WebmasterWorld would be allowed to retain pages after expressly banning spiders. MSN certainly has no automated page removal system, and it matched Google in dropping pages.
Want to discuss or comment? Members at WebmasterWorld are talking in this thread, and we also have discussion starting at our own Search Engine Watch Forums in WebmasterWorld Off Of Google & Others Due To Banning Spiders.