WebmasterWorld, which banned Google, Yahoo, MSN, Ask Jeeves and other search spiders last month, is now allowing them back in and has thus returned to the land of the living, in terms of being listed with search engines.
WebmasterWorld chief Brett Tabke gives more of his rundown on the situation in the site's robots.txt file, which he's now using as a blog. C'mon Brett -- you're posting good stuff in there beyond the whole robots thing. Put the material into proper web pages, if not an actual blog, so we can link to individual items.
Look closely at that file, and you'll see that it still seems to ban all the robots. Now look here at what the robots.txt file tells you is the "real" robots.txt file. That's made real to the major search spiders through this code, which checks to see if a spider is reporting a useragent from any of the major search engines. If so, then a cloaked robots.txt file is sent to it.
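To make the mechanism concrete, here is a minimal sketch of useragent-based robots.txt cloaking. It is not Brett's actual code: the token list, the two file bodies and the function names are all assumptions chosen for illustration.

```python
# Hypothetical sketch of useragent-based robots.txt cloaking.
# Nothing here is taken from WebmasterWorld's real code.

# Assumed useragent substrings for the major spiders (illustrative list).
SPIDER_TOKENS = ("googlebot", "slurp", "msnbot", "teoma")

# What ordinary visitors and unknown bots see: everything disallowed.
BAN_ALL = "User-agent: *\nDisallow: /\n"

# What a recognized spider sees: an empty Disallow, meaning nothing is blocked.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def is_major_spider(user_agent):
    """Return True if the reported useragent contains a known spider token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in SPIDER_TOKENS)


def robots_txt_for(user_agent):
    """Pick which robots.txt body to serve for this request."""
    return ALLOW_ALL if is_major_spider(user_agent) else BAN_ALL
```

Note that a check like this trusts whatever the useragent header says, so anything that reports a spider's useragent gets the permissive file.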
Cloaked! Cloaked! You mean Google and gang are all anti-cloaking but they don't mind this cloaking? Apparently so, and that's not so surprising. The robots.txt file really isn't designed to be read by humans, though they can read it. So while technically this is another example of search engines allowing cloaking, it's more a footnote than a big exception as with things like Google Scholar.
Ah, but what about people who might visit WebmasterWorld while pretending to be one of the major spiders? How could you do that? Here, Greg Boser points you at one of many tools that let you do this.
Greg's pointing at that because last week, he found himself blocked from WebmasterWorld after surfing in there as if he were from Google. He wasn't alone in being caught by some detection stuff Brett has set up, and now he and others are back with access, as Greg explains. Found yourself in the same situation? Brett explains here to send a sticky mail to an admin to have access restored. I'm told by a good source that a number of Google folks found themselves locked out as well, because many of them use browsers that report the Google useragent.
What about the entire rogue spider thing? Those spiders were ignoring robots.txt in the first place. That's why, as I covered earlier, WebmasterWorld also set up required logins to block the spiders. My understanding is that the major search spiders are being excluded from this requirement, and that referrer data is also being used so that some people clicking through from the search engines don't get a login request for their first two or three clicks.
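As a rough illustration of how such a login gate could be decided, here is a sketch under stated assumptions. It is not WebmasterWorld's implementation: the spider tokens, the search engine domains, the limit of three free clicks and the idea of remembering the referrer of a visitor's first request are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical values, chosen only for illustration.
SPIDER_TOKENS = ("googlebot", "slurp", "msnbot", "teoma")
SEARCH_DOMAINS = ("google.com", "yahoo.com", "msn.com", "ask.com")
FREE_CLICKS = 3  # assumed reading of "the first two or three clicks"


def came_from_search(entry_referrer):
    """True if the referrer host is one of the assumed search domains."""
    host = urlparse(entry_referrer or "").netloc.lower()
    return any(host == d or host.endswith("." + d) for d in SEARCH_DOMAINS)


def login_required(user_agent, entry_referrer, clicks_so_far, logged_in=False):
    """Decide whether this request should be sent to the login page.

    entry_referrer is the referrer of the visitor's first request in the
    session; clicks_so_far counts pages already viewed in that session.
    """
    if logged_in:
        return False
    ua = (user_agent or "").lower()
    if any(token in ua for token in SPIDER_TOKENS):
        return False  # major spiders are excluded from the login requirement
    if came_from_search(entry_referrer) and clicks_so_far < FREE_CLICKS:
        return False  # searchers get a few pages before being asked to log in
    return True
```

Everything else, including rogue spiders that ignore robots.txt, ends up at the login page.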
WebmasterWorld Back In Google Index? has discussion at WebmasterWorld, WMW - the bots are back has discussion over at Threadwatch, and WebmasterWorld Off Of Google & Others Due To Banning Spiders at our Search Engine Watch Forums has older discussion and is also a place where you can comment on or discuss the latest developments.