WebmasterWorld, which banned Google, Yahoo, MSN, Ask Jeeves and other search
spiders last month, is now allowing them back in and thus has returned to the
land of the living, in terms of being listed with search engines.
WebmasterWorld chief Brett Tabke gives his rundown on the situation in
the site's robots.txt file,
which he's now using as a blog. C'mon, Brett: you're posting good stuff in
there beyond the whole robots thing. Put the material into proper web pages, if
not an actual blog, so we can link to individual items.
Look closely at that file, and you'll see that it still seems to ban all the
robots. Now look here at
what the robots.txt file tells you is the "real" robots.txt file. That's made
real to the major search spiders through cloaking: the site
checks whether a spider is reporting a useragent from any of the major search
engines. If so, a cloaked robots.txt file is sent to it.
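To make the idea concrete, here's a minimal sketch of that kind of useragent check. Everything here is an assumption for illustration: the spider list, the two robots.txt bodies, and the function name are mine, not WebmasterWorld's actual code.

```python
# Hypothetical server-side cloaking check: visitors claiming a major
# spider's useragent get a permissive robots.txt; everyone else gets
# the blanket ban that humans see when they load the file directly.

MAJOR_SPIDERS = ("googlebot", "slurp", "msnbot", "teoma")  # assumed list

BAN_ALL = "User-agent: *\nDisallow: /\n"       # what humans see
ALLOW_MAJORS = "User-agent: *\nDisallow:\n"    # cloaked, permissive copy

def robots_txt_for(user_agent: str) -> str:
    """Return the robots.txt body to serve for this visitor."""
    ua = user_agent.lower()
    if any(bot in ua for bot in MAJOR_SPIDERS):
        return ALLOW_MAJORS  # recognized spider: serve the cloaked copy
    return BAN_ALL           # everyone else: serve the blanket ban

print(robots_txt_for("Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Matching on the self-reported useragent string is exactly why the spoofing issue discussed below arises: anyone can claim to be Googlebot.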
Cloaked! Cloaked! You mean Google and gang are all anti-cloaking, but they
don't mind this cloaking? Apparently so, and that's not that surprising. The
robots.txt file really isn't designed to be read by humans, though humans can
read it. So while technically this is another example of search engines
allowing cloaking, it's more a footnote than a big exception, as with things
like Google Scholar.
Ah, but what about people who might visit WebmasterWorld while pretending to
be one of the major spiders? How could you do that? Here, Greg Boser
points you at
one of many tools that let you do this.
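Tools like the one Greg points at boil down to sending a request with a spider's useragent string instead of your browser's. Here's a small sketch of the idea in Python; the URL is a placeholder, and this only builds the request rather than actually fetching anything.

```python
import urllib.request

# Build a request that reports Googlebot's useragent string instead of
# a normal browser's. A site doing useragent-based cloaking would serve
# this request the spider version of the page.
req = urllib.request.Request(
    "https://example.com/robots.txt",  # placeholder URL
    headers={
        "User-Agent": (
            "Mozilla/5.0 (compatible; Googlebot/2.1; "
            "+http://www.google.com/bot.html)"
        )
    },
)

print(req.get_header("User-agent"))  # the useragent the server would see
```

That one header is all the cloaking check in the previous sketch looks at, which is why WebmasterWorld needed extra detection to catch pretenders.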
Greg's pointing at that because last week, he found himself
blocked from WebmasterWorld after surfing in there as if he were from Google.
He wasn't alone in being caught by some detection measures Brett set up, and
now he and others have access again, as Greg explains.
Found yourself in the same situation? Brett explains
here how to send a sticky mail
to an admin to have access restored. I'm told by a good source that a number
of Google folks found themselves locked out as well, because many of them use
browsers that report the Google useragent.
What about the whole rogue spider thing? Those spiders were ignoring
robots.txt in the first place. That's why, as I covered earlier,
WebmasterWorld also set up required logins to block them. My understanding is
that the major search spiders are excluded from this requirement, and
referring data is also being used so that some people clicking through from
the search engines don't get a login request for their first two or three
clicks.
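The referrer trick described above could look something like this. To be clear, this is a hedged sketch of the general technique, not WebmasterWorld's implementation: the referrer list, the click threshold, and the function name are all assumptions.

```python
# Hypothetical login-wall logic: visitors arriving from a search engine
# get a few free clicks (tracked per session elsewhere) before being
# asked to log in. All names and thresholds here are illustrative.

SEARCH_REFERRERS = ("google.", "yahoo.", "msn.", "ask.")  # assumed list
FREE_CLICKS = 3  # "first two or three clicks" per the post

def needs_login(referrer: str, clicks_so_far: int) -> bool:
    """Decide whether to show the login request for this pageview."""
    from_search = any(s in referrer.lower() for s in SEARCH_REFERRERS)
    if from_search and clicks_so_far < FREE_CLICKS:
        return False  # still within the grace window from a search click
    return True       # everyone else hits the login requirement

print(needs_login("https://www.google.com/search?q=webmasterworld", 1))
print(needs_login("", 1))
```

Since the Referer header is as easy to fake as the useragent, this is a convenience for legitimate searchers rather than a real barrier to rogue spiders.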
In Google Index? has discussion at WebmasterWorld,
WMW – the bots are back has
discussion over at Threadwatch, and
WebmasterWorld Off Of Google & Others Due To Banning Spiders at our Search
Engine Watch Forums has older discussion and is also a place where you can
comment on or discuss the latest developments.