One of the things that came out of our Bot Obedience Course at SES San Jose last month was a wish that search engines somehow made it possible for site owners to know they were sending "trusted" or "certified" spiders. Now Google's suggested one way this can be done.
Those blocking rogue spiders through IP filtering run the risk that they might accidentally keep some of the "good" bots out. If you don't know all the Google IP addresses, there's a chance you might reject a Google spider accidentally. That might cause your pages to be dropped from Google.
How to verify Googlebot from Matt Cutts at the Official Google Webmaster Central Blog covers a suggested technique to avoid this. Basically, all Google spiders will report they are from the googlebot.com domain. So do a DNS lookup on the IP address. If it comes back as googlebot.com, then you're halfway there. Halfway? Yes, that's because people can lie about domain names. To avoid spoofers, you then have to look up the domain name you found to see if it matches the original IP range.
The blog post explains more, and it's going to make the most sense to tech-savvy webmasters that are implementing some type of IP filtering or blocking already. Not doing that? Then don't worry about this -- it's not really for you.
Down the line, perhaps we'll see less tech-savvy solutions come up, for those sites getting slammed by bad bots but without IP filtering. But this is a great start for now.
Matt's also mentioned this on his personal blog, where people are commenting on the technique.
Know your Ambiguous Customer: Effective Multi-Channel Tracking
Wednesday, June 5 at 1pm ET - Learn why a move from the "batch and blast" email approach enables better conversations with your customers.
Register today - don't miss this free webinar!