One of the things that came out of our Bot Obedience Course at SES San Jose last month was a wish that search engines somehow made it possible for site owners to know they were sending "trusted" or "certified" spiders. Now Google's suggested one way this can be done.
Those blocking rogue spiders through IP filtering run the risk that they might accidentally keep some of the "good" bots out. If you don't know all the Google IP addresses, there's a chance you might reject a Google spider accidentally. That might cause your pages to be dropped from Google.
How to verify Googlebot, from Matt Cutts at the Official Google Webmaster Central Blog, covers a suggested technique to avoid this. Basically, all Google spiders will report they are from the googlebot.com domain. So do a reverse DNS lookup on the IP address. If it comes back with a name in googlebot.com, then you're halfway there. Halfway? Yes, that's because people can lie about domain names: whoever controls an IP address can point its reverse DNS record at any name they like. To avoid spoofers, you then have to do a forward DNS lookup on the name you found to see if it resolves back to the original IP address.
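For those who want to see the two-step check in code, here is a minimal sketch in Python. It is an illustration of the idea described above, not code from Google's post. The function name `is_verified_googlebot` and the helper names are made up for this example, the resolver arguments exist only so the check can be exercised without touching the network, and the forward lookup shown handles IPv4 addresses only.

```python
import socket


def _reverse(ip):
    # Reverse DNS: IP address -> primary host name.
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    # Forward DNS: host name -> list of IPv4 addresses.
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if ip passes both halves of the check.

    Hypothetical helper: the name and the injectable resolver
    parameters are assumptions made for this sketch.
    """
    reverse_lookup = reverse_lookup or _reverse
    forward_lookup = forward_lookup or _forward

    # Step 1: the reverse lookup must give a name under googlebot.com.
    try:
        host = reverse_lookup(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    host = host.rstrip(".").lower()
    if not host.endswith(".googlebot.com"):
        return False

    # Step 2: the forward lookup on that name must return the same IP,
    # otherwise the reverse record may have been spoofed.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

In practice you would call `is_verified_googlebot(ip)` with an address taken from your server logs and only whitelist it if the result is true. Since each call can trigger two DNS queries, caching the answers is probably wise on a busy site.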
The blog post explains more, and it's going to make the most sense to tech-savvy webmasters who are already implementing some type of IP filtering or blocking. Not doing that? Then don't worry about this -- it's not really for you.
Down the line, perhaps we'll see solutions that demand less technical skill, for those sites getting slammed by bad bots but not running IP filtering. But this is a great start for now.
Matt's also mentioned this on his personal blog, where people are commenting on the technique.