One of the things that came out of our Bot Obedience Course at SES San Jose last month was a wish that search engines would give site owners some way to confirm that a visiting spider really is a "trusted" or "certified" one sent by the search engine. Now Google has suggested one way this can be done.
Those blocking rogue spiders through IP filtering run the risk of keeping some of the "good" bots out as well. If you don't know all the Google IP addresses, you might reject a Google spider by mistake, and that might cause your pages to be dropped from Google.
How to verify Googlebot from Matt Cutts at the Official Google Webmaster Central Blog covers a suggested technique to avoid this. Basically, all Google spiders will report they are from the googlebot.com domain. So do a reverse DNS lookup on the IP address. If it comes back with a name in googlebot.com, then you're halfway there. Halfway? Yes, that's because people can lie about domain names: whoever controls an IP address also controls what its reverse lookup says. To avoid spoofers, you then have to do a forward DNS lookup on the name you found and check that it resolves back to the original IP address.
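The two-step check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from Google's post: the function name `is_verified_googlebot` and its two optional resolver parameters are made up for this example, and the default resolvers use the standard `socket` module, which handles IPv4 addresses only in the forward step.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if `ip` passes both the reverse and forward DNS checks."""
    # The resolver parameters are an assumption of this sketch, added so the
    # logic can be exercised without live DNS. By default, use the stdlib.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    # Step 1: reverse lookup. The name must sit inside googlebot.com.
    # Matching on ".googlebot.com" rejects look-alikes such as
    # "evilgooglebot.com".
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not host.endswith(".googlebot.com"):
        return False

    # Step 2: forward lookup. The name must resolve back to the same IP,
    # otherwise the reverse record may have been spoofed.
    try:
        addresses = forward_lookup(host)
    except OSError:
        return False
    return ip in addresses
```

Calling `is_verified_googlebot(remote_ip)` with no extra arguments performs two live DNS queries, so a real filter would probably want to cache the answer for each IP address rather than repeat the lookups on every request.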
The blog post explains more, and it's going to make the most sense to tech-savvy webmasters who are already implementing some type of IP filtering or blocking. Not doing that? Then don't worry about this -- it's not really for you.
Down the line, perhaps we'll see solutions that demand less technical know-how, for those sites getting slammed by bad bots but without IP filtering in place. But this is a great start for now.
Matt's also mentioned this on his personal blog, where people are commenting on the technique.