Google On How To Let Googlebot In, Keep Bad Bots Out

One of the things that came out of our Bot Obedience session at SES San Jose
last month was a wish that search engines somehow made it possible for site
owners to know that the spiders visiting them were "trusted" or "certified."
Now Google has suggested one way this can be done.

Those blocking rogue spiders through IP filtering run the risk of accidentally
keeping some of the "good" bots out. If you don't know all of Google's IP
addresses, you might reject a Google spider by mistake, and that could cause
your pages to be dropped from Google.

How to verify Googlebot
from Matt Cutts at the Official Google Webmaster
Central Blog covers a suggested technique to avoid this. Basically, all Google
spiders will report they are from the googlebot.com domain. So do a reverse DNS
lookup on the IP address. If the hostname comes back in the googlebot.com
domain, then you're halfway there.
Halfway? Yes, that's because people can lie about domain names. To rule out
spoofers, you then do a forward DNS lookup on the hostname you found and check
that it resolves back to the original IP address.
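For those who like to see it spelled out, here's a minimal sketch of that
two-step check in Python, using only the standard library's socket module. The
function name and the sample IP are just for illustration, not anything from
Google's post:

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Two-step Googlebot verification.

    Step 1: reverse DNS lookup on the IP should give a hostname in the
            googlebot.com (or google.com) domain.
    Step 2: forward DNS lookup on that hostname should resolve back to
            the original IP, which catches spoofed reverse-DNS records.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return False  # no PTR record at all

    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False  # claims to be someone else

    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)  # forward lookup
    except OSError:
        return False

    return ip in addresses  # must map back to the same IP


if __name__ == "__main__":
    # Hypothetical crawler IP for illustration only.
    print(is_verified_googlebot("66.249.66.1"))
```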

The blog post explains more, and it's going to make the most sense to
tech-savvy webmasters who are already doing some type of IP filtering or
blocking. Not doing that? Then don't worry about this; it's not really
for you.

Down the line, perhaps we'll see solutions emerge that demand less technical
know-how, for those sites getting slammed by bad bots but not yet doing IP
filtering. But this is a great start for now.

Matt's also mentioned this on his personal blog, where people are commenting
on the technique.
