The Google Webcam “Hack” Story

Several stories and posts on the web today discuss Google providing access to unsecured webcams. A couple of quick comments:

1) A VNU article quotes Duncan Parry saying that webmasters should protect pages from crawlers by using password protection and robots.txt files. Yes, these are good ideas. However, I’ll add a caveat. Using only a robots.txt file MIGHT NOT keep the url completely out of the Google database. As I’ve said in the past and others have also pointed out, even if a webmaster places a robots.txt file on their server, Google might still include the urls (not the text) of the blocked pages if it discovers them via links on other pages. Restricting a search with the inurl: operator can often reveal these types of pages. Of course, they can also show up with other types of queries.

In other words, just using robots.txt does NOT mean that the page or pages will not be found in the Google index. As Dan Brandt correctly points out, “filenames can be very revealing.”
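To make the distinction concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt rules, the example.com urls, and the may_fetch helper are all made up for illustration, and this is not a description of how Google's crawler works internally. It only shows that robots.txt answers the question "may a crawler fetch this page?", while the url itself remains a visible string wherever another page links to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the /cameras/ path is an assumption
# chosen for this example, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cameras/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


# Urls a crawler might discover through links on other sites (made up).
blocked_url = "http://www.example.com/cameras/lobby-cam.html"
open_url = "http://www.example.com/index.html"

for url in (blocked_url, open_url):
    status = "may fetch" if may_fetch(url) else "may NOT fetch"
    # Either way the crawler already knows the url string, filename included.
    print(f"{status}: {url}")
```

The blocked page's text is never fetched, but its url and filename are still known to anyone who saw the link, which is why password protection is the safer choice for anything sensitive.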

Bottom Line: I think that the use of robots.txt would be another important topic for the summit that Danny proposed earlier today.

2) The same VNU article points out that Google is currently showing links to about 2000 cameras via this url. However, even if you wanted to look at all 2000, it would be difficult, because Google limits search result sets to about 1000 results.

Actually, when you get to the 208th result, the Google duplicate filter kicks in. If you turn the dupe removal filter off, you’ll see that many urls are for the same camera.
