Google: A Clear & Present Danger to Corporate Data Privacy

Date published: 11 April 2008

Author: Kevin Heisler

Categories: Industry

UPDATE: Editors’ Note: At the request of Google, we’ve removed the photo of Google engineer Jayant Madhavan, co-author (with Alon Halevy) of the Google Webmaster Central blog post, Crawling through HTML forms, posted by Maile Ohye, Senior Support Engineer at Google. The photo was deleted at Google’s request to respect the privacy of Google’s corporate data and the personal privacy of Jayant Madhavan.
— Kevin Heisler, Executive Editor, Search Engine Watch

A few hours ago, Google announced to the world that the company has been crawling forms on “high-quality” Web sites to index “Invisible Web” content in the Google.com search engine.

Google’s intention, as always, is to improve the quality of search results for users of its search engine.

Crawling Web site forms, though, constitutes a sea change in terms of data privacy; specifically, the privacy of corporate data.

“In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn’t find and index for users who search on Google,” according to Jayant Madhavan and Alon Halevy, from the Crawling and Indexing Team on an official Google blog.

Here’s how Googlebot does it, according to Google engineers:

“We might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.”
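To make that description concrete, here is a minimal sketch of the general technique in Python. It is not Google's code: the function names, the form description format, the word-frequency heuristic and the `max_queries` cap are all assumptions made for illustration. It only covers forms submitted with GET, where a filled-in form maps directly to a URL.

```python
import re
from collections import Counter
from itertools import islice, product
from urllib.parse import urlencode, urljoin


def candidate_words(page_text, limit=3):
    """Guess text-box values: the most frequent words of 4+ letters on the site.

    Assumption: frequency is a stand-in for whatever word selection a real
    crawler uses; the source only says words come "from the site".
    """
    words = re.findall(r"[a-z]{4,}", page_text.lower())
    return [word for word, _ in Counter(words).most_common(limit)]


def candidate_urls(base_url, form, page_text, max_queries=5):
    """Build a small number of GET URLs a user could have produced with the form.

    `form` is a hypothetical description: {"action": ..., "inputs": [...]},
    where each input has a "name", a "type", and (for non-text inputs) the
    list of "values" found in the HTML.
    """
    choices = []
    for field in form["inputs"]:
        if field["type"] == "text":
            values = candidate_words(page_text)
        else:
            # Select menus, check boxes, radio buttons: use the HTML's own values.
            values = field["values"]
        choices.append([(field["name"], value) for value in values])

    action = urljoin(base_url, form["action"])
    # Cap the number of combinations so only a few queries are ever generated.
    combos = islice(product(*choices), max_queries)
    return [f"{action}?{urlencode(list(combo))}" for combo in combos]
```

For example, a search form with one text box and one select menu yields a handful of URLs such as `https://example.com/search?q=widgets&cat=a`. Deciding whether the resulting page is "valid, interesting, and includes content not in our index" is a separate step that this sketch does not attempt.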

Last year, as the search marketing analyst for JupiterResearch, I said that the biggest issue in 2007 would be the threat to the privacy of corporate data.

I was wrong. 2008 is the year corporate IT departments worldwide will be forced to spend time, money and resources to ensure that search engine spiders do not inadvertently index data a company would prefer to keep private.

The same holds true for non-profit organizations and other institutions.

From a personal standpoint, I have confidence in Google’s data security systems, despite the recent departure of Google CIO, Doug Merrill.

I have full confidence that Google practices “good Internet citizenship.”

I’m confident Google has paved the road to relevance with good intentions.

This is not simply a “pioneering move” by Google.

That the robotic filling-in of forms has already been practiced by AOL’s Quigo, according to SearchEngineLand, does not reassure me.

I’m sorry, Sergey, Larry, Eric. I can’t in good conscience defend Google’s decision to our readers. The costs to CEOs, CIOs and CTOs at corporations far outweigh the benefits to consumers.

Please, reconsider.

Do not make the robotic querying of Web site forms the default spidering practice for Google. As a search engine, Google has become the gateway to the Internet, and with great power comes great responsibility.

End this experiment now.

Stop this experiment before the backlash against Google develops. You do not want to be answering for it when Wall St. analysts quiz you on the company’s performance during the first-quarter earnings conference call on April 17th.
