Search Engines Uncover Compromising Documents

Author: Chris Sherman

Date published: August 18, 2003

Categories: Industry

Using a search engine and free software tools, it’s possible to dig up hidden — even deleted — information in documents posted to public web sites.

Many search engines allow you to restrict your search results to non-HTML documents, such as Microsoft Office documents, PDF files, and others. Beyond their visible text, these documents often contain information that was never intended to be seen by readers.

This information includes metadata such as author name, organization, and editing history, and it can also include custom data such as the names of document reviewers, who the document was received from, and so on.

In addition to this metadata, many programs also store recently deleted text, allowing you to “undo” unwanted changes. Using simple, freely available software tools, much of this hidden metadata and seemingly deleted text can be converted into visible plain text.
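To illustrate why "deleted" text can resurface, here is a minimal Python sketch that scans a file's raw bytes for runs of readable characters, much as the Unix "strings" utility does. It is not how antiword, catdoc, or any other specific tool works; the function name, the minimum run length, and the sample bytes are assumptions made up for this example.

```python
import re


def printable_runs(data: bytes, min_len: int = 6) -> list[str]:
    """Return runs of readable text found in raw file bytes.

    Looks for printable ASCII stored one byte per character and for the
    same characters stored as UTF-16LE (each followed by a zero byte).
    """
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    runs = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return runs


# Hypothetical bytes standing in for a binary document: some visible text,
# binary padding, and leftover text that the editor no longer displays.
sample = (
    b"\x00\x01Visible text\x00\x02\xff"
    + "old draft text".encode("utf-16-le")
    + b"\x03"
)

for run in printable_runs(sample):
    print(run)
```

Pointing a scan like this at a real file (for example, printable_runs(open(path, "rb").read())) will also turn up plenty of noise, which is why purpose-built converters are more practical. The point is only that text still present in the bytes of a file can be read, whether or not the editing program shows it.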

Simon Byers, an AT&T security researcher, used a search engine to find more than 100,000 Microsoft Word files on the web, including business documents and resumes. He then used the free software tools “antiword” and “catdoc” to convert them to plain text.

Byers found deleted text and information including names, email headers, network paths and text from related documents — potentially compromising information that people publishing the documents to the web likely did not realize was included.

Byers suggested that job seekers, in particular, may not realize that even if they delete their Social Security number from a resume posted to the web, the number may still be included in the file and accessible to someone intent on identity theft.

The New Scientist has an excellent report on Byers’ research, which has been submitted for publication in the IEEE journal Security and Privacy.

If you post non-HTML documents to the web, how can you make sure potentially compromising information is not included?

The safest way is to convert the document to plain text, then paste the text into a new document. Then, use the “File, Properties” command to see what metadata has been included. This method isn’t foolproof — to be absolutely certain a document doesn’t contain information you don’t want revealed, publish it as a simple HTML file.
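As a rough illustration of the "simple HTML file" approach, the Python sketch below wraps plain text in a bare-bones HTML page, so the published file contains only the text you pass in. The function name, the page layout, and the blank-line paragraph convention are assumptions chosen for this example, not a prescribed procedure.

```python
from html import escape


def text_to_simple_html(text: str, title: str = "Untitled") -> str:
    """Wrap plain text in a minimal HTML page.

    Blank lines separate paragraphs. Special characters are escaped so
    the text is displayed literally rather than interpreted as markup.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


# Hypothetical resume text, already copied out of the word processor as plain text.
page = text_to_simple_html("Name: A & B\n\nSkills: <none>", title="Resume")
print(page)
```

Because the page is built only from the string you supply, nothing from the original file's properties or revision data carries over. Whatever is in the text itself is still published, so it is worth reading the output before posting it.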

Search Headlines

NOTE: Article links often change. In case of a bad link, use the publication’s search facility, which most have, and search for the headline.

‘Good’ worm hitting computers…
CNN Aug 19 2003 2:06PM GMT

MSN Search Tests Worrisome for LookSmart…
Boston.Internet.com Aug 19 2003 3:54AM GMT

Spam king shuts down…
Australian IT Aug 19 2003 2:26AM GMT

AOL 9.0 gets personal with subscribers…
CNET Aug 19 2003 0:15AM GMT

Music Group Won’t Sue Small Downloaders…
SiliconValley.com Aug 18 2003 10:37PM GMT

Overture a better buy than expected?…
CNET Aug 18 2003 10:16PM GMT

LookSmart’s Microsoft deal looks rocky…
CNET Aug 18 2003 8:52PM GMT

Overture improves ad tools…
CNET Aug 18 2003 6:53PM GMT

Google is most popular but others may do it better…
San Francisco Chronicle Aug 18 2003 2:15PM GMT

Microsoft search development threatens LookSmart figures…
Netimperative Aug 18 2003 12:55PM GMT

TOP 20: Search terms on MSN…
Netimperative Aug 18 2003 9:35AM GMT

The bubble that didn’t burst…
Guardian Unlimited Aug 18 2003 8:11AM GMT

Q&A – Resubmitting to the Search Engines…
About Web Search Aug 18 2003 5:14AM GMT

