Deleted “Sensitive” Information Still Available via Google

Heightened security concerns have led a number of organizations to remove “sensitive” information from their web sites, yet much of this information is still available, even to people with relatively modest searching skills.

Among the organizations removing content, the Federation of American Scientists (FAS) removed diagrams and photos of U.S. intelligence facilities from its web site, according to an Associated Press article. A Washington Post story reported that the Centers for Disease Control and Prevention removed a “Report on Chemical Terrorism,” which describes industry’s shortcomings in preparing for a possible terrorist attack.

The CDC report and thousands of apparently deleted pages from the FAS web site are still available via Google’s cache feature. Also available is a page with detailed information about where and how the FAS obtained some of the images and maps it formerly made available on its site.

A search on Google using the phrase “U.S. intelligence facilities,” limited to the FAS web site, returned more than 4,500 results. Clicking most of the result links produced “404 – not found” messages. This is to be expected, since the pages were taken down relatively recently and Google has not yet recrawled the site and removed the now-dead links from its database.

However, clicking the link to the “cached” copy for more than a dozen of these search results displayed the expected page. Most, if not all, of the deleted pages will remain available until Google recrawls the site and removes the dead links from its index.
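
The site-limited phrase search described above can be sketched in a few lines of Python. This is only an illustration, not the exact query used for this article: the `site:` operator and quoted-phrase syntax are standard Google search features, but the domain `fas.org`, the search endpoint URL, and the helper names `site_phrase_query` and `search_url` are assumptions introduced here.

```python
from urllib.parse import urlencode


def site_phrase_query(phrase: str, domain: str) -> str:
    """Build a query for an exact phrase restricted to one site."""
    # Quotes request an exact phrase match; site: limits results to the domain.
    return f'"{phrase}" site:{domain}'


def search_url(phrase: str, domain: str,
               base: str = "https://www.google.com/search") -> str:
    """Return a search URL for the query (the base URL is an assumption)."""
    return base + "?" + urlencode({"q": site_phrase_query(phrase, domain)})


if __name__ == "__main__":
    # Assumed domain for the FAS web site; the article does not give it.
    print(site_phrase_query("U.S. intelligence facilities", "fas.org"))
    print(search_url("U.S. intelligence facilities", "fas.org"))
```

Opening the resulting URL in a browser lists the matching pages. The cached copy of each result still has to be opened from the results page itself.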

Google isn’t proactively removing content from its database, even in cases like that of the FAS site, whose removals were reported in major media outlets. “We’re not doing anything special,” said Google spokesperson David Krane. “The Net is changing so fast right now there’s just too much to stay on top of.”

Google provides a number of methods for removing pages from its cache, or preventing a page from being cached when the crawler first discovers it. For urgent situations, Google also provides an automated means for removing large numbers of pages from its cache.
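
The article does not name these methods. Two mechanisms that site owners commonly use for this purpose are a robots.txt file, which tells crawlers which paths to skip, and a robots meta tag with the value `noarchive`, which asks a search engine not to offer a cached copy of a page. The sketch below, using Python's standard `urllib.robotparser` module, shows how a robots.txt rule is interpreted. The sample rules, the `/private/` path, the `example.org` URLs, and the helper name `is_crawlable` are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow:
"""

# Meta tag a page can carry in its <head> to ask that no cached copy be shown.
NOARCHIVE_TAG = '<meta name="robots" content="noarchive">'


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.org/private/report.html",
                "http://www.example.org/index.html"):
        print(url, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

Note that a robots.txt rule only affects future crawls. Pages that are already cached stay available until the search engine revisits the site or the owner requests removal.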

Some of the content reported by the AP and Washington Post as being deleted is not available via Google, or any other search engine, for that matter. Most of this information appears to have been stored in web-accessible databases, making it part of the Invisible Web.

An example is the Environmental Protection Agency’s RMP*Info Risk Management Plan Database. This system allowed extensive searching of hazardous materials stored at thousands of locations by facility name, chemical name, or geographic location. The database’s search form is still available via Google’s cache, but attempting to search the database using the form simply returns a “404 – Not Found” message.

While the concerns of organizations that have removed content from the web are understandable, it’s unfortunate that the web no longer has the degree of openness it once had. What’s happening is not blanket censorship, because many organizations that have yanked web content continue to make sensitive information available at their offices to people showing appropriate identification.

But obtaining that information is now more difficult. That said, clever searchers will find other ways to locate the kind of detailed information no longer available on the FAS site. For example, TerraServer has a huge database of aerial and satellite images from more than 60 countries, allowing you to select and print high-resolution images sized as small as 1.96 square kilometers.

Gone are the days when you could find “anything” on the web. Chalk up another casualty of the “events” of September 11th.

Web Sites Pull Intelligence Data
An Associated Press story reporting on pages removed by the Federation of American Scientists, the National Imagery and Mapping Agency, the U.S. Office of Pipeline Safety and others in the wake of the terrorist attacks on the U.S.

Agencies Scrub Web Sites Of Sensitive Chemical Data
The Washington Post reports on how some federal agencies have been removing documents from Internet sites to keep them away from terrorists.

Intelligence data pulled from websites
The BBC reports that sensitive documents and reports have been pulled from websites across the internet following the 11 September attacks.

TerraServer is a huge, freely accessible database of satellite imagery that’s growing by an additional 5,000 square kilometers each day.

How Google Works
A detailed look under the hood at all aspects of Google’s operation.



Search Headlines

NOTE: Article links often change. In case of a bad link, use the publication’s search facility, which most have, and search for the headline.

Semel still at square one
Business 2.0 Oct 9 2001 1:03PM GMT
Arab websites see traffic soar
Media Guardian Oct 9 2001 8:40AM GMT
Northern Light Starts Special Edition for Windows XP
Research Buzz Oct 9 2001 12:12AM GMT
on the auction block
Business 2.0 Oct 8 2001 6:26PM GMT
Visionary lays into the web
BBC Oct 8 2001 3:45PM GMT
AOL using bugs, cookies to help target ads
Chicago Tribune Oct 8 2001 11:53AM GMT
Afghanistan, on 50 Websites a Day
Wired News Oct 8 2001 11:04AM GMT
Industry outlook nothing to Yahoo about
ZDNet Oct 8 2001 8:34AM GMT
Internet hums after Afghanistan strikes
ZDNet Oct 8 2001 8:34AM GMT
A Look at Hidden Websites
Business 2.0 Oct 8 2001 7:43AM GMT
FBI searches the Web for terrorist tracks
ninemsn Oct 7 2001 2:22PM GMT
