Google As News Archivist

Until recently, it was difficult to find news archives on the web more than 30 days old. That's changed in a big way with the advent of Google's News Archive search.

Researchers and info pros know that not everything is available for free on the web, although sometimes it feels that way. One area in which information clearly comes with a price tag is news from more than a month or two ago. Sure, you can get the latest news from The New York Times on its web site, but you’ll have to pay $4.95 if you want to read anything from more than a few weeks ago. Google News and other news sites don’t offer any greater depth; if the news is older than a month, you’re out of luck.

But last September, Google rolled out its “200-year” News Archive Search, offering full-text content from The New York Times, Wall Street Journal, Washington Post, and third-party sources such as LexisNexis, HighBeam and Thomson Gale. You can search the News Archive at http://news.google.com/archivesearch, or by clicking the “News archive search” link on the Google News page. If Google detects that your regular web search query would retrieve archived articles, it sometimes even includes those in the search results page.

The search screens for the News Archive are virtually the same as the regular Google News search; the only difference is that the Archive search includes the option to limit the results to free articles, or to articles costing less than $5, $10, or $50. (The $50-plus articles are from investment analysts, market research reports and other high-end content, sold through Alacra.)

The search results page differs from the usual Google search results format. The default is to sort the results by relevance, and there are hyperlinked dates along the left margin clustering the results by year. You can also see the results sorted by year by clicking the “Show timeline” link on the search page or at the top of any search results page. If there was a disproportionate number of mentions in one year or group of years (or, as Google says, “a period of particular interest”), that’s indicated by a blue arrow.

Although it isn’t well documented, these date clusters do not include every year for which there are articles. For example, the search results page for “President Bush” includes links to 2004, 2002-2003, 2001, 1992, 1988-1989 and “before 1988.” Obviously, there were articles in the search results from the other years. The only way to get a clustering of all years’ articles is to limit your search to a single news source, which is hardly a practical solution, and one that offers no advantage over going directly to a publisher’s archive and taking your chances on how far back in time the archive goes.

Note, too, that Google’s claim of a “200-year archive” is a bit, um, generous. Since most of the content in News Archive comes from content aggregators, its depth is no greater than that of the content available electronically. Content from ProQuest Archiver and a few selected sources goes back to the 1800s, but most of the material in News Archive is less than 5 or 10 years old. If you want truly historical information, News Archive is a fine place to start, and it often provides an interesting glimpse into what life was like 100 years ago, but don’t expect it to be comprehensive.

The search results page also shows links to the most frequent sources, but not to all sources. For example, the search results page for articles from April 30, 2004 mentioning Australia shows links only to AAP General News, AAP Sports News, AsiaPulse News, Australian Banking & Finance and IDG Data. However, none of the first five articles are from those sources. You can click on the link of any of the five sources listed to see results only from that source; as noted above, this also allows you to see the results clustered by year for each year in which the search terms appeared.

Once you click on any of the search result links, you are taken directly to the article. An interesting feature is that there appears to be a golden gap between what a newspaper charges for an article from its own archive and what you pay for the same article through an aggregator on News Archive. For example, through Google News Archive I viewed a 2004 archived article from the San Diego Union-Tribune at no charge, even though the same article would have cost me $1.95 if I had retrieved it directly from the Union-Tribune archive.

Google News Archive still feels like a beta. It is no competition for fee-based online services such as Factiva or LexisNexis, but it can sometimes offer free access to articles from the last couple of years that you would otherwise have to pay for, and its clustering is useful when you want to see when a topic was particularly hot.
