Until recently, it was difficult to find news archives on the web more than 30 days old. That’s changed in a big way with the advent of Google’s News Archive search.
Researchers and info pros know that not everything is available for free on the web, although sometimes it feels that way. One area in which it is evident that information comes with a price tag is news from more than a month or two ago. Sure, you can get the latest news from The New York Times from its web site, but you’ll have to pay $4.95 if you want to read anything from more than a few weeks ago. Google News and other news sites don’t offer any greater depth; if the news is older than a month, you’re out of luck.
But last September, Google rolled out its “200-year” News Archive Search, offering full-text content from The New York Times, Wall Street Journal, Washington Post, and third-party sources such as LexisNexis, HighBeam and Thomson Gale. You can search the News Archive at http://news.google.com/archivesearch, or by clicking the “News archive search” link on the Google News page. If Google detects that your regular web search query would retrieve archived articles, it sometimes even includes those in the search results page.
The search screens for the News Archive are virtually the same as those for the regular Google News search; the only difference is that the Archive search includes the option to limit the results to free articles, or to articles costing less than $5, $10, or $50. (The $50-plus articles are investment analyst reports, market research reports and other high-end content, sold through Alacra.)
The search results page differs from the usual Google search results format. The default is to sort the results by relevance, and there are hyperlinked dates along the left margin clustering the results by year. You can also see the results sorted by year by clicking the “Show timeline” link on the search page or at the top of any search results page. If there was a disproportionate number of mentions in one year or group of years (or, as Google says, “a period of particular interest”), that’s indicated by a blue arrow.
Although it isn’t well-documented, these date clusters do not include every year for which there are articles. For example, the search results page for “President Bush” includes links to 2004, 2002-2003, 2001, 1992, 1988-1989 and “before 1988.” Obviously, there were articles in the search results from the other years. The only way to get a clustering of all years’ articles is to limit your search to a single news source, which is hardly a practical solution, and one that offers no advantage over going directly to a publisher’s archive and taking your chances on how far back in time the archive goes.
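To make the idea of a “period of particular interest” concrete, here is a minimal sketch of how a year might be flagged as disproportionately busy. Google has not documented how it chooses the years it highlights, so everything here is an assumption for illustration: the function name hot_years, the factor threshold, and the sample list of years are all hypothetical.

```python
from collections import Counter


def hot_years(article_years, factor=2.0):
    """Return the years whose article count is at least `factor` times
    the average count per year. Hypothetical rule, not Google's method."""
    counts = Counter(article_years)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(year for year, n in counts.items() if n >= factor * mean)


# Made-up publication years for a set of search results.
years = [1988, 1992, 2001, 2001, 2001, 2001, 2001, 2001, 2002, 2004]
print(hot_years(years))  # -> [2001]
```

With this sample, the five years average two articles each, so only 2001, with six, clears the doubled threshold. A real service would presumably use a more careful measure, but the principle of comparing each year against the overall spread is the same.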
Note, too, that Google’s claim of a “200-year archive” is a bit, um, generous. Since most of the content in News Archive comes from content aggregators, its depth is no greater than that of the material available electronically. Content from ProQuest Archiver and a few selected sources goes back to the 1800s, but most of the material in News Archive is less than 5 or 10 years old. If you want truly historical information, News Archive is a fine place to start, and it often provides an interesting glimpse into what life was like 100 years ago, but don’t expect it to be comprehensive.
The search results page also shows links to the most frequent sources, but not all sources. For example, the search results page for articles from April 30, 2004 mentioning Australia only shows links to AAP General News, AAP Sports News, AsiaPulse News, Australian Banking & Finance and IDG Data. However, none of the first five articles are from those sources. You can click on the link for any of the five sources listed to see results only from that source; as noted above, this also allows you to see the results clustered by year for each year in which the search terms appeared.
Once you click on any of the search result links, you are taken directly to the article. An interesting feature is that there appears to be a golden gap between what a newspaper charges for in its own archive and what you have to pay for through an aggregator on News Archive. For example, through Google News Archive I viewed a 2004 archived article from the San Diego Union-Tribune at no charge, even though the same article would have cost me $1.95 had I retrieved it directly from the Union-Tribune archive.
Google News Archive still feels like a beta. It is no competition for fee-based online services such as Factiva or LexisNexis, but it can sometimes offer free access to articles from the last couple of years that you would otherwise have to pay for, and its clustering is useful when you want to see when a topic was particularly hot.
NOTE: Article links often change. In case of a bad link, use the publication’s search facility, which most have, and search for the headline.
From The SEW Blog…
- Tracking All The Official Google Blogs
- Web Increasingly Used For Local Service Business Searches
- New Sitemaps For Google News
- Google Breaks $500 Per Share
- Steve Berkowitz Of Microsoft (Windows Live) News.com Interview
- Answers.com Integrates Yahoo Answers Into Content
- Google Beats Microsoft, Yahoo As College Grad Choice
- Can Developers & API Save Yahoo From Its Peanut Butter Crisis
- Norway Upset With Google News Over Copyright Laws
Headlines & News From Elsewhere
- Here today and gone tomorrow! Where did Google Click to Call go off to?, Understanding Google Maps & Yahoo Local
- Google v. Yahoo! Predicting the Intercept, Hitwise
- Podcast: Google Passes $500 Per Share Mark; Yahoo’s API Peanut Butter; Google News Adapting To Publisher Concerns In Scandinavia & More!, Daily SearchCast
- Social Media Caps The Search Summit, MediaPost
- Woz comes to Google, Official Google Mac Blog
- Ted Leonosis – Uber SEO – Posterboy for Reputation Management, Stuntdubl
- November 21, 2006 – ComparisonEngines.com Morning Podcast, ComparisonEngines
- Google Mapping an Offline Course, New York Times
- New Kid-Friendly Search Engine to Avoid: Zoo, ResearchBuzz
- New Eye Tracking Report Looks at Google, Yahoo & MSN, Marketing Pilgrim
- Nissan Revs up Marketing across Microsoft Properties, ClickZ
- Search as a Starting Point for Science Queries, ClickZ
- Paid Search Delivers Best Bang for Buck, iMedia Connection
- Advertising Placements by Industry and Top Sponsored Links, October 2006, ClickZ
- Optimizing Blogs for the Search Engines, ClickZ
- Are You Chasing the Wrong Long Tail?, ClickZ
- Why We Need Search Standards, iMedia Connection
- Gmail POP Troubleshooter Utility Released, Daggle
- Yahoo! Search Marketing 2.0 Limits To 20 Campaigns Per Account, Search Engine Roundtable
- Fake News Story Games Thousands of Digg Users, Micro Persuasion
- The “problem” of social news…, Jason Calacanis
- New Beta from Microsoft: Certain SMS Search Features Now Available from Windows Live Search, ResourceShelf
- Does Registering A Domain Name for 10 Years Help Search Ranking?, Search Engine Roundtable
- Where Does Google Draw the Data Collection Line?, SEOmoz Blog
- Webshots Adds Video Sharing, Search Engine Journal
- Online Information 2006, Phil Bradley
- Doing down Martin Luther King, Phil Bradley
- Martin Luther King Google Bomb, InsideGoogle
- Google-bombing (or reversing the damage, anyway), Robert Scoble
- Track YouTube Stats, Chris Pirillo
- Picasa Web Albums Adds More Storage Options, InsideGoogle
- Yahoo-Newspapers: Conference Call: Why Yahoo? Timelines, Plans, Etc., PaidContent.org
- SalaryScout: Simple, Social Salary Comparison, TechCrunch
- Where 2.0 CFP is Now Open, O’Reilly Radar
- The race to create a ‘smart’ Google, Fortune
- New Paper: Categorizing Web Search Results into Meaningful and Stable Categories using Fast-Feature techniques, ResourceShelf
- More Features for Google Page Creator, Google Blogoscoped
- PayPal 2 Google: We Can Throw TWICE As Much Money Down The Toilet!, InsideGoogle
- BlueOrganizer 3.0: Instant Vertical Search and Tagging, TechCrunch
- Top 11 Euphemisms for Cloaking, Stuntdubl
- Netscape: the Calacanis effect, Valleywag
- Is Google Flubbing Mobile Search?, GigaOm
- SES Chicago In House ONLY Meeting Tuesday at 7:15PM – SEMPO Sponsored, Search Engine Watch Forums
- SEMPO Institute To Be Launched at Search Engine Strategies Conference, Search Engine Guide
- Worldmapper Visualizations, O’Reilly Radar
- Microsoft Banning Sites from Live.com For Link Exchanges, Search Engine Roundtable
- Sullivan Pubcon Keynote, Part 2, Traffick
- Chris Tolles’ Definitive Pubcon Summary, Traffick
- PubCon Vegas 2006. Conference WrapUp, Search Engine Guide
- See All The Nukes In Google Earth, InsideGoogle
- In Search of Authentic, Trackable, Crunchable, Buzz, ResourceShelf
- SingleFeed – Free Google Base/Froogle Submission, ComparisonEngines.com
- Mobile Versions for the Major Search Engines, Google Operating System
- Sue Decker’s rising profile, Valleywag
- Yes, it’s true… I’m leaving AOL, Jason Calacanis
- Simplicity and power, (AKA, Google Page Creator gets new features), Official Google Blog
- More Navigators at Netscape (or What our paid bookmarkers are really doing. Hint: it’s not just bookmarking), Jason Calacanis
- Why Nick Douglas Left Valleywag, Google Blogoscoped
- Google Women’s Power, Google Blogoscoped
- Google News versus Digg, Hitwise