Newsblaster is a great new tool for news junkies, and also points the way toward some seriously cool automated web harvesting technologies that will be a boon to searchers.
Online news has gone through several iterations since the first media web sites appeared in the mid-1990s. During the "portalization" craze, news was one of the first features bolted onto the major search engines and directories. Later, headline aggregators became popular, providing tailored newsfeeds to anyone with a web site.
And of course, the whole weblog phenomenon started as a source of "alternate" news on the web: the first bloggers "edited" the web by linking to news stories from a distinct point of view, often annotating those links with opinionated commentary.
Newsblaster, a project developed by the Columbia NLP (natural language processing) Group, represents the next level of evolution for news on the web. The service monitors seventeen major web news services, and groups related stories together for easy access.
What's so special about that? Isn't that what other news aggregators do?
Yes and no. What makes Newsblaster different is that it "reads" the news, using natural language processing and artificial intelligence techniques, and then actually writes short summaries of each major news event based on what it has "understood." And it's remarkably good at what it does.
Here's how Newsblaster summarized a recent U.S. Supreme Court review of copyright:
'Limitless' Copyright Case Faces High Court Review
"The U.S. Supreme Court agreed Tuesday to hear a case that could determine when hundreds of thousands of books, songs and movies will become freely available over the Internet or in digital libraries. A nonprofit Internet publisher and other plaintiffs argue that Congress sided too heavily with writers and other creators when it passed a law in 1998 that retroactively extended copyright protection by 20 years. On Tuesday, the U.S. Supreme Court announced it would hear a challenge to the 1998 Copyright Term Extension Act, in which Congress extended the term of existing and future copyrights by 20 years. Billions of dollars and the future earning power of some of the nation's most cherished cultural icons are at stake as the U.S. Supreme Court considers a constitutional challenge to a 1998 copyright extension law, legal experts said Wednesday."
Beneath this summary, Newsblaster includes links to the news stories it has read to generate the summary.
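To make the idea of grouping related stories and then summarizing each group more concrete, here is a minimal Python sketch. It is not Newsblaster's actual method, which the Columbia NLP Group has not been quoted on here. It swaps in two much simpler techniques: word-overlap (Jaccard) grouping and frequency-based sentence extraction. The stopword list, the 0.2 threshold, the function names and the sample stories are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "that", "it",
             "on", "by", "is", "for", "will", "with"}

def tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS]

def jaccard(a, b):
    """Word-set overlap between two texts, from 0.0 to 1.0."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def group_stories(stories, threshold=0.2):
    """Greedy grouping: add each story to the first group whose first
    member it resembles, otherwise start a new group."""
    groups = []
    for story in stories:
        for group in groups:
            if jaccard(story, group[0]) >= threshold:
                group.append(story)
                break
        else:
            groups.append([story])
    return groups

def summarize(group, max_sentences=2):
    """Extractive summary: keep the sentences whose words are, on
    average, most frequent across the whole group."""
    freq = Counter(w for story in group for w in tokens(story))
    sentences = [s.strip() for story in group
                 for s in re.split(r"(?<=[.!?])\s+", story) if s.strip()]

    def score(sentence):
        words = tokens(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in keep)

# Invented sample stories, for illustration only.
stories = [
    "The Supreme Court agreed to hear a copyright case. "
    "The case challenges a 1998 law.",
    "Congress extended copyright terms by 20 years in 1998. "
    "The Supreme Court will hear a challenge to the law.",
    "The home team won the baseball game in extra innings. "
    "Fans celebrated downtown.",
]

for group in group_stories(stories):
    print(len(group), "stories:", summarize(group))
```

Note that this sketch only copies existing sentences from its sources. That is the simplest form of multi-document summarization, and it cannot judge whether those sources are accurate or balanced.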
While Newsblaster is an excellent tool for gleaning a quick summary of the most important news stories of the day, it won't replace journalists or editors any time soon. As good as the NLP techniques are at extracting and synthesizing information from news, the program lacks the perspective and critical mindset of a professional journalist -- at least for now.
And though Newsblaster uses credible news sources, it can't yet account for bias or inaccurate reporting.
But these are just quibbles, given the time-saving utility Newsblaster offers. As the underlying technology improves and is extended, it's easy to see how this sort of approach could be used to develop customized web crawlers that you tailor to recognize your own interests and send out on autonomous search missions.
If such a system were combined with a URL monitoring service, and seeded with a taxonomy of subjects personally interesting to you, it could effectively create your own web "advisory" service, automatically building directories of promising sites annotated with high-level summaries that would spare you the time of manual searching.
Just as Newsblaster won't replace journalists, this type of hybrid crawler-agent wouldn't replace information professionals. But it would make a powerful addition to our arsenal of web search tools.
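The hybrid crawler-agent imagined above does not exist as a product, but its two core pieces, URL monitoring and a personal taxonomy, are easy to sketch in Python. Everything here is hypothetical: the taxonomy, the function names and the example URLs are invented, page fetching is left out (the caller supplies the page text), and the summarization step is omitted.

```python
import hashlib
import re

# Hypothetical personal taxonomy: subject -> keywords that signal it.
TAXONOMY = {
    "copyright": {"copyright", "infringement", "licensing"},
    "search engines": {"crawler", "index", "search"},
}

def fingerprint(text):
    """Stable hash of page text, used to detect changes between visits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def classify(text, taxonomy=TAXONOMY):
    """Return the subjects whose keywords appear in the text, sorted by name."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(subject for subject, keywords in taxonomy.items()
                  if words & keywords)

def check_pages(pages, seen):
    """Report pages that are new or changed and match the taxonomy.

    pages: dict mapping URL -> current page text (fetching is out of scope).
    seen:  dict mapping URL -> fingerprint from the previous run;
           updated in place.
    """
    report = []
    for url, text in pages.items():
        digest = fingerprint(text)
        if seen.get(url) == digest:
            continue  # unchanged since the last visit
        seen[url] = digest
        subjects = classify(text)
        if subjects:
            report.append((url, subjects))
    return report

# Invented example pages, for illustration only.
seen = {}
pages = {
    "http://example.com/a": "Court to review copyright term extension.",
    "http://example.com/b": "Local bakery wins award.",
}
print(check_pages(pages, seen))   # first run: every page is new
print(check_pages(pages, seen))   # second run: nothing has changed
```

Exact keyword matching is the weakest part of this sketch. A usable agent would need the kind of language analysis Newsblaster applies to news before its directories were worth trusting.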
Newsblaster
Columbia NLP's "automatic system for event tracking and summarization."
Columbia Natural Language Processing Group Projects
Descriptions and links to other projects under development at the Columbia NLP Group.
Search Method Melds Results
TRN News, January 9, 2002
Description of a system that uses Newsblaster-like techniques to summarize a set of results generated by a search engine -- another Columbia NLP Group project.
NOTE: Article links often change. In case of a bad link, use the publication's search facility, which most have, and search for the headline.