Newsblaster: An Automatic Weblogger

Newsblaster is a great new tool for news junkies, and also points the way toward some seriously cool automated web harvesting technologies that will be a boon to searchers.

Online news has gone through several iterations since the first media web sites appeared in the mid-1990s. During the "portalization" craze, news was one of the first features bolted on to the major search engines and directories. Later, headline aggregators became popular, providing tailored newsfeeds to anyone with a web site.

And of course, the whole weblog phenomenon started as a source of "alternate" news on the web, with the first bloggers "editing" the web by linking to news stories from a distinct point of view, often annotating those links with opinionated commentary.

Newsblaster, a project developed by the Columbia NLP (natural language processing) Group, represents the next level of evolution for news on the web. The service monitors seventeen major web news services, and groups related stories together for easy access.

What's so special about that? Isn't that what other news aggregators do?

Yes and no. What makes Newsblaster different is that it "reads" the news, using natural language and artificial intelligence techniques, and then actually writes short summaries of each major news event based on what it has "understood." And it's remarkably good at what it does.

Here's how Newsblaster summarized a recent U.S. Supreme Court review of copyright:

'Limitless' Copyright Case Faces High Court Review

"The U.S. Supreme Court agreed Tuesday to hear a case
that could determine when hundreds of thousands of
books, songs and movies will become freely available
over the Internet or in digital libraries. A nonprofit
Internet publisher and other plaintiffs argue that
Congress sided too heavily with writers and other
creators when it passed a law in 1998 that
retroactively extended copyright protection by 20
years. On Tuesday, the U.S. Supreme Court announced it
would hear a challenge to the 1998 Copyright Term
Extension Act, in which Congress extended the term of
existing and future copyrights by 20 years. Billions of
dollars and the future earning power of some of the
nation's most cherished cultural icons are at stake
as the U.S. Supreme Court considers a constitutional
challenge to a 1998 copyright extension law, legal
experts said Wednesday."

Beneath this summary, Newsblaster includes links to the news stories it has read to generate the summary.
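
To make the group-then-summarize idea concrete, here is a minimal sketch in Python. It is not Newsblaster's method: the Columbia system writes its summaries using natural language techniques this article doesn't detail, while the sketch below swaps in two much simpler stand-ins, word-overlap (Jaccard) grouping and frequency-based sentence extraction. The function names, the tiny stop-word list, and the 0.2 similarity threshold are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "that",
              "it", "on", "by", "is", "for", "will"}


def content_words(text):
    """Lowercase the text and return its set of non-stop-word tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def jaccard(a, b):
    """Word-set overlap: 0.0 for disjoint sets, 1.0 for identical sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_stories(stories, threshold=0.2):
    """Greedy one-pass grouping (an assumed stand-in for real clustering).

    Each story joins the first group whose first story shares enough
    words with it; otherwise it starts a new group.
    """
    groups = []
    for story in stories:
        words = content_words(story)
        for group in groups:
            if jaccard(words, content_words(group[0])) >= threshold:
                group.append(story)
                break
        else:
            groups.append([story])
    return groups


def summarize(group, max_sentences=2):
    """Extractive summary: keep the sentences whose words are most
    frequent across the whole group, in their original order."""
    sentences = [s.strip()
                 for story in group
                 for s in re.split(r"(?<=[.!?])\s+", story)
                 if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return " ".join(s for s in sentences if s in top)
```

Calling group_stories() on a list of story texts returns a list of groups, and summarize() on any one group returns a short extract. Because the extract simply reuses sentences from several sources, it can repeat itself, much as the sample summary above restates the Supreme Court's decision more than once.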

While Newsblaster is an excellent tool for gleaning a quick summary of the most important news stories of the day, it won't replace journalists or editors any time soon. As good as the NLP techniques are at extracting and synthesizing information from news, the program lacks the perspective and critical mindset of a professional journalist -- at least for now.

And though Newsblaster uses credible news sources, it can't yet account for bias or inaccurate reporting.

But these are just quibbles, given the time-saving utility Newsblaster offers. As the underlying technology improves and is extended, it's easy to see how this sort of approach could be used to develop customized web crawlers that you tailor to recognize your own interests and send out on autonomous search missions.

If such a system were combined with a URL monitoring service, and seeded with a taxonomy of subjects personally interesting to you, it could effectively create your own web "advisory" service, automatically building directories of promising sites annotated with high-level summaries that would spare you the time of manual searching.
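
As a rough illustration of the "seeded with a taxonomy" idea, the sketch below files pages under personal topics by counting keyword hits. It is only a thought experiment under stated assumptions: no such service is described in detail here, the fetching and URL monitoring are left out entirely, and the names match_topics and build_directory, the min_hits cutoff, and the keyword-set format of the taxonomy are all hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_topics(page_text, taxonomy, min_hits=2):
    """Return {topic: hit_count} for every topic whose keywords occur
    at least min_hits times in the page text.

    taxonomy maps a topic name to a set of keywords (an assumed format).
    """
    tokens = tokenize(page_text)
    matches = {}
    for topic, keywords in taxonomy.items():
        hits = sum(1 for token in tokens if token in keywords)
        if hits >= min_hits:
            matches[topic] = hits
    return matches


def build_directory(pages, taxonomy, min_hits=2):
    """Build {topic: [(url, hits), ...]} from already-fetched pages.

    pages maps a URL to its text. Entries under each topic are sorted
    so the pages with the most keyword hits come first.
    """
    directory = {}
    for url, text in pages.items():
        for topic, hits in match_topics(text, taxonomy, min_hits).items():
            directory.setdefault(topic, []).append((url, hits))
    for entries in directory.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return directory
```

A real agent would need far better relevance judgments than keyword counts, plus a summary for each entry, but the overall shape (a personal taxonomy in, an annotated directory out) is the same.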

Just as Newsblaster won't replace journalists, this type of hybrid crawler-agent wouldn't replace information professionals. But it would make a powerful addition to our arsenal of web search tools.

Newsblaster
Columbia NLP's "automatic system for event tracking and summarization."

Columbia Natural Language Processing Group Projects
Descriptions and links to other projects under development at the Columbia NLP Group.

Search Method Melds Results
TRN News, January 9, 2002
Description of a system that uses Newsblaster-like techniques to summarize a set of results generated by a search engine -- another Columbia NLP Group project.

About the author

Chris Sherman is a frequent contributor to several information industry journals. He's written several books, including The McGraw-Hill CD ROM Handbook and The Invisible Web: Uncovering Information Sources Search Engines Can't See, co-authored with Gary Price. Chris has written about search and search engines since 1994, when he developed online searching tutorials for several clients. From 1998 to 2001, he was's Web Search Guide.