Google News Search Leaps Ahead

Google has dramatically enhanced its news search service, serving up a portal of real-time news drawn from more than 4,000 sources worldwide.

Until recently, Google's news search was competent, but less useful than other news-aggregating services such as AllTheWeb's News Search and Yahoo's Full Coverage. The new enhancements establish Google as one of the premier news-finding and filtering destinations on the web.

Like Yahoo's Full Coverage, Google News Search now looks like a portal, with links to the top headlines organized into categories such as Top Stories, World, Business, Sports and so on. Each category has its own area on the News Search home page, with headlines, descriptions and links for the top two or three stories.

"The page looks very different than the average Google page," said Marissa Mayer, Google product manager. That's because it's packed with headlines, descriptions, thumbnail photos and dozens of links to the sources of the articles online.

Unlike Yahoo Full Coverage, however, Google News Search isn't assembled by human editors who select and format the news. Google's process is fully automated. News stories are chosen and the page is updated without human intervention. Google crawls news sources constantly, and uses real-time ranking algorithms to determine which stories are the most important at the moment -- in theory highlighting the sources with the "best" coverage of news events.

Each top story is presented with a headline linked directly to the source. Beneath the headline are a short description, the name of the source, and how long ago the article was last crawled, ranging from a few minutes to several hours.

Beneath the main headline and description are two full headlines from other sources, followed by four or five links to stories with only the name of the publication indicated. Finally, there are links to "related" stories from other sources.

This design makes it easy to quickly scan the headlines while having the option of reading multiple accounts of a story from different news sources -- from literally thousands of sources, for some stories.

Each major category has a link at the top of its section that allows you to scan news just within that category. Tabs on the upper left of each page also allow you to focus on the Top Stories, World, U.S., Business, Sci/Tech, Sports, Entertainment and Health categories.

Unlike many news aggregators that simply "scrape" headlines and links from news sites, Google's news crawler indexes the full text of articles. This approach offers several unique benefits.

For example, full text indexing allows true searching, rather than just browsing of headlines. Creating a full text index of news also allows Google to cluster related news stories around what Mayer calls a "centroid" of keywords. "A cluster is defined by a centroid of keywords, and all the articles have some of those key words in them," she said.

The process uses artificial intelligence in addition to traditional information retrieval techniques to match keywords with stories. Mayer says this approach to identifying related articles means that the relative importance of each article is "baked in," which is how the top sources for each story are selected.
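To make the "centroid of keywords" idea concrete, here is a minimal Python sketch of keyword-based clustering. It is an illustration only, not Google's method: the stopword list, the overlap measure, the 0.3 threshold and all function names are assumptions made for this example.

```python
from collections import Counter

# Assumption: a tiny stopword list, just enough for the example.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "is"}

def keywords(text):
    """Count the lowercase words in a text, ignoring stopwords."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return Counter(w for w in words if w and w not in STOPWORDS)

def overlap(article_kw, centroid):
    """Fraction of the article's distinct keywords that appear in the centroid."""
    if not article_kw:
        return 0.0
    shared = sum(1 for w in article_kw if w in centroid)
    return shared / len(article_kw)

def cluster(articles, threshold=0.3):
    """Greedy single pass: join the best-matching cluster or start a new one.

    The threshold is an arbitrary illustrative value.
    """
    clusters = []  # each cluster: {"centroid": Counter, "articles": [str]}
    for text in articles:
        kw = keywords(text)
        best = max(clusters, key=lambda c: overlap(kw, c["centroid"]), default=None)
        if best is not None and overlap(kw, best["centroid"]) >= threshold:
            best["articles"].append(text)
            best["centroid"].update(kw)  # centroid accumulates keyword counts
        else:
            clusters.append({"centroid": Counter(kw), "articles": [text]})
    return clusters

if __name__ == "__main__":
    stories = [
        "Netscape loses privacy dispute",
        "Privacy dispute ends badly for Netscape",
        "Yahoo to run multimedia ads",
    ]
    for c in cluster(stories):
        print(c["articles"])
```

In this toy version, the two Netscape headlines share enough keywords to land in one cluster, while the Yahoo headline starts a cluster of its own. A production system would presumably weight keywords and work from full article text rather than headlines.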

Other factors used in calculating the relevance of top and related stories include how recently articles were published, and the reputation of the source. When you actually do a search, these factors are also applied in addition to keyword analysis to determine how closely particular stories match your query.

On search results pages, a link allows you to override the default ranking by relevance and order results by date -- a feature that's particularly helpful for monitoring breaking news.
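As a rough illustration of how keyword match, recency and source reputation might be blended into one relevance score, and how a sort-by-date option bypasses that score, consider the following Python sketch. The weights, the six-hour half-life, the reputation table and every name in it are invented for the example; the actual formula is not described here.

```python
from dataclasses import dataclass

@dataclass
class Story:
    headline: str
    source: str
    age_hours: float  # time since publication

# Hypothetical reputation values between 0 and 1; not real data.
REPUTATION = {"BBC": 0.9, "Example Blog": 0.4}

def keyword_match(query, headline):
    """Fraction of query words that appear in the headline."""
    q = set(query.lower().split())
    h = set(headline.lower().split())
    return len(q & h) / len(q) if q else 0.0

def recency(age_hours, half_life_hours=6.0):
    """Exponential decay: the value halves every half_life_hours (assumed)."""
    return 0.5 ** (age_hours / half_life_hours)

def score(query, story, weights=(0.6, 0.25, 0.15)):
    """Weighted sum of the three factors; the weights are illustrative."""
    w_kw, w_rec, w_rep = weights
    return (w_kw * keyword_match(query, story.headline)
            + w_rec * recency(story.age_hours)
            + w_rep * REPUTATION.get(story.source, 0.5))

def rank(query, stories, by_date=False):
    """Order by relevance by default, or newest first when by_date is set."""
    if by_date:
        return sorted(stories, key=lambda s: s.age_hours)
    return sorted(stories, key=lambda s: score(query, s), reverse=True)

if __name__ == "__main__":
    stories = [
        Story("Yahoo to run multimedia ads", "BBC", 1.0),
        Story("Netscape loses privacy dispute", "BBC", 10.0),
    ]
    print([s.headline for s in rank("netscape privacy", stories)])
    print([s.headline for s in rank("netscape privacy", stories, by_date=True)])
```

With these made-up numbers, the older Netscape story ranks first by relevance for the query "netscape privacy", while the date ordering puts the fresher Yahoo story on top.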

Google's decision to index the full text of news sources rather than simply scraping headlines posed a major challenge for implementing the new service. The vast diversity and typically cluttered design of online news pages make them more difficult to crawl and index than many other types of web sites. "Article extraction has proven to be one of the most difficult aspects of the project," said Mayer.

Google crawls its 4,000 sources of news continuously and in real time. According to Mayer, the crawler continuously computes what's likely to change on each news source, and when the change is likely to occur. To expedite the discovery of new stories, the crawler tends to hit hub or major section pages frequently, to see what new links are there.
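One simple way to picture a crawler that revisits fast-changing pages more often is an adaptive revisit interval: shorten it when a page has changed, lengthen it when it has not. The Python sketch below simulates that idea with a priority queue. It is a guess at the general technique, not a description of Google's crawler; the interval bounds, the halving and doubling rule, and the `fetch_changed` callback are all assumptions.

```python
import heapq

# Assumed bounds on the revisit interval, in minutes.
MIN_INTERVAL = 1.0
MAX_INTERVAL = 240.0

def next_interval(interval, changed):
    """Halve the interval if the page changed, double it otherwise (clamped)."""
    factor = 0.5 if changed else 2.0
    return min(MAX_INTERVAL, max(MIN_INTERVAL, interval * factor))

def simulate(pages, fetch_changed, until):
    """Simulate the schedule up to time `until` (minutes) and return the visit log.

    pages: dict mapping url -> starting interval in minutes
    fetch_changed: callable(url, now) -> bool; a stand-in for a real
        fetch-and-compare step (hypothetical)
    """
    queue = [(interval, url, interval) for url, interval in pages.items()]
    heapq.heapify(queue)  # ordered by next visit time
    log = []
    while queue and queue[0][0] <= until:
        now, url, interval = heapq.heappop(queue)
        log.append((now, url))
        interval = next_interval(interval, fetch_changed(url, now))
        heapq.heappush(queue, (now + interval, url, interval))
    return log

if __name__ == "__main__":
    # Assumption: the hub page always has new links, the archive page never changes.
    pages = {"hub": 5.0, "archive": 60.0}
    log = simulate(pages, lambda url, now: url == "hub", until=60.0)
    for url in pages:
        print(url, sum(1 for _, u in log if u == url))
```

In the simulated hour, the always-changing hub page is visited many times while the static archive page is visited once, which mirrors the behavior of hitting hub and section pages frequently.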

While the news sources are crawled constantly and individual news stories are updated continuously, the entire set of displayed stories is "auto generated" every 15 minutes. A message in the upper right corner of the main news page indicates when it was last generated.

Google's updated news search is an exceptionally powerful tool for web users. It's still in beta, so there are a few rough edges, but all told it's one of the best news browsing and search portals currently operating on the web.

Google News Search

News Search Engines
If you are still looking for news using "normal" search engines, stop doing it! You'll find the services listed on this page to be a much better way to search for the latest news stories from hundreds of sources on the web.

About the author

Chris Sherman is a frequent contributor to several information industry journals. He's written several books, including The McGraw-Hill CD ROM Handbook and The Invisible Web: Uncovering Information Sources Search Engines Can't See, co-authored with Gary Price. Chris has written about search and search engines since 1994, when he developed online searching tutorials for several clients. From 1998 to 2001, he was's Web Search Guide.