Google Restores Usenet Archive

Heads up, Usenet fans: Google has introduced an advanced search capability that lets you search more than six years of Usenet postings. And if you aren't familiar with Usenet, take the time to explore Google's new gateway into another vast part of the Internet, one that's completely different from the web.

As much as it seems like the web's been around forever, it's actually one of the newest Internet technologies, little more than eleven years old. Before the web, and before email became commonplace, there were many thriving online communities that used electronic "bulletin board" systems. Usenet (from "Users' Network") began in 1979 as an experiment to integrate many of these bulletin boards (called "newsgroups") into a single system.

Usenet was one of the first distributed network systems to appear on the Internet. Unlike web servers, which only provide information on request, Usenet relied on a system of servers that communicated with each other, passing along new "news" (messages posted by users) as it was created, ensuring that the entire system was kept up to date.
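To make that server-to-server updating concrete, here is a minimal Python sketch of "flooding," assuming a simplified model: each server offers every new message to the servers it peers with, and a server accepts a message only if it hasn't already stored that message's ID. The server names, the peer topology, and the `propagate` function are hypothetical; real news servers negotiate these transfers over the NNTP protocol rather than calling a function like this.

```python
from collections import deque

# Hypothetical peer topology: each server lists the servers it exchanges news with.
# Note the cycle (a-b-d-c-a); duplicate suppression keeps the flood from looping.
PEERS = {
    "server-a": ["server-b", "server-c"],
    "server-b": ["server-a", "server-d"],
    "server-c": ["server-a", "server-d"],
    "server-d": ["server-b", "server-c"],
}


def propagate(peers, origin, message_id):
    """Flood one message from `origin` and return the order servers receive it."""
    # Each server's history of message IDs it already stores.
    history = {server: set() for server in peers}
    history[origin].add(message_id)
    delivery_order = [origin]
    queue = deque([origin])
    while queue:
        server = queue.popleft()
        for peer in peers[server]:
            # Offer the message; a peer accepts it only if the ID is new to it.
            if message_id not in history[peer]:
                history[peer].add(message_id)
                delivery_order.append(peer)
                queue.append(peer)
    return delivery_order


print(propagate(PEERS, "server-a", "<example-1@server-a>"))
# ['server-a', 'server-b', 'server-c', 'server-d']
```

In this sketch, server-d is offered the same message by both server-b and server-c but stores it only once; that message-ID check is what lets news spread across a web of interconnected servers without circulating forever.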

Through the mid-90s, computing resources were expensive and often scarce. As a result, Usenet messages were set to "expire" after a certain amount of time to free up valuable storage space. Until DejaNews (later renamed Deja) began deliberately archiving Usenet messages in 1995, there was no formal, large-scale Usenet archive.

Today, Usenet has grown to tens of thousands of newsgroups, each with its own unique topic and set of loyal users. Although the web offers similar online discussion groups and forums, the simple, text-only, bare-bones functionality of the Usenet attracts thousands of users who actively post messages and reply to others.

The most common way to participate in the Usenet is with client software called a newsreader, which is somewhat like a cross between a web browser and an email client. As an alternative to newsreaders, several companies have offered access to the Usenet via web browser, but almost all have failed. By the year 2000, Deja provided one of the few web-based gateways to the Usenet.

Then, in May 2000, Deja took its archive offline, citing operating costs and a renewed corporate focus on its comparison shopping service. Like many other Internet firms, the company was not able to survive the bursting of the so-called dot-com "bubble."

Last February, Google purchased Deja's Usenet archive, rechristening it as Google Groups. Initially, users could search only the most recent six months of this archive. Now the entire Deja Usenet archive, which dates back to March 29, 1995, is searchable. In addition, Google has expanded the size of the Deja database with an archive of its own.

"This archive is the largest such storehouse of postings on the web and contains more than 650 million individual messages," according to Google spokesperson David Krane. "The maximum size of the Deja archive (when it was still in their hands) was approximately 500 million messages. Google has since combined these 500 million messages with an archive of our own that we have been building since August 2000, and some additional content that has been donated to Google by members of the Usenet community -- small archives of messages stored in CD-ROM," said Krane. "Much of this information has been unavailable for years and constitutes the largest collection of searchable Usenet data on the Internet."

Unlike Deja's web-based interface to the Usenet, which was cluttered and confusingly commingled with Deja's own shopping and "community" features, the Google Groups search interface is clean and straightforward. The home page offers a simple search form and links to the top levels of each of the ten major Usenet categories.

If you're unfamiliar with Usenet, browsing these categories is a good way to familiarize yourself with the sheer breadth of subjects available in newsgroups.

If you have a pretty good idea of what you're interested in, however, a keyword search can help you quickly locate a newsgroup that meets your requirements. Results are sorted by relevance and include a list of relevant groups at the top of the page, followed by a list of messages that match your query terms.

Each individual result displays the subject of a message and a brief description, generated on the fly by extracting the passage of the message that contains your query words. Each result also includes a clickable link to the newsgroup where the message was posted, its posting date and author, and, for messages with replies, a link to view the entire "thread" (the original message and all replies to it).

There's also a link to re-sort results by date, a helpful feature if you're looking for timely information. Interestingly, when you re-sort by date the list of relevant groups often changes, sometimes in unexpected ways.

For instance, the query "search engines" suggests the Usenet categories, alt.html, and alt.www.webmaster when sorted by relevance, obviously all relevant. The same query sorted by date, however, suggests and alt.religion.bahai as relevant groups!

Chalk up those goofy results to the service still being in beta. "We can't disclose too much about how our ranking functions work," said Google's Krane. "But the algorithms we use to calculate relevance for Usenet are a modified version of the relevance rankings we use for the web. In the case of sorting by date, the 'top' articles are often not nearly as relevant... and so the corresponding groups we show can look pretty random," Krane said, noting that Google is continuing to work on improving relevance, and is examining many potential techniques for doing so.

If you're not satisfied with results from a simple search, Google offers an advanced search capability that lets you limit the search by a number of parameters, including date range, language, message ID, author, subject, or newsgroup. You can also do limited Boolean searching, using form fields that find messages containing all or any of your query words, messages that exclude certain words, or messages that include an exact phrase.

For now, you can read but not reply to Usenet messages using Google Groups. Google's Usenet engineering team is developing technology that will enable users to post messages to specific newsgroups. They aim to release this feature by mid-May 2001.

The Usenet is a huge, intriguing alternative to the world wide web. Though it's riddled with noise and garbage, there's also a lot of first-class information available if you know where to find it. With its expanded search of more than six years of Usenet archives, Google has made it significantly easier to find those precious nuggets of gold hidden among the vast amounts of dross on what's sometimes called the "alternet."

Google Groups

Google Acquires Deja Newsgroup Service
Although Google saved the Usenet archives from almost certain oblivion, many loyal users were upset with Google's handling of the transition to its new home.

Search Headlines

NOTE: Article links often change. In case of a bad link, use the publication's search facility, which most have, and search for the headline.

That's it for this issue. Thanks again for subscribing, and watch for tomorrow's issue covering Google's new advanced search service for its Google Groups Usenet service.

About the author

Chris Sherman is a frequent contributor to several information industry journals. He's written several books, including The McGraw-Hill CD ROM Handbook and The Invisible Web: Uncovering Information Sources Search Engines Can't See, co-authored with Gary Price. Chris has written about search and search engines since 1994, when he developed online searching tutorials for several clients. From 1998 to 2001, he was's Web Search Guide.