Managing the Firehose of Real-Time Information

RSS feeds, search alerts and other information monitoring technologies are great, but often end up providing too much of a good thing. PubSub is a “matching engine” that offers a promising new way to keep up to date while alleviating information overload.

Over the past year, subscribing to RSS feeds has become an increasingly popular way to keep up to date with favorite web sites, blogs, and other frequently changing sources of web content. But anyone who’s used an RSS aggregator has experienced at least two problems: Information overload and information underload.

The information overload problem comes from sources that update frequently. Sometimes these sources provide must-read information. However, in many cases, new information may consist of a rambling post, a post on a topic of little interest to you, or worst of all (in my opinion) a post that simply points you to another feed that you’re already tracking.

The information underload problem comes from sources that start out with frequent entries, and then gradually slow down or ultimately stop updating, save for occasional bursts of output. In both cases, the temptation to stop monitoring these sources and prune them from your feed aggregator is strong—but then you risk overlooking potentially valuable information.

Enter PubSub, an alerting service that monitors millions of information sources in real time and builds customized feeds from keyword-based queries that you create. In essence, PubSub combines the real-time benefits of feed aggregation with the retrospective query capabilities offered by web and blog search engines.

Bob Wyman, CTO and co-founder of PubSub, calls this “prospective searching.”

“We’re trying to take the concept of publish and subscribe and monitoring and filtering and bring that up to where it’s essentially a first class citizen in the case of the web,” said Wyman.
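The difference between prospective searching and an ordinary search engine can be sketched in a few lines of Python. This is only a toy illustration under simple assumptions, not PubSub's actual implementation: the class name MatchingEngine, its methods, and the all-keywords-must-appear rule are hypothetical. The point is that the queries are stored first and each new item is checked against them as it arrives.

```python
import re
from collections import defaultdict


class MatchingEngine:
    """Toy prospective-search engine (hypothetical): queries wait, items stream past."""

    def __init__(self):
        self.subscriptions = {}          # subscription id -> set of required keywords
        self.feeds = defaultdict(list)   # subscription id -> titles of matched items

    def subscribe(self, sub_id, keywords):
        # Store the query up front; nothing is searched yet.
        self.subscriptions[sub_id] = {k.lower() for k in keywords}

    def publish(self, title, text):
        # Called once per incoming item; returns the subscriptions it matched.
        words = set(re.findall(r"[a-z0-9]+", (title + " " + text).lower()))
        matched = []
        for sub_id, keywords in self.subscriptions.items():
            if keywords <= words:        # assumed rule: every keyword must appear
                self.feeds[sub_id].append(title)
                matched.append(sub_id)
        return matched


engine = MatchingEngine()
engine.subscribe("delays", ["airport", "delays"])
engine.subscribe("filings", ["sec", "filing"])

first = engine.publish("Weather update", "Airport delays expected in Chicago")
second = engine.publish("Lunch", "A rambling post about sandwiches")
```

Here the first item lands in the "delays" feed and the second matches nothing, so it never reaches the reader at all.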

PubSub monitors nearly 6.5 million information sources, including more than 3.5 million “active sources” (those that both exist and have regular updates). These sources include weblogs, SEC EDGAR filings, newsgroup postings, press releases and airport delays. The company says that from this mix it extracts more than 2 million new items per day, including over 1.2 million blog postings.

PubSub provides you with easy-to-use tools to manage this firehose of information. Simply visit the PubSub home page, enter some keywords you want to track, and click the “start tracking now” button. You’ll then see a confirmation message that your customized tracking subscription has been created, along with three options for viewing and monitoring your new feed.

The easiest option is to simply click the appropriate link or icon to add the subscription to your current news aggregator, such as MyYahoo, Bloglines, or clients that use RSS or Atom formats. If you don’t currently use a feed aggregator program, you can view your feed in a web browser by clicking the URL created for your subscription. Then just bookmark the URL and reload the page whenever you want an update.
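If you are curious what an aggregator does with a subscription URL, the sketch below parses a small RSS 2.0 document and lists its items. The sample feed is made up for illustration (the titles and example.com links are not real PubSub output), and the helper name list_items is hypothetical. A real aggregator would fetch the XML from the subscription URL rather than from a string.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed in RSS 2.0 layout; not actual PubSub output.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example subscription</title>
    <item>
      <title>First matching post</title>
      <link>http://example.com/posts/1</link>
    </item>
    <item>
      <title>Second matching post</title>
      <link>http://example.com/posts/2</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


items = list_items(SAMPLE_FEED)
for title, link in items:
    print(title, "-", link)
```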

PubSub also offers a free Sidebar utility for both Internet Explorer and Firefox that lets you monitor your subscriptions any time the sidebar is open.

Once you’ve created your own customized feeds, you’ll probably find you need to tweak them, just as you modify complex search queries to fine-tune results. PubSub provides a free account management tool that requires only a valid email address for registration.

The account management tool lets you add, change or delete keywords, or build queries using full Boolean logic. Checkboxes allow you to include “high volume weblogs and personal journals” or limit results to leading sites as measured by “link rank,” which reflects the number of recent links a particular site has received. PubSub also publishes a list of the current link rank leaders.
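To see how a Boolean query narrows a feed, here is a minimal sketch of query evaluation. The nested-tuple notation and the function names tokenize and matches are assumptions made for this example; PubSub's own query syntax is not described here and may look quite different.

```python
import re


def tokenize(text):
    """Lower-case a post and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query, words):
    """Evaluate a nested AND/OR/NOT query (hypothetical notation) against a word set."""
    if isinstance(query, str):           # a bare keyword
        return query.lower() in words
    op, *args = query
    if op == "AND":
        return all(matches(a, words) for a in args)
    if op == "OR":
        return any(matches(a, words) for a in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")


# (rss OR atom) AND NOT spam
query = ("AND", ("OR", "rss", "atom"), ("NOT", "spam"))

post_a = tokenize("New Atom feed reader released")
post_b = tokenize("RSS spam is on the rise")

keep_a = matches(query, post_a)   # mentions atom, no spam
keep_b = matches(query, post_b)   # mentions rss but also spam
```

The first post passes the filter and the second is excluded by the NOT clause, which is the kind of refinement that keeps a broad keyword from flooding a feed.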

You can also create “focused subscriptions” for any of the categories mentioned above by clicking the appropriate link on the home page.

A word of caution: Avoid common words unless you want an explosion of content returned in your feeds. I dashed off a feed to monitor search engines using the names of the most popular ones, but virtually none of the results had anything to do with Google, Yahoo et al. Instead, they noted that the blogger had used the search engine, or included a URL. In short, the results were irrelevant. Hopefully PubSub will develop its query refinement tools to help avoid these kinds of false hits.

Wyman says that the company plans to gradually increase the number and type of information sources monitored by PubSub. “We’re trying to expand the problem of search beyond the simple questions of searching text,” he said. Notably, the company isn’t going to compete head-to-head with the major search engines. Instead, Wyman says that PubSub is looking at ways to surface invisible web content via PubSub subscriptions.

To do this, Wyman is talking with a number of content providers about licensing arrangements that would allow them to run the PubSub technology behind the scenes while maintaining their own “branded skins.” Additionally, PubSub provides sample applications and protocols for those who want to develop new PubSub applications or add PubSub functionality to existing client programs.

The company is also exploring ways to monetize RSS feeds with advertising, though Wyman says they’re still mostly watching others “make the mistakes” as this relatively new form of search marketing begins to evolve.

PubSub is an interesting new approach to monitoring the ever-increasing flood of new information via automated tools. The site’s simple but elegant design nicely masks some serious horsepower for managing information overload, and makes it quite accessible and easy to use. PubSub is a great enhancement to your current feed aggregator tool, but also serves as a useful starting point for monitoring frequently changing information if you’ve not yet taken the plunge into the world of feed aggregation.

