When will Google, Yahoo, MSN, and Ask Jeeves start indexing RSS feeds properly? from Stephan Spencer, spotted via InsideGoogle, is a nice look at what happens with RSS feeds when major search engines encounter them -- or more correctly, what doesn't happen with them. File formats aren't recognized, text in the feed may not be indexed and other problems exist that make web search engines a bad place to go if you want to search only against feeds.
So what? As I'll explore, indexing RSS files isn't as important as it may seem, not for the twin goals of feed discovery and blog-based news.
Feeds Have Some Info; Pages Have More
What's a feed? Essentially, a list of URLs that point at web pages. Search engines index those web pages. In fact, they index a lot of those web pages better than some "feed" search engines.
For example, if you only index what comes to you in a feed, then you're going to miss out on a lot of information. That's because plenty of feeds only carry summaries of stories, not the full text of stories.
My LinkCounts & LinkStats From PubSub's Only Rough Picture, So Far post looks at how this has an impact on counting links. The situation is even worse when it comes to understanding content. If a feed isn't full text, then you don't have a full picture of what someone was talking about.
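To make the "list of URLs" idea concrete, here's a minimal sketch in Python that reads an RSS 2.0 feed and pulls out what a feed-only indexer would have to work with. The sample feed, its URLs and the feed_entries helper are all invented for illustration; they don't come from any real site or search engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/first-post</link>
      <description>A one-line summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second-post</link>
      <description>A one-line summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def feed_entries(feed_xml):
    """Return (link, description) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        # Each item points at a web page; the description is often just a summary.
        link = item.findtext("link", default="").strip()
        description = item.findtext("description", default="").strip()
        entries.append((link, description))
    return entries


for link, description in feed_entries(SAMPLE_FEED):
    print(link, "|", description)
```

Everything the feed offers about each story is that one short description. The full text lives at the linked page, which is exactly the page a web search engine crawls anyway.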
So why do you want a "feed" search engine? In my view, these are the two main reasons:
- Feed Discovery: You want to locate feeds of interest, so you can subscribe and stay in touch with news.
- Blog-Based News: You want to quickly get the latest news from the blogosphere.
Let's deal with the feed discovery issue first. Want feeds about cars? Here's a search at Technorati. Find the feed links, I dare you. The feed links aren't shown that I can spot. Instead, you get listings of pages that have been fed via a feed. If you're savvy, you'll visit the pages and then hunt around for the actual feed location. That's sort of feed discovery, but it's not as direct as some might like.
Better. Now rather than seeing actual blog and feed posts, as with Technorati, we're getting a list of blogs and sites with feeds that generally are about "digital cameras." That is, you're not pointed at a particular post. You're pointed at a place that offers a feed on the topic you searched for. That's feed discovery.
How about Feedster? Again, it's a list of blog posts or pages that were in feeds. Scoot to the right of these listings, however, and you'll see an orange XML box alerting you to the idea there's a feed. So in a way, you've got feed discovery.
Yahoo has long done this. Here's a search for search engine watch. See the entry for our blog? It has this associated with it:
That's feed discovery in action at Yahoo, though it's hit and miss. Our blog feed is displayed, as is our forum feed, but the main feed for the site itself goes missing.
Certainly, such display should be consistent. But even better would be if Yahoo made it easier to find the actual feed search service it offers. Search over there for cars, and you can see the difference in getting back actual feeds that seem related to the topic. My Yahoo Feed Search & Web Search Feeds Update post looks at this in more detail.
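The "visit the page and hunt around for the feed" step can be partly automated. Here's a rough sketch in Python, assuming the page advertises its feed with a link rel="alternate" tag in its HTML head, which is a common autodiscovery convention but not something every site follows. It isn't how any of the services above are known to work, and the sample page, URLs and discover_feeds helper are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and attr.get("href"):
            # Resolve relative feed locations against the page address.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return the feed URLs a page advertises (hypothetical helper)."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds


# Hypothetical page, invented for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Example Blog</title>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.xml">
</head><body>Posts go here.</body></html>"""

print(discover_feeds(SAMPLE_PAGE, "http://example.com/blog/"))
```

A page that only shows an orange XML button in its body, with no link tag in the head, would slip past this sketch, which is one reason feed display can end up hit and miss.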
We have excellent news search from most of the major search engines. It includes content that comes from traditional and major news players as well as small news sites from across the web. Here's a search on gm food at Google. The Food Navigator site has this article come up. It's hardly a traditional source, especially when compared to the Independent newspaper, which also has an article listed.
What the major news search engines tend not to carry are many blogs. Some are listed, but plenty aren't. Whether to integrate them as part of news search is a debate that's ongoing. Some people want the "hey, you got blogs in my news search" experience. Others want them separate.
Complicating matters more is that just putting stuff on a blog doesn't make it news, any more than someone suddenly becomes a journalist just because they get something printed in a paper. The reality is plenty of bloggers do good journalism, plenty of journalists do bad news reporting, and the reverse and all variations you can think of!
Let's side-step that debate with the recognition that many people clearly would like to have a blog search. Blog search engines come nowhere near the popularity of major search engines, but they do generate a lot of buzz. That's no wonder. People want a sense of what's being discussed, and there's plenty of talk that goes on within blogs.
So where's the blog search with the major players? Not "where's the feed search," because that's not the same thing. There are plenty of sites with feeds that are not blogs. There are plenty of blogs that don't offer feeds. But where are the blog search services you'd have expected the major search engines to have rolled out by now?
I checked with Google on this recently, but there's nothing new I can report in terms of timing. Google has promised this would come. MSN has promised the same, but we're still waiting. More on both of those promises here: MSN's Third Portal To Gain Blogs; Where's The Blog Search?
Ask Jeeves, of course, has blog search with Bloglines -- but it has promised improvements to come. A9 -- not quite in the majors -- rolled out its own blog search in March that Steve Rubel found pretty killer. Actually getting to the service is pretty killer as well. Trouble finding it? The best advice is to go to A9, select the beta link, then check the Top Blogs box.
As for Yahoo, my Yahoo Feed Search & Web Search Feeds Update has them saying it's something Yahoo will consider, but better feedreading tools and management are really the priority, for now.
All The World's A Feed, And The Blogs Are Merely Players
As feeds and blogs (remember, two completely different things!) grow, search is only going to get more complex. Microsoft blogvangelist Robert Scoble has said time and again that sites without feeds are "lame," as he does today.
OK, but what happens when it's not just all the "cool kids" doing feeds but everyone doing feeds? What does feed search mean then? It means relatively nothing. It means, umm, searching the web! So banging on about search engines not indexing feeds sort of misses the point. As feeds encompass everything, the major search engines are already there.
Meanwhile, what happens when everyone is running a blog? Will blog search suddenly be so unique? Or will it be more the case that people will want "news blogs" in a news blog search, while "shopping blogs" might be in a shopping blog search and so on? Or even more likely, as search continues to go vertical, blogs of a vertical nature will be integrated within those types of results.