How Search Engines Index RSS & Why It Doesn’t Necessarily Matter

“When will Google, Yahoo, MSN, and Ask Jeeves start indexing RSS feeds properly?” from Stephan Spencer, spotted via InsideGoogle, is a nice look at what happens with RSS feeds when major search engines encounter them or, more correctly, what doesn’t happen with them. File formats aren’t recognized, text in the feed may not be indexed and other problems exist that make web search engines a bad place to go if you want to search just against feeds.

So what? As I’ll explore, indexing RSS files isn’t as important as it may seem, not for the twin goals of feed discovery and blog-based news.

Feeds Have Some Info; Pages Have More

What’s a feed? Essentially, a list of URLs that point at web pages. Search engines index those web pages. In fact, they index a lot of those web pages better than some
“feed” search engines.

For example, if you only index what comes to you in a feed, then you’re going to miss out on a lot of information. That’s because plenty of feeds only carry summaries of
stories, not the full text of stories.
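
To make the point concrete, here is a minimal Python sketch of what an indexer sees if it reads only the feed. The sample feed, its URLs and the `feed_entries` helper are all made up for illustration, and it assumes a plain RSS 2.0 layout. Each item yields a title, a link and a short summary; the full story lives on the page the link points at.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, invented for this example (RSS 2.0 layout assumed).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <description>A made-up blog</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Short summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Short summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def feed_entries(rss_text):
    """Return title, link and summary for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # A summary-only feed carries just this snippet, not the full story.
            "summary": item.findtext("description", default=""),
        })
    return entries


if __name__ == "__main__":
    for entry in feed_entries(SAMPLE_RSS):
        print(entry["link"], "-", entry["summary"])
```

Anything not in the summary is invisible to a feed-only index, whereas a crawler that follows each link gets the whole page.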

My LinkCounts & LinkStats From PubSub’s Only Rough Picture, So Far post looks at how this has an impact
on counting links. The situation is even worse when it comes to understanding content. If a feed isn’t full text, then you don’t have a full picture of what someone was
talking about.

So why do you want a “feed” search engine? In my view, these are the two main reasons:

  • Feed Discovery: You want to locate feeds of interest, so you can subscribe and stay in touch with news.
  • Blog-Based News: You want to quickly get the latest news from the blogosphere.

Feed Discovery

Let’s deal with the feed discovery issue first. Want feeds about cars? Here’s a search at Technorati. Find the feed links, I
dare you. The feed links aren’t shown that I can spot. Instead, you get listings of pages that have been fed via a feed. If you’re savvy, you’ll visit the pages and then hunt
around for the actual feed location. That’s sort of feed discovery, but it’s not as direct as some might like.
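
That hunt can often be automated. Many sites advertise their feeds in the page head with a `<link rel="alternate">` tag whose type is `application/rss+xml` or `application/atom+xml`. The sketch below assumes a page follows that convention (plenty don't), and the `FeedLinkFinder` and `find_feeds` names are hypothetical, chosen just for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised through <link rel="alternate"> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_values = attr.get("rel", "").lower().split()
        feed_type = attr.get("type", "").lower()
        href = attr.get("href", "")
        if "alternate" in rel_values and feed_type in FEED_TYPES and href:
            # Resolve relative hrefs against the page address.
            self.feeds.append(urljoin(self.base_url, href))


def find_feeds(html_text, base_url):
    """Return the feed URLs a page advertises, in document order."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html_text)
    return finder.feeds
```

Given the HTML of a page and the address it was fetched from, `find_feeds` returns absolute feed URLs, or an empty list when the page advertises none, in which case you are back to hunting by hand.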

Here’s the search at Bloglines. Apparently, the world of car-related feeds is dominated by Craigslist, because
that’s practically all you get on the first page. OK, let’s try digital cameras.

Better. Now rather than seeing actual blog and feed posts, as with Technorati, we’re getting a list of blogs and sites with feeds that generally are about “digital
cameras.” That is, you’re not pointed at a particular post. You’re pointed at a place that offers a feed on the topic you searched for. That’s feed discovery.

How about Feedster? Again, it’s a list of blog posts or pages that were in feeds. Scoot to
the right of these listings, however, and you’ll see an orange XML box alerting you to the idea there’s a feed. So in a way, you’ve got feed discovery.

Yahoo has long done this. Here’s a search for “search engine watch”. See the entry for our blog? It has this associated with it:

View as XML
Add to My Yahoo!

That’s feed discovery in action at Yahoo, though it’s hit and miss. Our blog feed is displayed, as is our forum feed, but the main feed for the site itself goes missing.

Certainly, such display should be consistent. But even better would be if Yahoo made it easier to find the actual feed search it offers. Search over there for cars, and you can see the difference in getting back actual feeds that seem related to the topic. My “Yahoo Feed Search & Web Search Feeds Update” post looks at this in more detail.

Blog-Based News

We have excellent news search from most of the major search engines. It includes content that comes from traditional and major news players as well as small news sites from
across the web. Here’s a search on gm food at Google. The Food Navigator site
has this article come up. It’s hardly a traditional source, especially when compared to
the Independent newspaper, which also has an article listed.

What the major news search engines tend not to carry are many blogs. Some are listed, but plenty aren’t. Whether to integrate them as part of news search is a debate that’s
ongoing. Some people want the “hey, you got blogs in my news search” experience. Others want them separate.

Complicating matters more is that just putting stuff on a blog doesn’t make it news, any more than someone suddenly becomes a journalist just because they get something
printed in a paper. The reality is plenty of bloggers do good journalism, plenty of journalists do bad news reporting, and the reverse and all variations you can think of!

Let’s side-step that debate with the recognition that many people clearly would like to have a blog search. Blog search engines come nowhere near the popularity of major
search engines, but they do generate a lot of buzz. That’s no wonder. People want a sense of what’s being discussed, and there’s plenty of talk that goes on within blogs.

So where’s the blog search with the major players? Not “where’s the feed search,” because that’s not the same thing. There are plenty of sites with feeds that are not blogs.
There are plenty of blogs that don’t offer feeds. But where are the blog search services you’d have expected the major search engines to have rolled out by now?

I checked with Google on this recently, but there’s nothing new I can report in terms of timing. The service has promised this would come. MSN has promised the same, but
we’re still waiting. More on both of those promises here: MSN’s Third Portal To Gain Blogs; Where’s The Blog

Ask Jeeves, of course, has blog search with Bloglines, but it has promised improvements to come. A9, not quite in the majors, rolled out its own blog search in March that Steve Rubel
found pretty killer. Actually getting to the service is pretty killer as well. Trouble finding it?
The best advice is to go to A9, select the beta link, then check the Top Blogs box.

As for Yahoo, my Yahoo Feed Search & Web Search Feeds Update has them saying it’s something Yahoo will
consider, but better feedreading tools and management are really the priority, for now.

All The World’s A Feed, And The Blogs Are Merely Players

As feeds and blogs (remember, two completely different things!) grow, search is only going to get more complex. Microsoft blogvangelist Robert Scoble has said time and
again that sites without feeds are “lame,” as he does today.

OK, but what happens when it’s not just all the “cool kids” doing feeds but everyone doing feeds? What does feed search mean then? It means relatively nothing. It means,
umm, searching the web! So banging on about search engines not indexing feeds sort of misses the point. As feeds encompass everything, the major search engines are already

Meanwhile, what happens when everyone is running a blog? Will blog search suddenly be so unique? Or will it be more the case that people will want “news blogs” in a news
blog search, while “shopping blogs” might be in a shopping blog search and so on. Or even more likely, as search continues to go vertical, blogs of a vertical nature will be
integrated within those types of results.
