For your stats folder.
A post on the Topix.net blog lets us know that the amount of RSS content from news organizations is not as great as some might believe.
Rich Skrenta reports that of the 7000+ news sources Topix crawls, only 7% have feeds. He goes on to say that even if a site has a feed, Topix usually crawls the HTML content.
"Even for sites which offer feeds, we'll generally continue to crawl the human-readable version. We've seen sites where the RSS broke but no one at the paper seemed to notice, or cases where the RSS was out of sync with the human-viewable web content."
What about search tools that focus on weblog content?
Waypath has shared the following numbers:
+ Only 63% of the weblogs Waypath crawls have feeds; only 22% have full-text in their feeds.
These Waypath numbers were a bit surprising to me. I had thought that the penetration of RSS/XML feeds in the blogosphere was greater, especially when it comes to blogs offering full-text feeds.
From the searcher's perspective, it's worth remembering that an RSS search might not be the same thing as a full-text search.
Thanks to G.L. for the news tip.