In today's au Natural column, "Will Google's FeedBurner Scorch Organic SEO?," Mark Jackson examines the impact Google's acquisition of FeedBurner could have on search marketers.
Posted by Kevin Newcomb at 1:04 AM | Permalink
I had the pleasure of interviewing Rick Klau recently, and we got into a deep discussion about how to get the most out of your RSS feeds. One of the key points we discussed was the notion of including your entire article in the feed, not just a summary.
I asked Rick about this, and he said that he is a fan of this approach. Many have debated its pros and cons, with the argument centering on the tradeoff between letting users read your content directly in the feed, if they prefer, and the goal of getting users to your site.
Rick points out that there is another entire dimension to this argument that most people are missing. More and more RSS services are discovering feeds and indexing feed content based on the content of the article as presented in the feed. Including the full article in the feed itself increases exposure in these services, and therefore increases the reach of your feed and site.
One example of this is Techmeme, which scans feeds for links between posts to quickly build a map of all the discussions on a particular subject. If you publish only a summary of an article, and that summary does not include a key link, then you can miss out on the traffic and exposure that a service like Techmeme can provide.
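To make the missing-link point concrete, here is a minimal sketch in Python. It is not Techmeme's actual method, which the interview doesn't describe. The `LinkCollector` class, the `extract_links` helper and the sample feed text are all hypothetical. It only shows that a link present in the full article can vanish from a summary-only feed item.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in a chunk of feed HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(feed_html):
    """Return every link target found in one feed item's content."""
    collector = LinkCollector()
    collector.feed(feed_html)
    collector.close()
    return collector.links


# Hypothetical feed item: the key link sits in the second paragraph.
full_content = (
    "<p>Feeds are changing search marketing.</p>"
    '<p>See <a href="http://example.com/other-blog/post">this post</a> '
    "for the discussion.</p>"
)
# Hypothetical summary-only version of the same item.
summary_only = "<p>Feeds are changing search marketing.</p>"

full_links = extract_links(full_content)
summary_links = extract_links(summary_only)
```

A service that maps discussions by following links would have something to work with in `full_links`, and nothing at all in `summary_links`.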
Posted by at 8:16 PM | Permalink
The Google Blog announced that Google Blog Search now accepts pings. So when you add a new blog entry, and your blog supports RSS/XML/Atom, you can send Google Blog Search a ping at their Blog Search Pinging Service. How do you do this? Well, you can read more at About Google Blog Search Pinging Service and view the Pinging API yourself. I tested it out on a different blog and got a 404, but it is very possible I pinged the wrong URL. I will test it again shortly.
Postscript From Danny:
Movable Type users want to put this: http://blogsearch.google.com/ping/RPC2 into the "Others" box in the "Notify the following sites upon weblog updates" section of "New Entry Default Settings."
WordPress users, go to Options, Writing Options, then paste that line into your Update Services box.
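If your blog software doesn't have an update-services box, a ping can also be sent by hand. The sketch below uses Python's standard `xmlrpc.client` module and assumes the endpoint accepts the common `weblogUpdates.ping` call (blog name, blog URL) that update services generally use. Check the Pinging API page for the exact methods supported. The blog name and URL are hypothetical, and `send_ping` makes a real network call, so it is defined here but not run.

```python
import xmlrpc.client

# Ping endpoint quoted in the postscript above.
PING_URL = "http://blogsearch.google.com/ping/RPC2"


def build_ping_request(blog_name, blog_url):
    """Build the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


def send_ping(blog_name, blog_url, endpoint=PING_URL):
    """Send the ping and return the server's reply (makes a network call)."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


# Hypothetical blog details, used only to show the request body.
request_body = build_ping_request("Example Blog", "http://example.com/blog/")
```

Printing `request_body` shows the XML that would be posted to the endpoint, which is handy when a ping comes back with an error and you want to see what was actually sent.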
Posted by Barry Schwartz at 12:45 PM | Permalink
Matt Cutts has a nice illustrated survey of how various major search engines deal with the meta noindex tag in Handling noindex meta tags. He finds inconsistency, with this being the summary:
Interestingly, if you use a robots.txt file to ban indexing, in that case Google DOES show the page in some ways. Matt acknowledges this, but it still raises the question why Google operates differently when the intent of both mechanisms (explained here) is the same. I've commented in his blog on the issue as follows:
Why would Google want to treat meta noindex and robots.txt differently? They are both intended to do the same thing: keep pages out of an index. The only reason we have two options is simply because some people can't set up robots.txt files for their sites, which might be within the domains of others. However technically they are implemented, it seems like they should be treated the same way.
My gut tells me most webmasters would prefer that all the search engines not list any pages that use either a robots.txt or meta noindex command.
From a user perspective, I think the technique of showing a link to a site if you can learn about it another way is fine, such as being listed in the Open Directory or from links on the public web to those sites.
The Yahoo implementation of meta noindex is odd: why show a cached page? But I can see a hole here. They might not actually be indexing the page but still caching it, since the specific noarchive tag isn't also being used:
Sounds like summit time! Not only would a standard on how meta robots and robots.txt are handled be handy, but it would also be nice to know if blocking a page also inherently blocks caching.
A summit, or consistent standards, is something the first person commenting on Matt's blog is calling for. If it happens, perhaps it could also be extended to feeds. Ask.com & Bloglines Proposes Blog Search Exclusion Tag from us earlier this month covers a proposed standard from Ask.com. The robots are coming! The robots are coming! over at SEOmoz gives some brief examples of why this might be useful.
Matt's blog already has a good discussion going on this topic, so if you have thoughts and ideas, add more over there.
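For anyone unsure what their own pages are telling the engines, the sketch below reads the meta robots tag from a page and reports the indexing and caching directives separately, since noindex and noarchive are distinct instructions. It is only an illustration written with Python's standard `html.parser`. The `RobotsMetaParser` class and the sample pages are hypothetical, and it says nothing about how any particular search engine actually treats the tags.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(page_html):
    """Return the set of meta robots directives found in a page."""
    parser = RobotsMetaParser()
    parser.feed(page_html)
    parser.close()
    return parser.directives


# Hypothetical page that blocks indexing but says nothing about caching.
noindex_only = '<html><head><meta name="robots" content="noindex"></head></html>'
# Hypothetical page that blocks both.
both = '<html><head><meta name="robots" content="NOINDEX, NOARCHIVE"></head></html>'

asks_no_index = "noindex" in robots_directives(noindex_only)
asks_no_cache = "noarchive" in robots_directives(noindex_only)
```

In the first sample page, `asks_no_index` is true while `asks_no_cache` is false, which is exactly the gap that could leave a cached copy visible.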
Posted by Danny Sullivan at 9:39 AM | Permalink
I spent 3.5 hours Saturday night creating and lightly optimizing a completely new data feed (or bulk upload) to get PersonalProtectionStore's home security products up and running on Google Base. PersonalProtectionStore uses Yahoo! Merchant Solutions, which allows me to download a catalog and associated information, but it's not in a format compatible with Google Base (superfluous fields in the catalog, non-uniform field names), and the download brings with it special characters and HTML code that cause listing errors. Additionally, there are many fields Google Base makes optional which my catalog did not include. These additional fields help with data feed optimization (DFO).
Froogle feed submission used to take a couple days (and a couple iterations) to process correctly, so I was pleasantly surprised to see my feed accepted in less than an hour. More importantly, by the time I checked back early the next morning, my products (and images - this used to be a major issue with Froogle) were up and running. My feed of 153 products produced 30 errors, which weren't explained very well, but overall it was a smooth process. [It's important to note that I've submitted hundreds of feeds to multiple engines. There are plenty of people on the Google Base Help Discussion Group having problems.]
Submitting to Google Base gets you listed on Froogle. Being in Google Base or Froogle potentially gets you in the Google OneBox product search results in a regular Google search (just under the Sponsored Links and just above the organic results).
Here's one example:
- Google Base result for ademco 20pi
- Froogle result for ademco 20pi
- Google OneBox result for ademco 20pi
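Much of that clean-up can be scripted. Here is a rough sketch of the idea in Python, not the process I actually used: it drops the superfluous columns, renames the rest, and strips HTML and entities from each value. The column names in `FIELD_MAP`, the tab-delimited output and the sample product are all assumptions for illustration. Check the Google Base documentation for the field names and file format it really expects.

```python
import csv
import html
import io
import re

# Hypothetical mapping from store-export column names to feed column names.
# Columns not listed here (superfluous fields) are dropped.
FIELD_MAP = {"name": "title", "caption": "description", "price": "price"}

TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_text(value):
    """Strip HTML tags, decode entities, and collapse whitespace."""
    without_tags = TAG_PATTERN.sub(" ", value)
    decoded = html.unescape(without_tags)
    return " ".join(decoded.split())


def convert_catalog(catalog_csv):
    """Convert an exported CSV catalog into a tab-delimited feed string."""
    reader = csv.DictReader(io.StringIO(catalog_csv))
    output = io.StringIO()
    writer = csv.writer(output, delimiter="\t", lineterminator="\n")
    writer.writerow(list(FIELD_MAP.values()))
    for row in reader:
        writer.writerow([clean_text(row.get(src) or "") for src in FIELD_MAP])
    return output.getvalue()


# Hypothetical one-product export with HTML, an entity and an extra column.
sample_catalog = (
    "name,caption,price,internal-code\n"
    '"Door Alarm","<b>Loud</b> &amp; easy to install",19.95,X1\n'
)
feed = convert_catalog(sample_catalog)
```

Optional fields that the catalog lacks would still have to be added by hand or from another source. A script like this only handles the mechanical part.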
While I've been critical of Froogle's Spam problem, and I'm not sure what to make of Froogle's recent update (more info soon), I did say in early April that it's time for merchants to submit their feed to Froogle (Google Base dashboard has now completely replaced the Froogle Merchant Center). My 3.5 hour experience only strengthens that argument.
For good or bad, many merchants are completely reliant on Google for sales, obsessing about optimizing their sites for better organic listings and feverishly spending on Google AdWords. Just don't forget about submitting your data feed to Google Base.
If you're having trouble with your Google Base data feed, there are many companies which can help you out with feed management: ChannelAdvisor, ChannelIntelligence, Performics, Mercent, Vendio/Andale, Marketworks, RedZoneGlobal, MerchantAdvantage, etc.
Posted by Brian Smith at 3:12 PM | Permalink
FeedBurner CEO, Dick Costolo, Interviewed By Business 2.0 Magazine
CNN Money has a Business 2.0 interview with Dick Costolo, FeedBurner's CEO. The article discusses the challenge FeedBurner had raising capital for such a new technology, RSS. It also discusses their recent success and future challenges (RSS ads, Google, etc.). Read the article named Redefining the RSS feed.
Posted by Barry Schwartz at 10:01 AM | Permalink
Danny mentioned in his write-up on Google Finance that it shows both Google News results and Google Blog Search results on the stock results landing page. Now Seth Finkelstein offers a few tips on perhaps enhancing your chances of showing up for a stock search in the blog posts section of Google Finance. The basics are to use the full company name in your blog title, and you have a pretty good shot of being included on the page.
For more information on Google Blog Search, see here. Plus look at Thoughts On & Poking At Google Blog Search to understand there's even a chance a blog might be in news search and blog search.
Posted by Barry Schwartz at 11:29 AM | Permalink
While a good search marketing campaign is obviously targeted first and foremost at the major search engines, other sources of traffic are becoming increasingly important. News and blog services are among these sources, and they require different strategies and tactics that many search marketers aren't yet taking advantage of. In today's SearchDay article, News Search Engine Optimization, guest writer Shari Thurow recaps a recent Search Engine Strategies panel that focused on these increasingly important sources of search traffic.
A longer version of this story for Search Engine Watch members offers specific tips on copywriting for news releases, creating unique URLs and landing pages for specific purposes and tips for crafting a powerful public relations strategy that complements your search engine optimization efforts.
Posted by Chris Sherman at 8:48 AM | Permalink
Brett Tabke from WebmasterWorld dropped me a note about a new thread where he's answering many questions about WebmasterWorld banning all spiders, while Barry over at Search Engine Roundtable also has an interview with him. In both places, you'll learn of spiders being an increasing burden to the site, though I still am very, very wary that others should follow the route that Brett's taken.
Attack of the Robots, Spiders, Crawlers.etc at WebmasterWorld picks up from the Lets try this for a month or three thread where Brett announced last week that WebmasterWorld was banning all spiders by excluding them via robots.txt and through other measures such as required logins.
WebmasterWorld Bans Spiders From Crawling and WebmasterWorld Out Of Google & MSN from the SEW Blog cover more about the fallout from the move, with WebmasterWorld no longer being visible in two major search engines.
In his latest posts, Brett explains:
Brett Tabke Interviewed on Bot Banning from Search Engine Roundtable takes the interview approach, where it is much easier to see what Brett's thinking and reacting to than wandering through the forum posts. Beyond the points above, he addresses not wanting to make use of non-standard extensions to robots.txt that Google, MSN and some other search engines have added precisely because they aren't standard.
Overall, I can appreciate much of what Brett's going through, but there still have to be better ways for this to be addressed. His solution is simply not one that the vast majority of sites will want to try, because it will simply wipe out the valuable search traffic they gain.
To be clear, I'm NOT saying that any site should be entirely dependent on search traffic. But neither should you cut yourself off from it. It's a matter of balance and moderation. To quote from what I posted in our forum thread on the WebmasterWorld situation:
People would often ask how much of their traffic they should get from search engines. There is no right answer, but I've often said that if you were looking at 60, 70, 80 percent or higher, you might have a search engine dependency problem. You want to have a variety of sources sending you traffic, so no one single thing wipes you out.
But to suggest that a site is so successful that it doesn't need search traffic at all? That's foolishness. I have absolutely no doubt that WMW will survive. It's a healthy community with plenty of alternative traffic. But people seeking answers to things it has answers to give are no longer going to be finding it.
Hmm, well, maybe those people aren't good members, just generate noise and so on. Yeah, maybe. But that also assumes that every single quality person must be there already. That's just not so. You always have good new people coming onto the web.
Search engines are a way you build up loyal users. People often discover you for the first time through search, then they keep coming back. It's not a dependency to have a small amount of your traffic bringing in new people this way. But it is, in my view, a marketing screw-up to cut yourself off from that potential audience.
Geez, it's like the basic rule of SEO/SEM. Ensure your site is accessible to search engines. If they can't get in, you stand no chance of getting traffic at all from them. And when people are paying by the click for search traffic, why wouldn't you want that free publicity? Why wouldn't you seek other ways of retaining it but also restricting the bad bandwidth you don't want?
Overall, WMW obviously can and will do what it wants, and perhaps there's some magical master plan that down the line will make us all say "Genius!" Maybe. But this is a very, very bad model for any site to be considering, if they're having the same spidering problems that are the stated reason for why WMW is doing this. It's like saying you're getting too many phone calls to your business, so you're going to pull out the phone entirely!
So what is a site owner to do, if they are suffering from rogue spiders? I'll share a bit from our own experience, plus point at what maybe the search engines should be doing.
We've encountered rogue spiders. It was one reason why our own Search Engine Watch Forums were down briefly last month, coincidentally the same time WebmasterWorld and Threadwatch went offline for different reasons. Rogue spiders aren't just something unique to Brett's set-up. They can and do indeed cause problems even for less "flat file" sites and URL structures. In fact, want to have some fun? Check this out. That shows you all the people on our SEW Forums at the moment you click on the link, up to 200 visitors. Scan the IP Address column, and you'll see how Yahoo's Slurp spider is in many, many different threads all at once. That's a burden on our server, though since we're getting well indexed as a result, it's a burden we live with.
Our own solution has been for our developers to throttle or ban spiders at the IP level that seem to be hitting us hard, in particular spiders that aren't identifying themselves as to their purpose. Good spiders often leave behind a URL string in your logs so you know they are from Google, Yahoo or whatever. For example, Yahoo points you here. Google points you here. No good identification? Then we don't worry that banning you is going to harm us seriously in some way.
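As a rough illustration of that kind of throttling, and not what our developers actually run, here is a small Python sketch of a per-IP sliding-window limit that leaves self-identified spiders alone. The `SpiderThrottle` class, the marker strings and the limits are all hypothetical. A real setup would more likely live in the web server or firewall. Keep in mind that a user agent string can be faked, so matching on it is a weak check by itself.

```python
from collections import defaultdict, deque

# Hypothetical markers for spiders that identify themselves in the user agent.
KNOWN_SPIDER_MARKERS = ("googlebot", "slurp", "msnbot")


class SpiderThrottle:
    """Allow at most max_hits requests per IP within a sliding time window."""

    def __init__(self, max_hits, window_seconds):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self.hits = defaultdict(deque)  # IP address -> recent request times

    def allow(self, ip, user_agent, now):
        """Return True if the request should be served, False if throttled."""
        agent = user_agent.lower()
        if any(marker in agent for marker in KNOWN_SPIDER_MARKERS):
            return True  # identified spider: not limited in this sketch
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(now)
        return True


# Hypothetical limits: two requests per ten seconds for unidentified visitors.
throttle = SpiderThrottle(max_hits=2, window_seconds=10)
first = throttle.allow("10.0.0.1", "UnknownBot/1.0", now=0)
second = throttle.allow("10.0.0.1", "UnknownBot/1.0", now=1)
third = throttle.allow("10.0.0.1", "UnknownBot/1.0", now=2)
```

Here the third request from the unidentified visitor is refused, while the same limiter would keep serving a visitor whose user agent names a known spider.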
What about improving the robots.txt system? Unfortunately, that's not a solution for rogue spiders. Brett's right when he points out the real story is moving to required logins. Rogue spiders aren't paying attention to robots.txt. Put in a ban against them, and they'll ignore it. Robots.txt only works with "polite" spiders.
Because robots.txt isn't a solution, it also means that wishing that the major search engines would come together to endorse new improved "standards" for the protocol also isn't a solution. Since rogue spiders are ignoring robots.txt, it doesn't then matter for there to be some type of universal agreement to have a "crawl delay" feature or more wildcard support, for example.
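To see why robots.txt only helps with polite spiders, it's worth looking at what a polite spider does: it reads the file and chooses to obey it. The Python sketch below uses the standard library's `urllib.robotparser` to do that. The robots.txt content, the `PoliteBot` name and the URLs are hypothetical, and `Crawl-delay` is one of the non-standard extensions, so not every spider honors it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; Crawl-delay is a non-standard extension.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a rule set a polite spider can consult."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A polite spider asks before fetching and waits between requests.
can_fetch_public = rules.can_fetch("PoliteBot", "http://example.com/forum/thread1")
can_fetch_private = rules.can_fetch("PoliteBot", "http://example.com/private/admin")
delay_seconds = rules.crawl_delay("PoliteBot")
```

Nothing in the file enforces any of this. A rogue spider simply never runs the check, which is why the ban has to happen at the server instead.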
Still, while improving robots.txt isn't a solution to rogue spiders, there are things it could do if improved, and I'm right with Brett in wishing that the major search engines wouldn't unilaterally make their own improvements, as I've written before (and here).
So if we can't depend on robots.txt, what is the solution? If more and more sites face heavy spidering, we'll likely have to see a shift toward feeding content to search engines.
Feeding content isn't a new idea. Yahoo's paid inclusion program is pretty well known as a way for site owners to feed not URLs into the search engine but actual page content. Yahoo also has partnerships with some sites to take in content on a non-paid basis. Google also takes in feeds of content through things like Google Scholar or Google's Froogle shopping feeds program.
To be absolutely clear, these types of program aren't situations where you feed URLs, as with Google Sitemaps or Yahoo's bulk submit. These are programs where you feed actual page content. The spider doesn't come to you and hunt and guess at what you've got. You tell the spider what you've got.
Expanding feed programs to everyone would be a much more efficient way of gathering content, with one exception. You can expect that some sites will abuse feeds to send misleading content. Heck, it's bad enough how ping servers are already abused being wide open this way, as I wrote about on Matt Mullenweg's blog last month, when the future of ping servers was raised:
Whether we have an "independent" ping service almost seems beyond the point when both Dave and Matt are talking about the ping spam problem they have experienced. I'm actually surprised any of the open ping servers are surviving. If they are open to anyone to ping, a small number of people will abusively ping for marketing gains.
We've had 10 years of history knowing this with web search. Web search engines could long ago have had instant add facilities. Indeed, Infoseek and AltaVista even did for a short period of time. They found that without barriers, a small number of people would flood them with garbage. That's why they don't take content in rapidly. It's not that they aren't smart enough to take pings or let website owners flow content in. Instead, it is that they've learned you can't leave a wide door open like that without being abused.
There's absolutely no reason for anyone to have assumed that RSS/blog/feed search services were going to be immune to the same problem. If the ping outlook is bleak, it's not because Verisign or Yahoo has purchased some service. It's because you simply can't leave doors open on the web like this for search, not for any search that's going to attract significant traffic. Blog search is gaining that traffic, and you can expect the spam problem will simply get worse and worse until some barriers are put into place. You also cannot expect that you'll simply come up with some algorithmic way to stop ping spam. Again, 10 years of web search engines diligently trying to stop spam has simply found it's a never ending arms race.
I don't know what the solution is. I suspect that for the major search players, the Googles & Yahoos, they'll eventually move to a combination of rapid crawling, trusted pings and open pings as a backup. Remember, they get news content very fast. If they have a set of trusted sites, they can spider and hammer those hard. They'll know to keep checking Boing Boing, Scripting and maybe 1,000 other major blogs that really, really matter -- and that when you check them, you quickly discover other links from blogs you may want to fetch quickly.
So throwing feeds wide open to everyone without vetting isn't the solution. But certainly we're overdue for feeds to be available to more people without requiring payment, through some type of trusted mechanism.
WebmasterWorld is a perfect poster child for this. People want the content there, and the search engines should want the content to be found via their sites as well. Allowing the site to feed its content gets around the barriers erected to stop rogue spiders very nicely.
But WebmasterWorld isn't the only candidate in this class. Many others, including myself, want the ability to feed actual content to the search engines. Let's see them move ahead with a way to make this more a reality, to establish real "trusted feeds" that aren't based on payment or whether your site falls within an area that the business development teams think need more support. Google Base may become Google's means of doing this, but at the moment, that's not feeding into web search.
Want to comment or discuss? Visit our Search Engine Watch Forum thread, WebmasterWorld Off Of Google & Others Due To Banning Spiders.
Posted by Danny Sullivan at 4:16 PM | Permalink
Nick at Threadwatch discovers a new Add To Google button, while I'm also overdue to discuss the new Save To My Web button that Yahoo kindly rolled out last month. Let's jump in!
The new Add To Google button is easy to implement. Fill out the form at Google, then you get the appropriate HTML. Insert the button on your home page, then when people click, they are directed to add your feed to either their personalized edition of the Google homepage or Google Reader. Adding it to Google Sidebar, sadly, isn't a third option. That should be supported as well. Hopefully, we'll see it come.
As for the new Yahoo button, Yahoo announced it at the end of last month. In fact, I'd been asking them for one publicly, so they came back in that post and specifically called me out to say "Here it is!" But I was on vacation at the time, hence me playing catch-up!
It's very welcome. My Web is the future of where search is heading at Yahoo, as A Search Marketer's Look At Yahoo My Web 2.0 covers in more depth. Getting your pages added and part of the trust networks that My Webbians are building over there is important. This button makes it easy to encourage that type of saving.
To get the button, there's no form to fill out. You just grab JavaScript code from here. That puts a little button on your site. When people push it, your page title will be grabbed, along with the URL and some suggested tags for saving the page under.
Nick at Threadwatch has gone a step further. He's used the code to make a link-text only version of the save to my web feature. He discusses it more here, and the code is here.
Why not just use the button? By using Nick's code, you can customize the text of the link, in case you want to give people more instructions. For example, look over in our left-hand navigation area. I've used both the button plus Nick's code underneath, altering it to stress that this is "Yahoo" My Web, something the button doesn't say.
Down the line, I want to move that type of code over to the bottom of posts, to help encourage people to save them. Having that as a textual link makes it a bit easier. And if you're going to do it for My Web, why not for bookmarking service Del.icio.us? That's easily done through this code spotted via Threadwatch.
Are all these buttons worthwhile? I still can't tell if they are driving that many sign-ups, but I've fallen into the "learn to love them" category. While having one unified sign-up system might be better, if having an Add To Google or Add To Yahoo button means I'm going to get some additional visitors who recognize what that means, I'm going for it.
Want to love buttons yourself? See Getting Add To & Subscribe Buttons For Feeds, which I've posted for Search Engine Watch members. It takes you to the forms for popular services, so that you can merrily make your own badges.
Posted by Danny Sullivan at 9:34 AM | Permalink
Material from 15,000 blog sources has been added to the Topix.net database. Topix.net already contains material from 12,000 mainstream media sources. Items from blogs and mainstream sources are mixed on topical "feed" pages and search results pages. Topix CEO, Rich Skrenta, has the details (including some great charts and stats) on the company blog.
If you've never visited and/or used Topix.net, it's more than worth a look. I use it many times each day (it was one of my top new resources for 2004) either as a news search tool or by browsing some of the more than 300,000 topical "feeds" and 30,000 local feeds that are constantly updated. Btw, Topix also does a great job of separating press releases from other content (look for the PR Scan link in the left column of every page). Channels are available for every Zip Code in the U.S. (and most postal codes in Canada) as well as celebrities, industries, and much more. I find material via Topix that I either don't see elsewhere or see on Topix first. Every channel can be viewed on the Topix site or via RSS.
So, let's get to today's news from Topix.net about the addition of content from more than 15,000 blogs to their crawl of more than 12,000 news sources.
Highlights
+ Blog posts are currently highlighted in a tan/manila box to separate them from mainstream media. This is most likely a beta and will not be the final UI.
+ Topix crawls both RSS and HTML. However, Rich Skrenta tells us that it's an RSS crawl for most of the blog content.
+ "Posts should show up on our site and search index within 1-3 minutes of being crawled." Note: Our blog as well as the DocuTicker site I edit were fortunate enough to be two of about 500 blogs that have been in the Topix index prior to today. I can say that many times I was able to find something I posted in Topix within a VERY, and I mean very, few minutes.
+ The Topix blog post offers a pie-chart comparing the number of posts (by topic) from weblogs versus what Topix calls "mainstream media." Interesting. The only thing I'm unclear about is what precisely counts as a blog, and does the definition vary from blog to blog? For example, does a "blog" from the BBC, Washington Post or MSNBC count as a blog or a mainstream source? I'll admit that this is a gray area as blogs become more mainstream. Just how a blog is defined these days is very debatable.
+ The numbers. Topix.net CEO Rich Skrenta offers some insights and numbers on the "real" number of blogs out there versus the number of spam blogs that exist. Very interesting and, some might say, amazing numbers that will surely have people talking. I'll leave it at that for now. Tag the following numbers: wow. (-:
While the total number of unique feeds that have ever existed, or blogging accounts that have ever been signed up can certainly be counted, what is far more relevant to us is the composition of the daily posting stream. [My emphasis] What we're seeing is that 85-90% of the daily posts hitting ping services such as weblogs.com are spam (take a look for yourself). Of well-ranked non-spam blogs that we've discovered, we've found about half haven't been updated in the past 60 days. Our filters sift through what's left, which even after discarding 95%, is still a great deal of good material.
Why 15,000 Blogs? Who Made the Selections?
So, how did Topix choose the 15,000 blogs that are now in the database? Skrenta explains that more than 1 million blogs were crawled and then ranked using their NewsRank algorithm that looked at blog posting frequency, writing style, type of reference, popularity, etc. We also learn that 15,000 blogs is an arbitrary number and Topix hopes to add more (lots more) moving forward.
Adding Your Blog
If your blog isn't included in the Topix crawl, you can submit your blog (and give feedback on the service) here.
This is all very new and I look forward to seeing how useful the blog content is versus what I've been finding from Topix over the past year. One feature that would be good to have is an option to toggle either blog content or mainstream media content on or off both topical pages and the advanced search interface.
More later.
See Also: An OJR interview from earlier this year with Rich Skrenta and Chris Tolles from Topix.net
Posted by Gary Price at 10:51 PM | Permalink
ResearchBuzz has a short post with news of a new mass-pinging service that Tara seems impressed with. It's called Pingoat. Now, that's a name for you. The service's slogan is "the stable of all pings." Tara concludes her comments by saying that Pingoat "seems to be quite a bit quicker than Pingomatic." Happy Pinging!
Posted by Gary Price at 1:45 PM | Permalink
How the wheel turns. Back in the 1990s, portals gave away free home pages and with them came a huge amount of search engine spam. Today, portals like Google, Yahoo, MSN and AOL give away free blog space -- and lo and behold, we have blog spam that apparently hit a new high with a blog spam emergency this weekend, as Tim Bray writes. The blogosphere has been buzzing with discussion on the problem.
As for myself, I just continue to shake my head that these types of spam issues with blogs simply weren't expected. The solution? It's likely going to be just like what happened with free web space -- free blog space will get ignored by search engines.
Come along, and we'll do a tour of past and present, plus a look at the issues you get when you try to maintain quality when also ranking search results by time.
1997: Free Home Pages & Spam
First the past. It was around 1997 when free web space for personal home pages seemed to become more accessible to many people. I remember it well, because soon after, I started getting complaints from those making use of these services. Search engines weren't finding all of their pages, they found. Some discovered that none of their pages got indexed at all.
It got so bad that eventually, I had to do an article about it. Search Engines And Free Web Pages still floats around in the SEW Archives area for our SEW members. Here's the top to it, which will sound really familiar when I start talking about blog spam later on:
Many people take advantage of free web space provided by their internet access providers. What they don't realize is that search engines have shown a tendency to miss or even ignore certain sites. Complaints have been heard from those using space provided by America Online, CompuServe and other places.
Indeed, AltaVista no longer even accepts submissions from Tripod, a popular web service that provides free space. Why? Search engine spammers were using free space there as a base of operations. It's easy to open up a new account, hit the search engine with bogus pages, then move on once the spamming attempt is detected.
At that time, it was more the internet access providers giving away space, rather than portals. But not long after, portals jumped in themselves, culminating I'd say with Yahoo's $3.6 billion acquisition of GeoCities in 1999.
Search spam hosted on free web space had died down as a problem by that point, however. Why? It was both because search engines were largely ignoring these areas of the web and because these areas were ignored, they no longer were attractive magnets to spammers. No one wants space that can't be seen.
2005: Free Blogs & Splogs
Now let's skip ahead to today. Well, more specifically to yesterday, when Mark Cuban who backs blog search engine IceRocket wrote in his Get Your Blogspot Shit Together Google post:
The blogosphere was hit by a blogspot.com splogbomb. Someone did the inevitable and wrote a script that created blog after blog and post after post.
I'm not talking 100 blogs with a 100 posts each. Im talking what could easily turn into 10s of THOUSANDS of blogs pinging out millions of posts!
Do a search for HDNet on Icerocket.com or any of the other engines and look at all the Splogs there are. And they have URLs like this So google, at least for the time being, we shut out adding new blogspot posts to our index until we clean all the bullshit you dumped on us out of our indexes.
Sound familiar? I mean, just change the names, and the result is the same. Blogs are simply more sophisticated home pages, for many, as I've written. And splogs (spam+blogs) are just more sophisticated home page spamming attempts.
Allow anyone to create content for free or with no real barriers, and surprise, a few people will go to extremes and be abusive. Result? IceRocket no longer trusts the free blog space that Google offers through Blogger/Blogspot, in the same exact way that many search engines stopped trusting the free web space of the GeoCities of the past.
Google's Failure To Police Or Post Barriers
Google: Kill Blogspot Already!!! from Chris Pirillo also went up Sunday. As with Mark Cuban, Chris finds Blogger's Blogspot-hosted blogs the chief culprit.
I don't know what's (specifically) making it so insanely easy for these spammers to get signed into your system, but you need to change that....
Suggestion, Google? As bold as this might sound, you should institute an authentication system - a captcha of sorts - for every single post that gets sent through your Blogger service. This means that there's no more easy rides for the idiots out there who are killing your baby and the blogosphere.
Fair enough -- some barriers to entry would help, either in setting up the free space (captchas, charging a token fee, whatever) in the first place or perhaps even in how people are allowed to post.
Google's certainly winning no points with me on this front. Back in June, I wrote My Encounter With Search Spam On Blogger, where I talked about someone that lifted a description from the Search Engine Watch web site in a misleading manner, the same person who had lots of other splogs going, as well. In addition to writing about it, I also went through the formal reporting channels. Nevertheless, there it sits still.
But Barriers Won't Solve Issues For Blog Owners
Of course, it won't help if only Google cleans things up. I haven't checked, so apologies if I'm mistaken, but I'm pretty sure that I can get going with free space over at MSN Spaces and Yahoo 360. Google's Blogger is simply a more well known service. Closing down abuse at Blogger would be great, but I suspect that just means the abuse will move elsewhere.
For potential bloggers, I'm afraid my advice about free home pages from back in 1997 will become just as applicable to free blogging space:
That may seem unfair [search engines ignoring free web pages], but when you use free web space, it's as if you have hundreds of roommates. They can get the entire domain in trouble, and the police, or the search engines in this case, may not care that you are innocent.
Ask your provider if there have been any problems with search engines visiting free web pages. They should know if there are complaints, and they should also be able to help resolve any problems. They have the ability to direct large numbers of people toward the search engines, so it's to the advantage of the search engines to work with the providers.
If it's crucial to be indexed, you may want to consider leaving the free web space and going with a commercial hosting service.
In other words, get your own domain name. It has never, ever, ever, ever, ever, ever been a good idea from a search marketing perspective to make use of someone else's domain name, as you are not in control of your own destiny.
I don't care whether it is Blogger, MSN Spaces, Yahoo 360, Typepad, WordPress or anyone. If you make use of someone else's domain name, you are ultimately leaving yourself open to:
Don't trust me? Don't trust this fundamental bit of advice that I and other search marketers have been giving for years, to have your own domain name? Then consider that usability expert Jakob Nielsen just said the same thing today in Weblog Usability: The Top Ten Design Mistakes. Tip number 10 is not to use a domain name owned by another service. He talks about the issue of controlling your own destiny, as well as being seen as an amateur and the problems in moving over to your own domain name down the line.
Splogs & Searching Issues
How about the searcher side of things? Tara Calishain found Google and Feedster most impacted by splogs, with Technorati seeming more resistant (probably in part, I suspect, because it actually spiders pages rather than relying on feeds) and Yahoo getting by primarily because of the limited feeds it covers.
Russell Beattie, like Chris Pirillo, found his PubSub feeds getting washed out with spam. I thought the comments below his post were especially interesting, looking at fighting back on the Google AdSense front. It's an issue that's come up before. Not only does Google host a bunch of this junk content, but it also helps fuel it by people earning through AdSense.
Ranking By Time Magnifies Spam
Back to Mark Cuban, his post highlights one of the key issues that blog search faces. Time ranking magnifies the spam problem.
The major search engines have plenty of spam in their indexes. You simply don't see this as much because searches are sorted by relevancy. What are deemed the best pages across the entire web? Links are used to help calculate this, but textual data on the page and in the links, along with many other factors also come into play.
In contrast, blog search is largely ranked by time. Post something, send it out in your feed, and boom -- you're at the top of the list! That is, until someone else posts and pushes you back down.
How About Some Authority Mixed In?
Solution? How about ranking by time and also limiting matches to only quality blogs? Ah, but you see, that's what PubSub is supposed to be able to do. When you create a feed over there, you can use the Filtering By LinkRank feature to limit matches to the top 1 percent, 2 percent, 5 percent, 10 percent or 25 percent of blogs (or technically, feeds).
I've played with it a bit, and haven't been impressed. I got a feed for [google] and know I've limited it past the default (PubSub unfortunately doesn't show your setting after a feed is made). Nevertheless, most of the current matches right now are all coming from one site simply because the word "google" appears in the "Ads By Google" links it carries.
Still, the idea is good, so perhaps it will improve at PubSub or another service (Om Malik's saying forthcoming Sphere will do this. Cool if so, but I haven't played with it yet, and we'll all see).
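To make the contrast concrete, here is a minimal Python sketch of pure time ranking versus time ranking limited to the top slice of sources. Everything in it is an assumption for illustration: the `Post` class, the `authority` score and the sample URLs are hypothetical, and this is not how PubSub's LinkRank or any real blog search engine is implemented. It also simplifies by treating each post as coming from a different blog.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    url: str
    published: datetime
    authority: float  # hypothetical per-blog score, e.g. an inbound link count


def rank_by_time(posts):
    """Pure time ranking: newest first, regardless of source quality."""
    return sorted(posts, key=lambda p: p.published, reverse=True)


def rank_by_time_with_authority(posts, top_fraction):
    """Keep only posts whose authority falls in the top fraction, then sort newest first."""
    if not posts:
        return []
    scores = sorted((p.authority for p in posts), reverse=True)
    keep = max(1, int(len(scores) * top_fraction))
    cutoff = scores[keep - 1]
    trusted = [p for p in posts if p.authority >= cutoff]
    return rank_by_time(trusted)


# Made-up sample data: two fresh splog posts and two older legitimate posts.
SAMPLE_POSTS = [
    Post("http://established.example/post", datetime(2005, 10, 17, 9, 0), 120),
    Post("http://splog-one.example/post", datetime(2005, 10, 17, 12, 0), 0),
    Post("http://splog-two.example/post", datetime(2005, 10, 17, 12, 5), 0),
    Post("http://midsize.example/post", datetime(2005, 10, 17, 11, 0), 40),
]
```

With pure time ranking, the most recent splog post sits at the top of the list. With the filter set to the top half of sources, both splog posts drop out and the legitimate posts remain in date order. The hard part, as the PubSub experience shows, is getting an authority score that spam can't game.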
News Search Is Great Because They Limit Sources
Over at Robert Scoble's blog, his The race to time-based and blog search post from last week touches on exactly the problem of mixing time and relevancy together. His view is that search engines in general suck on the time-based aspect:
Let's look at Yahoo, Google, and MSN first so you can see just how bad those three are if you want to find something that was added to the Web yesterday.
We have a great case study. Yesterday Microsoft and Real settled their anti-trust case and announced a new partnership. It was written about on hundreds of blogs and hundreds of "pro" news sources.
We also have today's Apple announcements. So, let's search on both of those...
Robert goes on to be unimpressed at finding new stuff. But the reality is that search engines are great at finding new stuff. That's called news search. And news search is great because the sources are limited. Not everyone gets in. It may be that for blog search to be great, you have to have that same type of limitation. More on this in my response to Robert's post, which I've reprinted below:
Let's qualify. You mean how bad they are if you only look at the web search results and ignore the onebox/shortcut displays they have.
In other words, do [video ipod] on Google or Yahoo, and at the top of the pages, they show you plenty of news results. They aren't behind in gathering fresh data. They're simply segregating it into the news area and giving you a heads-up that it is there.
You're either missing it or ignoring it because those top of the page segments don't feel "normal" to you. All I can say is that the search engines are aware of that issue.
If you look at my Invisible Tabs article it talks about how at some point, the search engines need to automatically push the right button or tab or link for you, to give you 10 news results for queries that obviously are news related. Or you do a shopping search and you get all shopping results automatically.
FYI, Technorati's out with some handy fresh numbers, finding that two to eight percent of new blogs are spam, but it notes this weekend's problems may have been perceived as worse simply because spam is targeting the names of people. Bloggers are big ego searchers, so if someone targets your name, blog spam can seem worse.
Want to comment or discuss? Visit our Search Engine Watch Forums!
Posted by Danny Sullivan at 4:22 PM | Permalink
Dave Winer's Weblogs.com ping service has been purchased by Verisign. Dave does a rundown on the sale here. Verisign does a rundown here. Why do you care? You probably don't need to.
Blog search engines and feed reading services can and do take pings from Weblogs to understand when content is updated. But it's not the only service you can ping, if you want to send out notifications.
But why wouldn't you depend on pinging Weblogs? Right now, there's no reason not to. But while Verisign pledges pinging will remain free, it also talks about adding "value added services" for a fee. Potentially, that could mean getting some pings out faster than others. Or not. We don't know, and it remains a "we'll see" type of thing.
The main point is this. Ultimately, I think pinging will be heading down a path of pinging the most important places in addition to any centralized server. So if Weblogs suddenly did get all into charging for rapid response, you'd still be able to notify other places despite it.
For example, you can (and should) ping Yahoo directly as explained here. Yahoo doesn't need to wait for you to ping Weblogs or even Blo.gs (the service Yahoo bought in June).
Technorati has its own ping server, and I expect that any major player may ultimately have their own, perhaps with registration barriers or other systems to cut down on ping spam. Any smart marketer will hit these in addition to a centralized server.
The entire pinging thing got you down/head spinning? Basically, most of this is probably built into the blogging tool you use. But a good starting place for more is Ping-o-Matic, which helps you send pings manually if you need to, plus has useful background.
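If you do want to see what a ping looks like under the hood, here is a rough Python sketch of notifying several ping servers in turn. It assumes the common XML-RPC convention of a `weblogUpdates.ping` call that takes the blog's name and URL. The endpoint addresses are placeholders I made up, not real services, so check each service's own documentation for the actual address and any extra parameters it wants.

```python
import xmlrpc.client

# Placeholder endpoints (assumptions): substitute the ping URLs each service publishes.
PING_ENDPOINTS = [
    "http://ping.example.com/RPC2",
    "http://blogsearch.example.net/ping",
]


def build_ping_request(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


def send_pings(blog_name, blog_url, endpoints=PING_ENDPOINTS):
    """Ping every endpoint in turn; one failing server does not block the rest."""
    results = {}
    for endpoint in endpoints:
        try:
            proxy = xmlrpc.client.ServerProxy(endpoint)
            results[endpoint] = proxy.weblogUpdates.ping(blog_name, blog_url)
        except Exception as exc:  # network errors, XML-RPC faults, malformed replies
            results[endpoint] = exc
    return results
```

Pinging a list of servers rather than a single central one is the point here: if one service starts charging or goes down, the other notifications still go out.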
By the way, in addition to Verisign buying Weblogs.com, the ping system, AOL yesterday bought the entirely different Weblogs Inc. network of blog sites.
Posted by Danny Sullivan at 8:15 AM | Permalink
Chris covered the launch of Google's new blog search in today's SearchDay article, Google Launches Industrial Strength Blog Search. In this post, I want to add some of my own thoughts. I'll also be working up a rundown on reaction from others, and Gary may be adding his own thoughts as a postscript here or as a separate post. Top line thoughts? It's not spam free. I wish it were "full text" blog search to better represent the blog world. It's got a short memory, not going back past March 2005. But the backlink info looks good, certainly better than you'll get on Google itself.
Notice, a search across the ENTIRE web on Google brings back fewer backlinks than across the much more limited feed database on Google. Why? The third line shows the answer. A search on the ENTIRE web on MSN Search web search brings back more results as well, despite MSN supposedly having a slightly (very slightly) smaller database of pages based on self-reported figures. Google simply doesn't report all the backlinks it knows about for web search, something it has said time and again when pressed on the issue, a fact well known to many experienced search marketers.
Resources To Acquire Stanley Power Tool Or Draper Power Tool On The Internet Get your stanley power tool on the world wide web. The first thing I thought of is how easy it is to get stanley power tool online. Google has listings for many stanley power tool sites. There are lots of stanley power tool that will help you.
In fact, the first four results when sorted by date are all similar in terms of spammy, nonsensical copy. Doorway page spam on Google -- it is 1999!
What we need is either better spam filtering or some type of super "sort by date and relevancy" feature. PubSub's got a feature that's sort of like this, but when I last looked, I still found spam and irrelevant content getting through.
Want to discuss or comment? Visit our forum thread, Google Blog Search Launched.
Posted by Danny Sullivan at 7:19 AM | Permalink
Google (Finally) Offers Blog Search
Nearly two and a half years after buying Pyra Labs (the company that developed Blogger), Google has launched its blog search service (beta, natch). Although all of the major search engines have been dabbling with blog and feed search to a degree, Google is the first out of the gate with an industrial strength blog and feed search utility. It's very Google-like, with familiar search result pages and advanced search capabilities.
For more on the new service, including tips on making sure that your own feeds are picked up by the new blog search engine, read on in today's SearchDay article, Google Launches Industrial Strength Blog Search.
Want to discuss or comment? Visit our forum thread, Google Blog Search Launched.
Posted by Chris Sherman at 12:00 AM | Permalink
Gary Price wrote earlier of Technorati's new Technorati Blog Finder, along with some issues with the new beta service. For our Search Engine Watch members, the Revisiting Technorati's Blog Finder & Listing Issues article I've now posted takes a longer look at the new service, how it operates and ways blog owners can consider improving their performance within it. In summary, the article covers:
As a blog owner, should you even care about Technorati? After all, it's taken mounting criticisms over performance issues, as many are aware. Despite this, the service remains popular and something that many in the blog world care about. It's easy to improve your listings in the new service at the moment and worth a few minutes to do so.
As someone seeking blogs, the new finder service helps Technorati counter some criticism it's taken over its Top 100 list. The plus to the Technorati system is that it allows, as it says, for people to make mini Top 100 lists in any particular topic.
That's what someone like Robert Scoble wants, but the tagging system it's based on leaves all types of issues. My article for members, like Gary's previous review of Blog Finder, shows the problems you get with having to think of plural and stemming terms, not to mention alternative terms (search marketing or search engine marketing -- you've gotta do both). Josh Hallett covers some further criticisms as well here, plus points to a variety of other observations worth checking out.
Still want more top blog lists? There's the Feedster Top 500 list that came up recently and there's the existing Bloglines list.
Also be sure to check out Yahoo's blog & feed finding service. I feel it's poorly known, in particular because Yahoo needs to do a much better job in making it visible.
My past Yahoo Feed Search & Web Search Feeds Update post explains the service more and the Submitting To Yahoo's Feed Search post looks at webmaster issues plus touches on how it provides the type of mini-Top 100 lists that some want.
If you're a SEW member, a longer version with many more details on the submitting side is here.
Want to discuss Blog Finder? Please visit our Search Engine Watch Forums and start a thread!
Posted by Danny Sullivan at 9:56 AM | Permalink
The transition I wrote about earlier of moving our feeds to FeedBurner is now complete. If you've received this message in your newsreader, then the feed is working fine for you. If you're reading because your feed went dark and you came to the blog to see what happened, drop me a note, and I'll see if we can figure out what happened.
All five feeds Search Engine Watch has are now being managed through FeedBurner. What are these?
You can learn about each of these feeds in more detail on our Search Engine Newsletters & Web Feeds page. It explains what each feed offers, has direct links to the feed addresses plus buttons to add feeds to many popular services.
I've also added FeedBurner "chicklets" that show the number of estimated readers for each of our feeds, except the SES feed. I haven't got tracking fully set on that yet.
Our main SEW feed is our oldest -- started back in 2003 -- and most popular with over 18,000 readers. The SEW Blog feed is almost a year old and has just over 3,000 readers.
Until now, I've had no real estimate of traffic to the feeds as well. I'll explore more about how FeedBurner does these estimates and share more stuff in the coming weeks. But in short, I'm loving it.
Aside from counts, I can also now tell how people are accessing feeds. There are interesting differences. Our main SEW feed is one of My Yahoo's top picks -- and as a result, nearly half of our SEW feed audience reads through My Yahoo.
In contrast, our SEW Blog feed is listed with My Yahoo but not on their short list of recommendations. Without that oomph, My Yahoo makes up much less of our readership for the feed, just over 15 percent. That's still a great amount, but Bloglines leads the pack, sending us 35 percent of our readers.
The SEW Forum feed flips things around again. My Yahoo is again on top sending us about 32 percent of our readership, while Bloglines follows at 23 percent.
Our Daily SearchCast feed is barely a month old, but what a difference in readership you get when doing podcasting. iTunes readers make up 30 percent, followed by Bloglines at 15 percent.
We're hoping we'll also see Odeo readers rise, as our channel became a featured channel this week. That's more than doubled our listeners at Odeo, bringing us up to 29 when I last looked.
If you haven't tried it, check it out. It's a pretty cool service. And don't forget to subscribe to us. I want to crack the top 40 and need about 140 more people to get there!
Posted by Danny Sullivan at 2:16 PM | Permalink
Later today, we're changing the URLs for our Search Engine Watch Blog, podcast and other search feeds so that we can better track who is making use of our content and how.
If all goes well, you won't notice the switch or have to change any settings. The old feed addresses will magically still operate. But I'm sending this message out now on the off chance that we might goof up. After the transition within a few hours, I'll send out another post that the new URLs are live. If you see that post, then you're fine. If the feed suddenly seems to have gone quiet -- please send me a note, because something may not be working.
Specifically, what's happening is that we're moving our feed URLs to run through FeedBurner. I've looked from afar and with envy at the type of stats and services that FeedBurner offers to those who want to manage and track usage of their feeds.
Want to have a single feed compatible with RSS 2.0 and Atom? FeedBurner does it! Want to know who is clicking on your feed links and what newsreaders or aggregators they're using? FeedBurner does it! Want to easily add iTunes information for your podcast? FeedBurner does it -- and does it well even for free, if you're happy with basic stats. Want more? Then you can pay a small monthly amount for that.
My hesitancy in using FeedBurner until now stemmed only from the fact that you had to use the FeedBurner domain as your feed address. In other words, a FeedBurner feed URL might look like this:
http://feeds.feedburner.com/myfeedname?m=18282
Being the paranoid sort, I always want to have addresses using my own domain name. That's helpful if you decide to change tracking down the line, need to rename and redirect things and basically to preserve branding. In short, I like to be the master of my own domain :)
The great news is that FeedBurner recently added a way for site owners to have their own domains in their feed URLs. So I'm hesitant no longer and diving in with both feet! In my follow-up post, I'll list all of our new URLs, and you'll be able to see how our own domain name is implemented in them.
Posted by Danny Sullivan at 8:22 AM | Permalink
When will Google, Yahoo, MSN, and Ask Jeeves start indexing RSS feeds properly? from Stephan Spencer spotted via InsideGoogle is a nice look at what happens with RSS feeds when major search engines encounter them -- or more correctly, what doesn't happen with them. File formats aren't recognized, text in the feed may not be indexed and other problems exist that make web search engines a bad place to go if you want to search just against feeds.
So what? As I'll explore, indexing RSS files isn't as important as it may seem, not for the twin goals of feed discovery and blog-based news.
Feeds Have Some Info; Pages Have More
What's a feed? Essentially, a list of URLs that point at web pages. Search engines index those web pages. In fact, they index a lot of those web pages better than some "feed" search engines.
For example, if you only index what comes to you in a feed, then you're going to miss out on a lot of information. That's because plenty of feeds only carry summaries of stories, not the full text of stories.
My LinkCounts & LinkStats From PubSub's Only Rough Picture, So Far post looks at how this has an impact on counting links. The situation is even worse when it comes to understanding content. If a feed isn't full text, then you don't have a full picture of what someone was talking about.
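To illustrate the summary-versus-full-text point, here is a small Python sketch that reads an RSS 2.0 feed and lists what a feed-only indexer would actually see for each item. The sample feed is made up, and treating the presence of a `content:encoded` element as the sign of a full-text item is my own simplifying assumption. Real feeds vary, and some put the full article in `description` instead.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which carries content:encoded.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Made-up sample feed: one summary-only item and one full-text item.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>Summary-only post</title>
      <link>http://example.com/summary-post</link>
      <description>A short teaser.</description>
    </item>
    <item>
      <title>Full-text post</title>
      <link>http://example.com/full-post</link>
      <description>A short teaser.</description>
      <content:encoded><![CDATA[<p>The whole article, with <a href="http://example.org/">a key link</a>.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return the link, summary and a full-text flag for each item in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "link": item.findtext("link"),
            "summary": item.findtext("description", default=""),
            # Assumption: content:encoded present means the full article is in the feed.
            "has_full_text": item.find(CONTENT_NS + "encoded") is not None,
        })
    return items
```

For the first item, a feed-only indexer has nothing but the teaser to go on, and any links in the body of the article are invisible to it. A search engine that follows the `link` and spiders the page gets the whole thing.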
So why do you want a "feed" search engine? In my view, these are the two main reasons:
Feed Discovery
Let's deal with the feed discovery issue first. Want feeds about cars? Here's a search at Technorati. Find the feed links, I dare you. The feed links aren't shown that I can spot. Instead, you get listings of pages that have been fed via a feed. If you're savvy, you'll visit the pages and then hunt around for the actual feed location. That's sort of feed discovery, but it's not as direct as some might like.
Here's the search at Bloglines. Apparently, the world of car-related feeds is dominated by Craigslist, because that's practically all you get on the first page. OK, let's try digital cameras.
Better. Now rather than seeing actual blog and feed posts, as with Technorati, we're getting a list of blogs and sites with feeds that generally are about "digital cameras." In other words, you're not pointed at a particular post. You're pointed at a place that offers a feed on the topic you searched for. That's feed discovery.
How about Feedster? Again, it's a list of blog posts or pages that were in feeds. Scoot to the right of these listings, however, and you'll see an orange XML box alerting you to the idea there's a feed. So in a way, you've got feed discovery.
Yahoo has long done this. Here's a search for search engine watch. See the entry for our blog? It has this associated with it:
RSS: View as XML - Add to My Yahoo!
That's feed discovery in action at Yahoo, though it's hit and miss. Our blog feed is displayed, as is our forum feed, but the main feed for the site itself goes missing.
Certainly, such display should be consistent. But even better would be if Yahoo made it easier to find the actual feed search service it offers. Search over there for cars, and you can see the difference in getting back actual feeds that seem related to the topic. My Yahoo Feed Search & Web Search Feeds Update post looks at this in more detail.
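The "hunt around for the actual feed location" step can often be automated, since many pages advertise their feeds through `link` elements in the HTML head. The Python sketch below shows that approach under stated assumptions: it is not how Technorati, Bloglines, Feedster or Yahoo actually locate feeds, the class and function names are hypothetical, and pages that don't include such `link` elements won't be caught.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised through <link rel="alternate"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative feed addresses against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return the absolute feed URLs advertised in the given HTML."""
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    return finder.feeds
```

Given a page's HTML and its address, `discover_feeds` returns a list of feed URLs that a search listing could show next to the result, which is roughly the kind of "RSS: View as XML" link described above.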
Blog-Based News
We have excellent news search from most of the major search engines. It includes content that comes from traditional and major news players as well as small news sites from across the web. Here's a search on gm food at Google. The Food Navigator site has this article come up. It's hardly a traditional source, especially when compared to the Independent newspaper, which also has an article listed.
What the major news search engines tend not to carry are many blogs. Some are listed, but plenty aren't. Whether to integrate them as part of news search is a debate that's ongoing. Some people want the "hey, you got blogs in my news search" experience. Others want them separate.
Complicating matters more is that just putting stuff on a blog doesn't make it news, anymore than someone suddenly becomes a journalist just because they get something printed in a paper. The reality is plenty of bloggers do good journalism, plenty of journalists do bad news reporting, and the reverse and all variations you can think of!
Let's side-step that debate with the recognition that many people clearly would like to have a blog search. Blog search engines come nowhere near the popularity of major search engines, but they do generate a lot of buzz. That's no wonder. People want a sense of what's being discussed, and there's plenty of talk that goes on within blogs.
So where's the blog search with the major players? Not "where's the feed search," because that's not the same thing. There are plenty of sites with feeds that are not blogs. There are plenty of blogs that don't offer feeds. But where are the blog search services you'd have expected the major search engines to have rolled out by now?
I checked with Google on this recently, but there's nothing new I can report in terms of timing. The service has promised this would come. MSN has promised the same, but we're still waiting. More on both of those promises here: MSN's Third Portal To Gain Blogs; Where's The Blog Search?
Ask Jeeves, of course, has blog search with Bloglines -- but it has promised better improvements to come. A9 -- not quite in the majors -- rolled out its own blog search in March that Steve Rubel found pretty killer. Actually getting to the service is pretty killer as well. Trouble finding it? The best advice is to go to A9, select the beta link, then check the Top Blogs box.
As for Yahoo, my Yahoo Feed Search & Web Search Feeds Update has them saying it's something Yahoo will consider, but better feedreading tools and management are really the priority, for now.
All The World's A Feed, And The Blogs Are Merely Players
As feeds and blogs (remember, two completely different things!) grow, search is only going to get more complex. Microsoft blogvangelist Robert Scoble has said time and again that sites without feeds are "lame," as he does today.
OK, but what happens when it's not just all the "cool kids" doing feeds but everyone doing feeds? What does feed search mean then? It means relatively nothing. It means, umm, searching the web! So banging on about search engines not indexing feeds sort of misses the point. As feeds encompass everything, the major search engines are already there.
Meanwhile, what happens when everyone is running a blog? Will blog search suddenly be so unique? Or will it be more the case that people will want "news blogs" in a news blog search, while "shopping blogs" might be in a shopping blog search and so on. Or even more likely, as search continues to go vertical, blogs of a vertical nature will be integrated within those types of results.
Posted by Danny Sullivan at 8:23 AM | Permalink
A couple of RSS-related items that crossed my desk today.
First, EditorsWeblog.com reports on recent presentations at the 58th annual World Newspaper Conference and the 12th World Editors Forum from Topix.net CEO, Rich Skrenta.
Second, the eWeek article: RSS Updates Moving Beyond Pings, takes a look at the FeedMesh weblog/RSS update service that several companies are working together to develop.
What's a FeedMesh? Called FeedMesh, the approach takes the dozens of ping services that exist today a step further by seeking cooperation among aggregators to share updates among themselves. The idea for the initiative, which is being championed by PubSub Concept Inc., was hatched last year during an informal meeting of aggregators and other leaders involved in RSS and blogging.
Posted by Gary Price at 1:26 PM | Permalink
Submitting To Yahoo's Feed Search
Yahoo recently released a Publisher's Guide To RSS that formalizes some submission tips and procedures for feed owners. In Submitting RSS Feeds To Yahoo, now out for Search Engine Watch members, I take a closer look at how your RSS feed appears within Yahoo's feed search service. How are results sorted? Are you in the directory? How can you measure your feed popularity via Yahoo? Here's a summary for non-members.
The Submitting RSS Feeds To Yahoo story goes into greater depth for SEW members on all of the points above.
Posted by Danny Sullivan at 12:00 PM | Permalink
The new Yahoo Publisher's Guide to RSS will be especially useful as a one-stop shop for those just getting started with the format but can also serve as a worthwhile reference for experienced RSS types.
The site includes an intro to RSS, how to create feeds with several services (not just Yahoo 360), info about submitting and promoting your feed, a link to sign-up for alerts about new services from Yahoo (RSS advertising), and more.
Posted by Gary Price at 3:30 PM | Permalink
At a recent Search Engine Strategies conference, representatives from some of the major webfeed and news syndication services sat down and talked about how to prepare, submit, and subscribe to the various webfeed search engines. In today's SearchDay article, Feeds: A New Channel for Search Marketing, guest writer Shari Thurow recounts the lively panel featuring Bloglines' Mark Fletcher, Feedster's Scott Rafer, Moreover's Jim Pitkow, Topix.net's Chris Tolles and Yahoo's Jeremy Zawodny, discussing the opportunities webfeeds provide for both searchers and marketers.
Posted by Chris Sherman at 9:33 AM | Permalink
From Robin Good's blog, news of two new feed submission tools as well as a manual list to follow in Automatic RSS Submission Tools.
The first is RSS Submit, a $25 product that sends to a variety of feed search engines. The blog post above provides a review of this.
The second is from Thomas Korte, who I've known from his days at Moreover. Thomas moved to Google years ago and also has a personal blog -- hence his interest in feed submission.
His RSS Submission Service is web-based, free, and lets you send your feed out to major indexing tools. Thomas talks about it more here, and he also provides submission tips in Website Promotion with RSS Feeds - four tips.
Back at Robin Good's site, RSSTop55 - Best Blog Directory And RSS Submission Sites gives you a rundown on major feed submission resources along with links. There are more than 55 resources on it now, and you can even take a feed to be notified of new additions and changes.
Posted by Danny Sullivan at 10:10 AM | Permalink