Thoughts On & Poking At Google Blog Search

Chris covered the launch of Google's new blog search in today's SearchDay article, Google Launches Industrial Strength Blog Search. In this post, I want to add some of my own thoughts. I'll also be working up a rundown on reaction from others, and Gary may be adding his own thoughts as a postscript here or as a separate post. Top line thoughts? It's not spam free. I wish it were "full text" blog search to better represent the blog world. It's got a short memory, not going back past March 2005. But the backlink info looks good, certainly better than you'll get on Google itself.

  • Chris mentioned this in his article, but it's worth stressing: technically, this is FEED SEARCH. You are searching only through the feeds that Google has found. Some blogs don't have feeds. Some feeds don't come from blogs. Google understands these issues and figures that, down the line, it may have to make changes to turn this into a true blog search, if that's what's intended.
  • By default, sorting is by RELEVANCE, not DATE. If you are looking for the latest posts on a particular topic, use the "Sort by date" link in the upper right-hand corner. Unfortunately, you can't save this as a preference. However...
  • As Chris noted, you can have results constantly sent to you via a feed alert. The feed links are at the bottom of each page. So if you wanted to know the latest blogs mentioning Google, you'd search for that word, sort by date, then subscribe.
  • Want to know the latest backlinks to your blog? Use the link: command with your blog's domain, sort by date, then subscribe to a feed of that search. That shows all links to your domain, to any page anywhere on your blog, and will send you the newest ones.
  • Want to know the latest backlinks to a particular post? Use the link: command with the full page address. That brings back matches linking just to that page.
  • Don't want to learn these commands? Just type a full URL, with or without the http:// prefix, into the Blogger version of Google Blog Search. It will automatically do the right thing there and show backlinks.
  • As Chris notes, Google says that for blog search backlinks, it's not suppressing any of the links it knows about. To spell that out, here are some figures to contemplate:

    Notice that a search across the ENTIRE web on Google brings back fewer backlinks than a search across the much more limited feed database on Google. Why? The third line shows the answer. A search on the ENTIRE web on MSN Search web search brings back more results as well, despite MSN supposedly having a slightly (very slightly) smaller database of pages based on self-reported figures. Google simply doesn't report all the backlinks it knows about for web search, something it has said time and again when pressed on the issue, and a fact well known to many experienced search marketers.
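    For anyone who would rather watch a backlink feed with a script than a feed reader, here is a minimal sketch of the idea. It assumes the feed is Atom 1.0 and uses a made-up sample feed; the function names, the sample entries and example.com are all hypothetical, and nothing here reflects Google's actual feed layout.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed is Atom 1.0, so elements live in this namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def backlink_query(target):
    """Build a link: query for a domain or a full page address."""
    return "link:" + target.strip()


def new_entries(feed_xml, seen_ids):
    """Return (id, title) pairs for entries not seen before.

    seen_ids is updated in place so a later poll skips these entries.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        title = entry.findtext(ATOM + "title", default="")
        if entry_id and entry_id not in seen_ids:
            fresh.append((entry_id, title))
            seen_ids.add(entry_id)
    return fresh


# Hypothetical feed contents, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>link:example.com</title>
  <entry><id>tag:example.org,2005:1</id><title>First post linking in</title></entry>
  <entry><id>tag:example.org,2005:2</id><title>Second post linking in</title></entry>
</feed>"""

seen = {"tag:example.org,2005:1"}
print(backlink_query("example.com"))
print(new_entries(SAMPLE_FEED, seen))
```

    Keeping the set of seen entry IDs between polls is what turns a date-sorted search feed into a "tell me only what's new" alert.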

  • It's not FULL TEXT blog search. Huh? If you post to a blog, you might not send out the entire text of your post in a feed. We don't, for instance. Our reason is that we don't want everyone assuming they can reprint our material. Jason Calacanis of Weblogs has written of similar issues despite copyright warnings in his full-text feed. But Google's only currently searching what's in the feed, meaning that it actually may be ignorant of a huge amount of blog content that's not pushed in a feed. That produces some skewing, as I found with PubSub back in June.

    Ideally, I'd like to see Google do what Technorati does and grab the actual full-text of the post, rather than depend just on the feed. For its part, Google says this is something it's pondering.
  • The site: command is said to work, but I didn't find that to be the case. came back with no matches, for example. But the new seems to do the trick. However, compare that to on Google web search. Blog search gets about 414 matches, while web search of that blog brings back 344,000 matches. It's a huge difference and shows the greater blog coverage Google web search actually gives.

    The advanced search page highlights the issue. You'll see that the earliest date you can search back to is March 1, 2005. In other words, the feed database has a much shorter history range than the web database, something that full text indexing would solve -- though you'd lose the ability to do things like author and date range searching as accurately if you're working from scraped data rather than the delimited data in a feed.
  • Spam clearly hasn't been eliminated. A search for google blog search brings up a series of "Related Blogs" that all look spammy to me. However, the main results below look fairly clean. But for a query on google, spam is back with a vengeance. The first result (on Google's Blogger service) tells me:

Resources To Acquire Stanley Power Tool Or Draper Power Tool On The Internet Get your stanley power tool on the world wide web. The first thing I thought of is how easy it is to get stanley power tool online. Google has listings for many stanley power tool sites. There are lots of stanley power tool that will help you.

In fact, the first four results when sorted by date are all similar in terms of spammy, nonsensical copy. Doorway page spam on Google -- it is 1999!

  • Freshness or comprehensiveness seems to be an issue. For that query on google, I get the latest post as being 40 minutes ago, with the one after that an hour ago, then the next one two hours ago. That's it? Over the past two hours, there have been only three blog posts about Google?

    While I don't want all those poor selections where just anything mentioning Google may come up, I also want to see the latest. What we need is either better spam filtering or some type of super "sort by date and relevancy" feature. PubSub's got a feature that's sort of like this, but when I last looked, I still found spam and irrelevant content getting through.
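To make the "sort by date and relevancy" wish a little more concrete, here is one hypothetical way such a blended ordering could work: multiply a relevance score by a freshness factor that halves every so many hours. This is only a sketch. The scores, the 24-hour half-life, the post titles and the function names are all assumptions for illustration, not anything Google or PubSub has described, and it presumes a relevance score that already marks spam down.

```python
from datetime import datetime, timedelta


def blended_score(relevance, posted_at, now, half_life_hours=24.0):
    """Relevance discounted by age: the score halves every half_life_hours."""
    age_hours = max((now - posted_at).total_seconds() / 3600.0, 0.0)
    freshness = 0.5 ** (age_hours / half_life_hours)
    return relevance * freshness


def rank(posts, now, half_life_hours=24.0):
    """Order posts by blended score, best first."""
    return sorted(
        posts,
        key=lambda p: blended_score(
            p["relevance"], p["posted_at"], now, half_life_hours
        ),
        reverse=True,
    )


# Arbitrary reference time and made-up posts (hypothetical relevance scores).
now = datetime(2005, 9, 15, 12, 0)
posts = [
    {"title": "Keyword-stuffed doorway page", "relevance": 0.05,
     "posted_at": now - timedelta(minutes=40)},
    {"title": "Detailed post about Google", "relevance": 0.9,
     "posted_at": now - timedelta(hours=6)},
    {"title": "Older passing mention", "relevance": 0.6,
     "posted_at": now - timedelta(days=3)},
]

for post in rank(posts, now):
    print(post["title"])
```

With these made-up numbers, the newest post no longer wins just for being newest: the six-hour-old, highly relevant post ranks first and the fresh doorway page drops to the bottom. A pure date sort would put the doorway page on top.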

Want to discuss or comment? Visit our forum thread, Google Blog Search Launched.