
September 25, 2006

Google Leads In Dead & Old Pages

In Google Has the Largest Number of Dead and Old Pages, the Google Operating System Blog points to a video and some research from Google's Ziv Bar-Yossef that discusses how to grab a random sample of pages from major search engines and extrapolate from those pages information about the search engines. This can be used in a number of ways.

One interesting piece of information that you can determine from the method he discusses is the percentage of dead and of old pages that a search engine may contain. In comparing Google, MSN, and Yahoo! following these methods, Google appears to contain the largest number of dead pages. The video is from an August 17th Google Tech Talk covering this.
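To make the extrapolation step concrete, here is a minimal Python sketch. It is not Bar-Yossef's sampling method, which deals with the much harder problem of drawing a near-uniform sample from a search engine in the first place. It only assumes you already have such a sample, with each page marked dead or alive, and shows how the index-wide share of dead pages could be estimated with a Wilson score interval. The function name and the sample data are hypothetical.

```python
import math


def estimate_dead_fraction(sample_is_dead, z=1.96):
    """Estimate the share of dead pages in an index from a random sample.

    sample_is_dead: list of booleans, True when the sampled page was dead.
    z: normal quantile for the interval (1.96 gives roughly 95% coverage).
    Returns (point_estimate, lower_bound, upper_bound).
    """
    n = len(sample_is_dead)
    if n == 0:
        raise ValueError("sample must not be empty")

    # Observed proportion of dead pages in the sample.
    p = sum(sample_is_dead) / n

    # Wilson score interval, which behaves better than the plain
    # normal approximation when p is close to 0 or 1.
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical sample: 20 dead pages out of 100 sampled.
sample = [True] * 20 + [False] * 80
estimate, low, high = estimate_dead_fraction(sample)
print(f"dead pages: {estimate:.1%} (interval {low:.1%} to {high:.1%})")
```

The interval is only as good as the sample. If the sampled pages are biased toward popular or recently crawled documents, the estimate will be biased the same way, which is why the sampling technique itself is the focus of the research.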

In addition to the information that it provides about search engines, and this method of sampling them, the video also discloses that Ziv Bar-Yossef joined Google a couple of weeks before the video footage was shot. Ziv Bar-Yossef previously worked at the IBM Almaden Research Center, and was most recently at Technion - Israel Institute of Technology, Israel.

I also wrote a little about this research at the SEO by the Sea Blog in How Do You Estimate the Size of A Search Engine?, and included with that post a listing of some of the patents that he was involved in developing while at IBM. One of the more interesting was one on Methods and apparatus for assessing web page decay.

Ziv Bar-Yossef brings a wealth of knowledge to Google. Another interesting recent paper he was involved with, while at Technion, looks carefully at Different URLs with Similar Text, and ways that search engines could identify those more easily.

Posted by Bill Slawski at 1:42 PM | Permalink

September 5, 2006

Google Updates Terminology Of Last Visit Date In Cache Results

Vanessa Fox posted an update at the Google Webmaster Central Blog on what the date and time displayed on the Google Cache page really means. The date technically shows the last time Google "retrieved" data from the page. If a page hasn't been updated, and Google visits it and sees that nothing has changed, Google will not retrieve any new information from that page and won't update the date displayed on the cache page. Here is an example of the cache page of Search Engine Watch. Look carefully at the date in the Google cache; right now it reads, "This is G o o g l e's cache of http://searchenginewatch.com/ as retrieved on Sep 1, 2006 08:07:09 GMT." Then compare it to the last article that was posted. They should be within a few days of each other, since this site is crawled frequently by Google.
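The difference between "visited" and "retrieved" can be illustrated with a small Python sketch. This is a toy model, not a description of how Google's crawler works: it assumes a change is detected by comparing a hash of the page content, and the function and field names are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def visit(entry, content, visited_at):
    """Return the cache entry after a crawler visit.

    entry: the existing cache entry (a dict) or None if the page is new.
    content: the page text seen on this visit.
    visited_at: when the visit happened.
    The "retrieved" date only moves when the content has changed.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if entry is None or entry["digest"] != digest:
        # New or changed page: store it and stamp the retrieval time.
        return {"digest": digest, "retrieved": visited_at}
    # Unchanged page: the visit happened, but the cached date stays put.
    return entry


first = datetime(2006, 9, 1, 8, 7, 9, tzinfo=timezone.utc)
second = datetime(2006, 9, 3, 8, 0, 0, tzinfo=timezone.utc)

entry = visit(None, "front page, version 1", first)
entry = visit(entry, "front page, version 1", second)
print(entry["retrieved"])  # still the first visit's timestamp
```

Under this model, an old date on a cache page does not mean the crawler has stayed away. It may only mean the page has not changed since that date.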

Posted by Barry Schwartz at 11:58 AM | Permalink

May 2, 2006

MSN Bests Google & Yahoo in Search Shootout

Interlink, a Cincinnati web marketing research, search engine optimization and consulting firm, recently ran some tests to assess performance of major (and minor) search engines. Surprisingly, MSN took top honors in relevancy.

Like the search engine relevance tests we've done from time to time here at SEW, this isn't a controlled experiment, but rather a test of how well search engines return "reasonable" results. As such, this study should be regarded as interesting but not definitive.

Over at the 360, Susan Kuchinskas reports:

MSN bested the top two search providers, along with Ask, AOL, Gigablast and Wisenut, when all five factors were considered: relevancy, freshness of content, failure rate, difficult search results, and non-organic or extra features. Difficult searches were queries such as "car dealer fargo north dakota" and "appliance repair des moines."

More information and full results of the study are available here.

Posted by Chris Sherman at 4:43 PM | Permalink

April 17, 2006

Google's Cache Being Helped By The AdSense Mediapartners Bot

Publishers running AdSense on their pages may find that the Mediapartners-Google bot - the special Google bot used by AdSense to determine ad targeting on a publisher page - is actually sharing the results of those crawls with the main Google search database.

Greg Boser spotted it when pages being served strictly to AdSense began showing up in the main search database. And cache dates and times are matching exactly with when the Mediapartners-Google bot visited the page for ad targeting purposes.

How significant is this? At this point, it is uncertain, although Google clearly states that being an AdSense publisher does not help with search engine rankings. And it seems that the Mediapartners-Google bot is not adding new pages to the search index, but rather updating pages currently in the index.

For a more detailed look, visit both WebGuerrilla and JenSense.

Postscript: Matt Cutts has confirmed that the AdSense mediapartners bot is indexing for the main search index.

Posted by Jennifer Slegg at 2:06 AM | Permalink

September 8, 2005

Squeezing The Search Engine Loaf For Freshness

Phil Bradley points to this research paper, The Freshness Of Web Search Engines' Databases (PDF), out of Heinrich-Heine-University in Dusseldorf that analyzed the freshness of Google, Yahoo and MSN over six weeks in February and March 2005. Google came out best with the most pages updated almost daily, but MSN had the best "worst case" scenario with no page more than 20 days old. Yahoo was said to be "chaotic." There's much, much more in the paper which, sadly and ironically, is already out of date in terms of knowing what's happening right now. But having benchmarks for various points in time is great.

Posted by Danny Sullivan at 2:06 PM | Permalink
