Back in March, I published a case study on the SEW blog titled "How Long Does it Take a New Site to Rank in the Search Engines?" The study tracked how quickly a new site about a popular American folk song, "Follow the Drinking Gourd," climbed the rankings in Google, Yahoo, and MSN.
Today, I am going to update those results and make some observations about how each search engine handled pages that had been removed from the web.
What made the study interesting is that the site quickly established itself as an authority on its subject: an enigmatic American folk song first published in 1928, but with origins rumored to go back to the Underground Railroad before the American Civil War. Other sites on the same subject rapidly began linking to it, and some of the "competing sites" even removed their own content and linked to the Follow the Drinking Gourd site instead.
Because of the way this unfolded, the case offers a fairly direct view of each search engine's latency in ranking such a site. In more competitive markets there are many other obstacles to ranking, of course, but knowing the latency involved is still useful.
The data showed Yahoo and Live Search (formerly MSN Search) moving the new site into the top 10 after only 19 days. In the original write-up, Google still had not done so after 47 days. Now we have additional data on how each search engine progressed.
Case Study Update
On April 21, 94 days after the launch of the Follow the Drinking Gourd site, Google suddenly moved the site into the fourth position for the search term "Follow the Drinking Gourd". Here are the updated results:
| Date | Google Search | Yahoo Search | MSN Search |
|---|---|---|---|
| Week of 1/29 | >1,000 | >1,000 | |
Google's latency was far longer for this site: 94 days to reach the top 10, versus 19 days for Yahoo and Live Search.
Caching in Google
Joel Bresler, the author and owner of the Follow the Drinking Gourd site, has kept meticulous notes on how Google cached and indexed the pages on his site. In particular, Joel ran tests in which he copied a long phrase from a page in the Google cache and searched for that phrase in double quotes.
For example, one page on the site contains the phrase "formed the narrative core of a planetarium show." Google cached this page for the first time on January 31. Joel searched Google for that test phrase over time and recorded the following:
- March 16: no results returned
- March 18: page found
- March 31: page found
- April 24: no results returned
- May 13: page found again
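Joel's exact-phrase test is simple enough to script. The minimal Python sketch below only builds the quoted-phrase query URL (it does not fetch anything) and then reduces his recorded checks to the dates on which the result flipped between "found" and "not found". The function names are hypothetical, and the year 2007 is an assumption, since the notes give only month and day.

```python
from datetime import date
from typing import List, Optional, Tuple
from urllib.parse import quote_plus

def exact_phrase_query_url(phrase: str) -> str:
    """Build a Google search URL for the phrase wrapped in double quotes."""
    return "https://www.google.com/search?q=" + quote_plus('"' + phrase + '"')

# Joel's recorded checks for the test phrase; the year 2007 is assumed.
observations: List[Tuple[date, bool]] = [
    (date(2007, 3, 16), False),
    (date(2007, 3, 18), True),
    (date(2007, 3, 31), True),
    (date(2007, 4, 24), False),
    (date(2007, 5, 13), True),
]

def status_changes(obs: List[Tuple[date, bool]]) -> List[Tuple[date, bool]]:
    """Return only the checks whose result differs from the previous check."""
    changes: List[Tuple[date, bool]] = []
    previous: Optional[bool] = None
    for day, found in sorted(obs):
        if found != previous:
            changes.append((day, found))
            previous = found
    return changes

print(exact_phrase_query_url("formed the narrative core of a planetarium show"))
for day, found in status_changes(observations):
    print(day.isoformat(), "found" if found else "not found")
```

Run against these five checks, the sketch reports four transitions (not found, found, not found, found), which makes the on-again, off-again indexing of a cached page easy to see at a glance.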
This indicates that being in the cache does not mean a page is fully indexed. As of today, the search does return the target page. The page in question is also in the supplemental index, so there may be a connection.
Removing Pages from the Index
At the beginning of February, the top three Google results for the search query "Follow the Drinking Gourd" were:
- Pocantico Hills Elementary School
- Madison Wisconsin Metropolitan School District
- NSA (National Security Agency)
Toward the end of February 2007, all three sites removed these pages from the web. None of the site owners used URL removal tools to pull the pages from the index manually; they simply relied on the search engines to discover that the pages were gone and drop them on their own.
Note that you don't want a search engine to drop your page the instant it isn't found: server errors happen all the time during crawls, and a single failed fetch shouldn't make your page disappear from the index. The following table shows how quickly each search engine detected the missing pages and removed them from the index:
| Average days to remove from index | 15 | 8 | None removed to date |
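The point about transient errors can be made concrete with a minimal Python sketch of one possible removal policy. It assumes a rule invented purely for illustration: a URL is dropped only after it returns 404 or 410 on at least three consecutive crawls spanning at least seven days, while server errors such as 503 are treated as inconclusive. `UrlRecord`, `record_fetch`, and both thresholds are hypothetical; none of the search engines in this study has published how it actually decides.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical thresholds: no search engine publishes the values it uses.
MIN_CONSECUTIVE_MISSES = 3
MIN_MISSING_SPAN = timedelta(days=7)
GONE_STATUSES = {404, 410}

@dataclass
class UrlRecord:
    url: str
    indexed: bool = True
    consecutive_misses: int = 0
    first_miss: Optional[date] = None

    def record_fetch(self, day: date, status: int) -> None:
        """Update the record after one crawl of the URL."""
        if status in GONE_STATUSES:
            # Start or extend the streak of "page is gone" responses.
            if self.consecutive_misses == 0:
                self.first_miss = day
            self.consecutive_misses += 1
            # Drop only after repeated misses spread over a minimum span.
            if (self.consecutive_misses >= MIN_CONSECUTIVE_MISSES
                    and day - self.first_miss >= MIN_MISSING_SPAN):
                self.indexed = False
        elif 200 <= status < 300:
            # A successful fetch ends the streak; in this sketch the page is kept or restored.
            self.consecutive_misses = 0
            self.first_miss = None
            self.indexed = True
        # Anything else (for example a 503 server error) is treated as inconclusive:
        # it neither counts as a miss nor resets the streak.

record = UrlRecord("http://www.example.com/gourd.html")
record.record_fetch(date(2007, 2, 20), 503)  # transient server error: ignored
record.record_fetch(date(2007, 2, 21), 404)
record.record_fetch(date(2007, 2, 24), 404)
print(record.indexed)  # True: only two misses so far
record.record_fetch(date(2007, 2, 28), 404)
print(record.indexed)  # False: three misses spanning seven days
```

Requiring both a count and a time span is one way to keep a brief outage from removing a page; a policy like this trades faster cleanup for fewer false removals, which may be part of why the engines' removal times differ.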