On a roughly monthly basis, Google updates its index of web pages. This means that pages which no longer exist may be dropped, while new pages that have been found may be added. The index update is also a time when Google may introduce new tweaks and changes to how its ranking algorithm works.
For the average searcher, this changeover period goes largely unnoticed. Google generally acts the same way it always does, despite the fact that under the hood, its old catalog of web pages is tossed out and replaced with newer information.
In contrast, some search engine optimizers watch each index refresh intimately, trying to determine if it heralds a potential rise or fall in their fortunes. For them, the changeover period that's come to be known as the "Google Dance" may reveal what seem like dramatic changes in Google.
For example, someone who has a page in the top results that's accidentally dropped from the index (as can happen with all search engines) could see a significant loss of traffic until it gets restored during next month's refresh. But for a typical searcher, the loss of that particular page might not even be noticed, assuming there are 10 other good pages in the top results for their query.
Similarly, an algorithm change that produces subtle ranking shifts across the spectrum of results may go unnoticed by searchers yet seem striking to a particular search engine optimizer, if it pushes a page from that person's web site out of the first page of results.
Given the stress the Google Dance may cause some, I'd like to declare a new illness: Google Dance Syndrome. To suffer, you need to be a close watcher of the Google Dance who has been hurt by changes in the latest index.
(Fair credit notice -- after writing this story offline while traveling, I thought I'd better check before publishing to see if anyone had already suggested Google Dance Syndrome as a term. Not exactly, in terms of rank suffering. But as a physical ailment, a similar Google Syndrome was coined by WebmasterWorld member NGene in March to describe what happens to those who worry about Google so much. And on an adult webmaster board, there was a reference to Pre-mature Google Dance Syndrome for apparently worrying about when the Google Dance was about to begin, also in March. I don't want to link directly to that adult board, but the curious can follow this Google search to reach it.)
Watching For Signs Of GDS
It's important to understand that GDS is neither communicable nor necessarily indicative of the health of Google itself. Just because an individual suffers from GDS does not mean that everyone is suffering from it. In fact, the Google Dance happens each month without many suffering GDS at all.
"Back in January, we changed some scoring stuff. It was subtle enough that most people didn't notice it at all," said Matt Cutts, a Google software engineer who deals with webmaster issues. "That's a nice thing, when you get an easy win and people don't notice yet the quality improves."
Even those who closely watch for symptoms don't often find things. Anytime the Google Dance happens, some search engine optimizers will begin posting at the popular WebmasterWorld.com forum, trying to determine if there's been a widespread outbreak of GDS. Most months, there are relatively few reported cases.
Last month was different. After the latest Google Dance in mid-May, many people seemed to be suffering GDS. Some noted that Google wasn't finding new links pointing at their web sites. Others complained that sites with spam seemed unharmed. Overall, reports grew so much that a special thread had to be started just to summarize all the issues being raised about the latest index.
Providing medical aid to these GDS sufferers was the ever diligent GoogleGuy, a Google employee who regularly responds to comments at the forum. But not all were comforted by GoogleGuy's reassurance that things would improve. Indeed, his comments that better spam filtering and fresher link analysis data would be coming gave some the impression that the current Google index was not very good.
"Why is Google putting this self-described incomplete index out to the public in the first place?" one person asked in the Google Wobbles thread.
Could it be that Google had made bad tweaks in the latest index, changes that hurt not just some isolated site owners but instead webmasters and searchers across the board? If so, then the current rise in GDS could indicate a serious health problem with Google itself.
Hard To Measure Google's Health
Determining Google's health based on reports of GDS is tricky business. While more may be in the hospital than normal, some of the changes Google makes may have been designed to do exactly that. New spam filters are always being tested, as are changes to the ranking algorithm. Such alterations are designed to benefit searchers with better results, even if the side-effect is that few-some-many (depending on your point of view) webmasters feel pained.
For example, last September we had another major GDS epidemic that made news on WebmasterWorld.com and other search engine forums and even surfaced in a Wired News article, after many claimed that their GDS was a sign of decreased relevancy overall on Google.
It sounded terrible, but I wasn't flooded with complaints from searchers at the time confirming that Google had gotten worse. (For more, also see the Great Google Algorithm Shift article for Search Engine Watch members.) In fact, I got no complaints at all. Similarly, there's been a distinct lack of complaints this time. Moreover, both times, I also didn't hear from more than one or two webmasters concerned about changes.
My takeaway from this is that Google is probably still working pretty well for most searchers and webmasters alike. I think anyone who has highly optimized their web site may be prone to continued bouts of GDS (and similarly hurt at other search engines, when they make changes). But those who've focused on good, solid content? They should (and do) ride through Google Dances without even noticing that they are happening.
Looking Anew At Links
As said, you need to be careful about interpreting GDS reports as being indicative of a problem with Google itself. But what about the twist after the latest dance, where GoogleGuy's own posts seem to acknowledge that the latest index was released without the most current link analysis information or some spam filtering?
The answer, at least on the link side of things, is that Google is preparing new changes in how it leverages links as part of its overall algorithm.
"We definitely are looking at the next iteration of algorithm improvements. I think that we're in fine shape now, but I think looking toward the future that there still are some easy wins we've identified with link analysis that we're going to go ahead and push into production," said Cutts.
As you might expect, Google's not saying what exactly it is planning in terms of more sophisticated link analysis. Speaking generally, Cutts did say that determining how likely a link is to be clicked on may be taken into account, in terms of then deciding how much importance it should transmit to other pages. There may also be a better ability to filter out repetitive links, such as when a web site has standing links to certain web pages on all the pages in its site. A new link analysis system might prevent "overcrediting" for this.
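To picture what "overcrediting" from repetitive links could look like, here is a minimal sketch in Python. It is purely illustrative and not Google's method, which Cutts did not describe. The function name `count_links`, the idea of counting each linking host only once per target, and all of the example URLs are my own assumptions.

```python
from urllib.parse import urlsplit

def count_links(links, dedupe_by_site):
    """Count inbound links per target page.

    links is a list of (source_url, target_url) pairs. When dedupe_by_site
    is True, every page on the same host counts as a single link to a given
    target (an assumed rule, used here only for illustration).
    """
    counts = {}
    seen = set()
    for source, target in links:
        if dedupe_by_site:
            key = (urlsplit(source).hostname, target)
        else:
            key = (source, target)
        if key in seen:
            continue
        seen.add(key)
        counts[target] = counts.get(target, 0) + 1
    return counts

# Hypothetical data: one site links to the same page from 50 of its own pages,
# and one unrelated site links to it once.
target = "http://dingofood.example/"
links = [(f"http://bigsite.example/page{n}.html", target) for n in range(50)]
links.append(("http://other.example/review.html", target))

naive = count_links(links, dedupe_by_site=False)    # every page counts
filtered = count_links(links, dedupe_by_site=True)  # each host counts once
```

In the naive count, the standing link repeated across one site dwarfs the single independent link. With the assumed per-site rule, the two sites carry equal weight.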
The improved system of analyzing links isn't yet finished. Nevertheless, Google did need to refresh its index to weed out stale results and bring in new content. Solution? Use an existing, older "snapshot" of link analysis data for the time being, then bring in improved link analysis data later or as part of a coming index update.
Yikes! That sounds awful, sort of like saying you want to make a sandwich but need to use some bread that's a little stale. It's certainly better to have fresh data. However, it's also true as Cutts explained that lacking the very latest link data may not make much of a difference for many queries, where existing links already provide a great deal of knowledge.
"Every index has to pass a full battery of tests to say this is of sufficient quality," Cutts said, to underscore the fact that in Google's view, the current release is indeed ready for prime time.
What about the issue that Google has an index out there that's missing some spam penalties? Similar to the situation with link analysis, Google has some new spam filtering systems that it is preparing to release. So references to spam filters yet to be applied in the index relate more to integrating scoring from the new systems, once ready, than to Google simply not having any spam penalties in the index at all, Cutts said.
Cutts readily admits that it's possible to find pages in the current index that use tactics Google does not like, such as hidden text and hidden links. It's hoped that the new filters will help better eliminate this, in the future. However, Cutts added that the presence of such pages doesn't necessarily translate into bad relevancy.
"For a long time, these things have been annoying webmasters rather than users," Cutts said. "Scoring already takes care of this stuff, but we have seen posts like, 'Why isn't Google handling this'."
"We're not going to check everything in all three billion pages from the web," Cutts said. "We're going to test this on sites people have complained about."
So if you see spam, report it, and perhaps we'll see Google catch it faster than in the past. And obviously if you're knowingly spamming, stop. For the moment, you're running the risk that Google may more easily catch you, if reported. Moreover, Google does hope to expand its tool to eventually check all the pages in the index.
Also keep in mind that Google does already have some existing spam detection tools in place to automatically screen pages. Cutts is talking about filters that are supposed to be greatly enhanced over existing ones.
Any chance Google might make a public version of its spam checking tool available? It would certainly be a great help to many webmasters who worry they've spammed when they haven't. But don't expect this. As always, the concern is that a public tool would make it easy for spammers to try and outwit Google by easily testing new spam techniques.
For well over a year now, Google has operated a "fresh" crawler to revisit some web pages more frequently than its usual monthly schedule. Then in the middle of last year, Google announced that it was going to greatly increase the amount of fresh content being gathered.
Google won't say what percentage of the index is currently fresh crawled, but it does acknowledge that the amount is much more than in the past. Indeed, some webmasters have wondered if there's any difference between the fresh crawler and the regular one.
Yes, they are still different -- and the fact they are different also helps explain why occasionally a page will appear one day in the top results then be gone the next.
Each day, the fresh crawler gathers a set of pages. Some of those are "borderline" for inclusion, Cutts said. This means they may not get revisited if other pages appear that seem more deserving of a regular visit. If this happens, the page will disappear from the index.
The good news, if this happens to your pages, is that the on-off nature of inclusion should last only for a month or so. That's because what's a "borderline" page to the fresh crawler is deemed extremely important by the regular crawler. The normal crawler should almost certainly pick up any page that's ever been visited by the fresh crawler, so you'll show up in the next update and have a more permanent existence.
The off-on situation should also only happen with new pages. The fresh crawler may find them, insert them into the index, then pull them out if they have only borderline status. But once the regular crawler gets the page, it should stay seeded in the index, regardless of whether the fresh crawler decides to pay occasional visits over time.
By the way, some people believe that getting picked up by the fresh crawler means their page will rank better. Google flatly denies this, and it makes perfect sense to believe the denial. Giving a page a boost because of freshness would be a terrible ranking criterion to use. That's because webmasters would simply start changing their pages every day, just to get a freshness boost. I suspect most people who think they've gotten a freshness boost are instead seeing a top ranking for one of two reasons.
First, they may have a new page that previously wasn't listed. The fresh crawler picks it up, inserts it into the index where it may perform well for a short period for a unique term until a new index brings in more pages that are relevant for that term. When that happens, the page "falls" in rank.
Second, and more likely, is the fact that one of the criteria the fresh crawler uses to pick up a particular page is how important it is deemed to be. If you've made improvements to your pages or gained some important links, the fresh crawler may come to visit because your page has become more important -- and that change in importance, rather than the fresh crawler visit, is helping you to rank well.
One email I did receive about the recent change, echoed by comments I've seen elsewhere, asked why a search at Google might bring up different results if rerun just a few seconds later. Here, understanding the dance part of the Google Dance name may be helpful. And to understand, let's flash back first to a different search engine, the AltaVista of old.
Back in the early days of AltaVista, all the pages that the search engine had resided in one big, powerful mainframe-style computer. Eventually, one computer wasn't enough, so the index was spread across four different mainframe computers. That helped with storage but not with query load. As a result, AltaVista made a duplicate copy of its index, a "mirror" which was kept in a different physical location.
As the system got more complicated, there was a greater chance that something could go wrong. For example, if one of AltaVista's four computers went down, then essentially 1/4 of its index was unavailable to any searchers who were unknowingly directed to that mirror. If they were suddenly switched to a different mirror when trying again, they might then hit the entire index and get different results.
Now let's fast-forward to Google today. Google (like other search engines) distributes its index across hundreds of computers with a processing power similar to that used on your desktop. That solves the storage problem. But what about query load? To help, Google has multiple copies of its index in various locations. When you search, you might hit a copy of the index located on the West Coast of the US, the East Coast or perhaps in Europe, to name some examples.
If the mirror you hit has a few of its computers down (which is fairly common), then some pages might not be available for searching. It's not as bad as in the old AltaVista days, because if 10 or 20 computers aren't working, that's a tiny amount compared to the hundreds that still are. Nevertheless, having some computers down at one mirror could cause the results to be slightly different if you get directed to a different mirror on your next search.
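To make the mirror effect concrete, here is a minimal sketch in Python. It is a toy model, not a description of how Google or AltaVista actually store their indexes. The names `make_mirror` and `search`, the four-machine split, and the page names are all assumptions made for illustration.

```python
def make_mirror(pages, machine_count, down=()):
    """Spread pages across machine_count machines; machines listed in down are offline."""
    machines = []
    for i in range(machine_count):
        machines.append({"up": i not in down, "pages": pages[i::machine_count]})
    return machines

def search(mirror, query):
    """Return matching pages, looking only at machines that are currently up."""
    hits = []
    for machine in mirror:
        if machine["up"]:
            hits.extend(page for page in machine["pages"] if query in page)
    return sorted(hits)

# Hypothetical index of 12 pages, copied to two mirrors of four machines each.
pages = [f"dingo-page-{n}" for n in range(12)]
west = make_mirror(pages, 4)            # every machine is up
east = make_mirror(pages, 4, down={2})  # one machine is down

west_hits = search(west, "dingo")
east_hits = search(east, "dingo")
missing = set(west_hits) - set(east_hits)  # pages only the healthy mirror returns
```

Running the same query against both mirrors returns the full set from one and a smaller set from the other, which is the discrepancy a searcher sees when a repeat search lands on a different mirror. With four machines, one outage hides a quarter of the pages. With hundreds, the same outage hides far less.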
And now to the "dance" part. When Google updates its index, it has to spread the new information across these hundreds of computers in various locations. It generally takes a day or two until the new information is seeded and stable. As a result, some of the results may seem to "dance" around with slight changes, especially to webmasters who monitor positions like hawks.
So if you've done a search, then repeated it and gotten different results, two things are likely. First, you may have hit a different mirror of the index on your repeat search where the copy isn't perfectly in line with the first index. Second, and more likely, you've done a search and seen the Google Dance in action. And to confirm if it's happening, consider visiting the Google Dance Tool.
Another change to be aware of comes from back in March, when Google began no longer crediting web sites with links from "expired domains." What's that mean? Let's say a web site closes shop. Someone else buys the domain and "inherits" all the links that are still pointing at it. In the past, Google might have kept on crediting the site with these links. Now, that no longer happens.
For example, say someone registers dingofood.com, then lets it expire. Now someone else registers the domain for use in a new web site. Google will give the "new" dingofood.com credit for any new links pointing at it since it was registered. But any "old" links on the web still pointing at the site? None of those will count toward helping the site with link analysis or PageRank score.
How does Google know what's a new versus old link? The search engine stores a date along with each link it detects. In this way, it can tell when a particular link was created. Then it's a matter of comparing the link's "birthdate" to a domain's "birthdate" in whois records.
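The birthdate comparison can be sketched in a few lines of Python. This is only an illustration of the idea as described above, not Google's implementation. The function name `credited_links`, the field names, the dates, and the choice to credit a link created on the registration day itself are all assumptions.

```python
from datetime import date

def credited_links(links, domain_registered):
    """Keep only links first seen on or after the domain's registration date."""
    return [link for link in links if link["first_seen"] >= domain_registered]

# Hypothetical links pointing at dingofood.com, each with the date it was first detected.
links = [
    {"source": "old-fan-site.example", "first_seen": date(2001, 6, 1)},
    {"source": "new-review.example", "first_seen": date(2003, 5, 20)},
]

# Hypothetical whois "birthdate" after the domain expired and was registered anew.
registered = date(2003, 4, 1)
credited = credited_links(links, registered)
```

Here only the link from new-review.example survives, because the other link is older than the domain's current registration. If the domain's birthdate were years old, as with a domain that changed hands without expiring, both links would still count.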
But what about the case of an expired domain ranking tops for the word discount due to the many links pointing at it, as was raised at WebmasterWorld.com recently? That domain never actually expired, Cutts says. Ownership may have changed, but the birthday of the domain is years old.
OK, but should this "under construction" domain be doing so well for the term "discount," regardless of how it got there?
"It's fair to leave it up. Discount is not a common search, and I don't know what would be considered the best for that," Cutts said.
Take away from this? If you find a domain that has no owner but lots of links pointing at it, don't expect to get credit for those links with Google. You'll have to build an entirely new reputation for your site.
In contrast, for the time being, you would be able to "inherit" the links if you obtain a domain from someone else through a transfer of ownership, rather than letting the domain expire and be registered anew. Of course, it's likely that Google will eventually come up with a mechanism to thwart this, as well.
Blogs To Stay
By the way, one thing NOT in the cards for future index changes is any plan to pull blog content out of Google's regular search results. Google made a special point of stressing that blogs are staying, during my interview with them last week.
The idea that blogs were to go came out of a Register article last month. The piece suggested that if a "blog" tab was eventually added to Google, blogs themselves would be removed from the main web page index to increase relevancy. As proof of this, the Register said this is what happened to Usenet posts after Google "acquired Usenet groups" from Deja.
First of all, Google didn't acquire Usenet groups -- no one owns Usenet groups, any more than anyone owns the web. Instead, Deja had archives of posts made in those groups. Google acquired those and then began crawling Usenet to add to the archives. As Usenet information had never been part of the web index, there was nothing to "pull."
So if a blog index is created, it's not a given that blog content would be pulled. Indeed, Google has not pulled directory or news listings from the web index even though both types of content can be found via their own tabs.
And will a blog tab really be coming? Eventually, sure. But it's not something in any immediate plans, Google says.
GDS: Canaries Or Chicken Little?
I've written and spoken before that in the search engine coal mine, there are two types of canaries that spot danger: research professionals and webmasters/webmarketers. Both groups intimately study search engine results and notice changes before ordinary data miners -- the average web searcher.
So when webmasters start reporting GDS, their concerns have to be seriously considered. Some of those suffering from GDS do indeed represent changes at Google that may be bad or imperfect for searchers. Even Google knows this. "Is Google perfect? Of course not," wrote GoogleGuy, in one of the many recent threads emerging out of the latest Google Dance.
But despite Google's imperfections, GDS reports do not necessarily mean the sky is falling for Google itself. How to know for certain if it is? Abandonment by searchers is a sure sign, though that's a long-term trend.
In the short term, an outcry by both researchers and SEOs lends more weight to the idea that something is wrong. Otherwise, I'd watch for major and growing GDS outbreaks among SEOs happening for several months in a row before seriously thinking that Google itself has made some type of terrible mistake.