Last Wednesday I had a chance to speak with Adam Lasnik about paid links, duplicate content, and a few other issues. Adam provided some great insight into Google's thinking on paid links, what their objectives are, and how they view Nofollow.
We also talked about duplicate content problems. One interesting thing is that Adam confirmed my long-held notion that one of the big problems with duplicate content pages is that they waste some of your "crawl budget". The Googlebot comes to your site with an idea of how many pages it is going to crawl.
If it spends some of that time crawling pages that are duplicates, and will therefore be filtered out, you are wasting a portion of your crawl budget. For sites that are not fully indexed, this is really unfortunate, because those duplicate pages got crawled when the bot could instead have gone deeper into the site, found new pages, and gotten those pages into the index.
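To make the crawl budget idea concrete, here is a toy simulation in Python. It is only a sketch under my own assumptions, not a description of how the Googlebot actually works: the `crawl` function, the breadth-first crawl order, the exact-match content hashing, and the example URLs are all hypothetical and chosen purely for illustration. The bot fetches at most `budget` pages, and any fetch that turns out to be a duplicate is counted as wasted.

```python
from collections import deque
import hashlib


def crawl(site, start, budget):
    """Breadth-first crawl that fetches at most `budget` pages.

    `site` maps each URL to a (content, outgoing_links) tuple.
    Returns (indexed_urls, wasted_fetches), where a fetch is "wasted"
    when the page content duplicates a page already seen.
    This is a hypothetical model, not Googlebot's real behavior.
    """
    queue = deque([start])
    seen_urls = {start}
    seen_content = set()
    indexed = []
    wasted = 0
    fetched = 0

    while queue and fetched < budget:
        url = queue.popleft()
        content, links = site[url]
        fetched += 1  # every fetch costs budget, duplicate or not

        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in seen_content:
            wasted += 1  # duplicate: crawled, then filtered out
        else:
            seen_content.add(digest)
            indexed.append(url)

        for link in links:
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)

    return indexed, wasted


# Hypothetical site where sort parameters create duplicate pages.
site_with_dupes = {
    "/": ("home", ["/a", "/a?sort=asc", "/a?sort=desc", "/b"]),
    "/a": ("page a", ["/c"]),
    "/a?sort=asc": ("page a", []),
    "/a?sort=desc": ("page a", []),
    "/b": ("page b", ["/d"]),
    "/c": ("page c", []),
    "/d": ("page d", []),
}

# The same site with the duplicate URLs removed.
site_clean = {
    "/": ("home", ["/a", "/b"]),
    "/a": ("page a", ["/c"]),
    "/b": ("page b", ["/d"]),
    "/c": ("page c", []),
    "/d": ("page d", []),
}

dupes_indexed, dupes_wasted = crawl(site_with_dupes, "/", budget=5)
clean_indexed, clean_wasted = crawl(site_clean, "/", budget=5)
print(dupes_indexed, dupes_wasted)
print(clean_indexed, clean_wasted)
```

With the same budget of five fetches, the version of the site with duplicates spends two fetches on copies and never reaches the deeper pages, while the clean version gets all five of its pages indexed.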
We also talked about what is going on when sites are popping in and out of the index. It seems that algorithm tweaking is a constant process at Google. Sites that pop in and out of the index are simply those that are "on the edge" of some Google criterion. When they tune the algorithm one way, you're in, and when they tune it another way, you're back out.
If this is happening to you, it is a clear sign that the Googlebot has detected something on your site that it treats as a signal of poor quality. Unfortunately, this can be any one of a number of factors, so you are left having to figure out which one applies.