With the advent of the Penguin algorithm on April 24, the bill finally came due for a lot of sites involved in spammy link building methods. On top of that, Google began sending unnatural links messages to web publishers via Webmaster Tools earlier this year.
If you have been responsible for buying links, or using other bad link schemes for a website, you probably already know at least some of the things you’ve done that put your site at risk. But, if you inherited the problem, or even part of the problem, you may be fretting about the quickest way to identify the problem links and take action via some link pruning. Even if you haven’t been hit by Penguin, or received a Webmaster Tools message, you may want to do some of this analysis for yourself to see if you may have a future problem.
Step 1: Most Linked Pages
The first step is to check out some high level things to see where you stand. Using Google Webmaster Tools, you can quickly find some great initial data on your most linked pages, as this example screenshot illustrates:
For this site, the most linked page is the main page of the blog, and the second most linked page is the home page of the site. For starters, this is a good sign. The rest of the list is made up of individual articles posted in the blog. This all looks pretty clean. Let's contrast this with this site:
This is a made-up scenario of a site whose top link recipients are mostly money pages with no value-added content. This is definitely worrisome.
Any person performing all natural link building knows that the toughest question to answer is always "why would they link to my page?" If it is a highly commercial page with little value-added content, you know it's going to be tough. If your backlink profile looks like this, you already have a smoking gun to look into.
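If you want to put a rough number on this first check, here is a minimal Python sketch. It assumes you have copied the most linked pages out of Webmaster Tools as (URL, link count) pairs, and that you supply your own set of URL paths you consider money pages. The function name, the example.com URLs, and the counts are all made up for illustration, and this is not a description of how Google evaluates anything.

```python
from urllib.parse import urlparse


def money_page_share(linked_pages, money_paths, top_n=10):
    """Fraction of links, among the top_n most linked pages, that point at money pages.

    linked_pages: list of (url, link_count) pairs (hypothetical export format).
    money_paths:  set of URL paths you have decided are commercial pages.
    """
    # Keep only the top_n pages by link count.
    top = sorted(linked_pages, key=lambda row: row[1], reverse=True)[:top_n]
    total = sum(count for _, count in top)
    if total == 0:
        return 0.0
    money = sum(count for url, count in top if urlparse(url).path in money_paths)
    return money / total


if __name__ == "__main__":
    # Hypothetical data, not taken from a real site.
    pages = [
        ("http://example.com/blog/", 500),
        ("http://example.com/", 300),
        ("http://example.com/cheap-hotels", 200),
    ]
    share = money_page_share(pages, {"/cheap-hotels"})
    print(f"Share of top links pointing at money pages: {share:.0%}")
```

There is no magic threshold here. The point is simply that the higher this share climbs, the harder it gets to answer "why would they link to my page?"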
Step 2: Anchor Text Distribution
The second step is to look at the anchor text you have. Here's an example of a natural looking anchor text distribution:
The first result is the name of the blog, the second is the domain name, the third is the company name, the fourth is a particular article title, the fifth is the domain name again (but with a www prepended), and the sixth is a pretty odd string.
So four out of the five top spots are exactly what they should be: the URL or one of the brand names associated with the company. The fourth and the sixth ones could be of concern, but they are obviously non-commercial terms. Ultimately there’s nothing to worry about here.
Contrast that with the following:
Out of the box you can see it doesn’t look right. The site brand name is absent, as is the site URL.
All the anchor text focuses on highly commercial terms. It doesn’t make sense. No natural link profile would look like this. Even if you have awesome content, and having the best list of hotels isn’t awesome, the site URL and site brand name should be first in your anchor text list.
Even if you fix that one detail, and the top two results are the company name and the site URL, there is still cause for concern here. There is no natural variance at all. At the very least, you should see some "hotels in boston" type results, or other natural variants mixed into this profile.
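The anchor text check can be sketched in a few lines of Python as well. This version assumes you already have a mapping of anchor text to link count and a list of brand terms (company name, blog name, domain) that you provide yourself. The function name, the substring matching, and the sample anchors are assumptions made for illustration, so treat it as a starting point rather than a definitive test.

```python
def anchor_text_report(anchor_counts, brand_terms):
    """Summarize how branded an anchor text profile looks.

    anchor_counts: dict mapping anchor text -> number of links (hypothetical input).
    brand_terms:   brand names and domains to match, case-insensitively, as substrings.
    """
    total = sum(anchor_counts.values())
    # Most used anchors first.
    ranked = sorted(anchor_counts.items(), key=lambda kv: kv[1], reverse=True)

    def is_branded(anchor):
        text = anchor.lower()
        return any(term.lower() in text for term in brand_terms)

    branded = sum(count for anchor, count in ranked if is_branded(anchor))
    return {
        "branded_share": branded / total if total else 0.0,
        "top_anchor_branded": bool(ranked) and is_branded(ranked[0][0]),
        "distinct_anchors": len(ranked),
    }


if __name__ == "__main__":
    # Hypothetical profile made up entirely of commercial terms.
    suspicious = {
        "boston hotels": 400,
        "cheap boston hotels": 350,
        "hotel deals": 250,
    }
    print(anchor_text_report(suspicious, ["example hotels", "example.com"]))
```

A profile where the top anchor is not branded, the branded share is near zero, and there are only a handful of distinct anchors matches the worrying pattern described above.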
These two quick checks are the first ones to perform if you're trying to do bad link archaeology, especially if you just got brought in to evaluate a site's links, and you do not know the link building history. Also, something this easy to check is easily done algorithmically, so I believe that Google is doing some checking along these lines.
Once you have done this evaluation, you aren’t done. If it doesn't show any immediate problems, you aren't necessarily out of the woods yet, though the scope of your problem may be quite a bit less than for some other publishers.