Google has always claimed to be able to find networks of bad links (link farms, etc.) algorithmically, although many in the SEO community suspect that detecting bad neighborhoods is a manual task in all but the most blatant cases. Either way, Google has certainly upped the ante recently with the Penguin update and a slew of emails to webmasters, warning them that they seem to have suspicious link profiles.
For the black hat link builder, the lesson to take away from this is obvious: stop doing that. But for the ethical SEO, is there anything to worry about and, if so, what can be done about it?
The only real worry is that you don't want to get links from sites that might look suspicious (or to link out to them). Or, if you're so inclined, you might want to check out the competition's practices with an eye to reporting any bad behavior.
Is it even possible to identify bad link neighborhoods without the tools and processing power Google engineers have at their fingertips? We gave CEMPER's link research toolkit a spin to find out.
Step 1: Find Some Guilty-Looking Parties
We started out by running searchenginewatch.com through Back Link Profiler (BLP). This tool gives top-level information about your site's (or a competitor's) link graph. We're especially interested in the Power*Trust profile for this process.
SEW has a pretty clean profile, with by far the most links coming from decent-looking sites. In the real world, you might be working with a site which is weighted far over to the left of the graph, toward the low-trust end. So, let's have a look at those low-trust links and see what we can find.
We're looking for five to 10 sites with spammy content, lots of ads, and other giveaways of poor quality. This is the only part of the process that requires some human judgment; it's likely that Google's algorithms are pretty good at finding these types of sites automatically these days, even without having to resort to links.
BLP allows you to sort your incoming links by trust and other factors, but there are plenty of legitimate sites that don't look great if you just look at the numbers. There is a fair amount of trawling to do by hand.
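As a rough illustration of that shortlisting step, here is a minimal sketch that sorts exported link data so the lowest-trust, highest-power links come up first for manual review. The Backlink fields, the score scale, and the max_trust cutoff are all assumptions made for the example; this is not BLP's export format or scoring method.

```python
from dataclasses import dataclass


@dataclass
class Backlink:
    # Hypothetical record; field names and score scales are assumptions.
    domain: str
    power: int
    trust: int


def review_queue(links, max_trust=1, limit=10):
    """Return up to `limit` low-trust links, lowest trust first.

    Ties are broken by higher power first, on the assumption that a strong
    but untrusted link deserves a look before a weak one.
    """
    low_trust = [link for link in links if link.trust <= max_trust]
    return sorted(low_trust, key=lambda link: (link.trust, -link.power))[:limit]
```

The output is only a reading list. Deciding which of those sites are actually spammy still has to be done by eye.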
Step 2: Find the Buyers
Once you have your motley collection of weird and wonderful links (in SEW's case, mostly hacked forums and scraper sites, although for a properly dodgy link profile you'd be looking for sites actively selling links), it's time to put them into Link Juice Thief (LJT). This tool will find which outgoing links these sites have in common, and therefore highlight potential ad buyers.
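The underlying idea, finding outgoing links that a set of sites has in common, can be sketched in a few lines. This is only an illustration under stated assumptions, not how LJT works internally: the outlinks mapping is assumed to come from your own crawl or export, and common_targets, host, and min_share are hypothetical names.

```python
from collections import Counter
from urllib.parse import urlsplit


def host(url):
    """Lower-cased host name of a URL, with a leading 'www.' dropped."""
    name = urlsplit(url).netloc.lower()
    return name[4:] if name.startswith("www.") else name


def common_targets(outlinks, min_share=1.0):
    """Find domains linked to by at least `min_share` of the source sites.

    `outlinks` maps each source site URL to the list of URLs it links out to.
    Each source counts once per target domain, and self-links are ignored.
    Returns (domain, number_of_sources) pairs, most shared first.
    """
    counts = Counter()
    for source, urls in outlinks.items():
        source_host = host(source)
        targets = {host(u) for u in urls} - {source_host, ""}
        counts.update(targets)
    total = len(outlinks)
    shared = [(d, n) for d, n in counts.items() if total and n / total >= min_share]
    return sorted(shared, key=lambda item: (-item[1], item[0]))
```

Lowering min_share below 1.0 loosens the match, so that a domain linked from most, rather than all, of the suspect sites still shows up.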
Of course, the more sites you have pulled out in Step 1, and the more dodgy they are, the better the data returned will be. On our first run with the sites we pulled out, the only common links were SEW (of course) and Twitter. Not particularly useful! Just because a site's scraping content or spamming doesn't necessarily mean that they're part of a link network.
In fact, this is where the process stumbles a little. It's a long, hard task to find suspected link selling websites, even if there are any in the first place in the link profile with which you're working.
BLP has neat features for automatically pulling out the usual suspects (porn, malware, etc.), so those are obvious enough (and they might well be selling links). But for every suspect site that falls into the "zero trust" segment, there are dozens of legitimate ones (often portfolio or blog sites by people who have never even heard of SEO, let alone done any link building). These are impossible to filter out automatically, so it's going to take a while.
Nevertheless, once you've found a good list of suspected sellers, putting them through LJT should give you a strong indication of who the buyers are. It's not enough simply to look at the outgoing links on each site: a common tactic for professional link selling networks is to mask these sold links with a plethora of innocent ones (often to trusted sites such as Wikipedia). You need a tool such as LJT to pull out the links that are common to all of them.
Step 3: Identify the Networks
So, you've got some guilty parties. Obviously you won't want to be getting links from them (they probably wouldn't be giving them out anyway), but you want to make sure that you don't pursue links from any site involved in their network or that has sold them any links in the past. It's likely that Google has become so good at detecting bought links that any links from selling sites will be tainted (or at least worthless), even if you're not buying yourself.
Poring through the link profile of each buying site is possible, but would be fairly time-consuming. An alternative is to use Link Research Tools' Common Backlinks Tool (CBT). This does the same job as LJT but in reverse, identifying common incoming links instead of outgoing links. This enables you to find the sites that link to several of your suspected buyers and that are therefore, in theory at least, the ones most commonly selling their favors.
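The reverse lookup can be sketched the same way. The example below is a hedged illustration, not CBT's actual method: it assumes you already have a list of (linking domain, linked domain) pairs from your own data, and likely_sellers and min_buyers are hypothetical names.

```python
from collections import defaultdict


def likely_sellers(edges, buyers, min_buyers=2):
    """Rank linking domains by how many suspected buyers they link to.

    `edges` is an iterable of (source_domain, target_domain) pairs.
    Only sources linking to at least `min_buyers` distinct buyers are kept.
    Returns (source_domain, sorted_buyers_linked) pairs, widest reach first.
    """
    buyer_set = set(buyers)
    linked = defaultdict(set)
    for source, target in edges:
        if target in buyer_set and source != target:
            linked[source].add(target)
    ranked = [(s, sorted(t)) for s, t in linked.items() if len(t) >= min_buyers]
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked
```

A domain that links to one suspected buyer proves little, which is why the sketch defaults to requiring at least two. Even then the result is a list of candidates to inspect by hand, not a verdict.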
Whether or not you go the whole hog to identify a linking network, deep analysis of a competitor's link graph (or your own) is a worthwhile process. Many SEOs focus on trawling competitors' links for sites that could also link to you, but really getting into their profile can also tell you a lot about their overall strategy. When we went beyond just looking at SEW's profile, we discovered all kinds of weird and wonderful things going on... but that's a story for another time.
Image Credit: www.sxc.hu/profile/chidsey