Here’s a new 21-page technical research paper (PDF) from the Stanford InfoLab that looks at link spam. It might be of interest to some of you.

Title: Link Spam Detection Based on Mass Estimation
Authors: Zoltan Gyongyi (Stanford), Pavel Berkhin (Yahoo), Hector Garcia-Molina (Stanford), Jan Pedersen (Yahoo)

Abstract: Link spamming intends to mislead search engines and trigger an artificially high link-based ranking of specific target web pages. This paper introduces the concept of spam mass, a measure of the impact of link spamming on a page’s ranking. We discuss how to estimate spam mass and how the estimates can help identifying pages that benefit significantly from link spamming. In our experiments on the host-level Yahoo! web graph we use spam mass estimates to successfully identify tens of thousands of instances of heavy-weight link spamming.
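The abstract does not spell out how spam mass is computed, so here is a rough sketch of one way to read the idea, not the authors' implementation. It assumes a page's relative spam mass is (p - p') / p, where p is its ordinary PageRank and p' is the PageRank it gets when the random jump lands only on a set of known-good pages. The function names, the toy graph, the damping factor of 0.85, and the choice of good core are all made up for illustration.

```python
def pagerank(out_links, jump, damping=0.85, iterations=100):
    """Linear PageRank: p = damping * T^T p + (1 - damping) * jump.

    out_links maps every node to the list of nodes it links to.
    Dangling nodes are not patched; their score simply leaks away.
    """
    scores = dict(jump)
    for _ in range(iterations):
        nxt = {node: (1.0 - damping) * jump[node] for node in out_links}
        for node, targets in out_links.items():
            if not targets:
                continue
            share = damping * scores[node] / len(targets)
            for target in targets:
                nxt[target] += share
        scores = nxt
    return scores


def relative_spam_mass(out_links, good_core, damping=0.85):
    """Fraction of each node's PageRank that does NOT come from the good core."""
    n = len(out_links)
    uniform = {node: 1.0 / n for node in out_links}
    # Assumption: the core-based jump keeps 1/n on good nodes and 0 elsewhere.
    core_only = {node: (1.0 / n if node in good_core else 0.0)
                 for node in out_links}
    p = pagerank(out_links, uniform, damping)
    p_core = pagerank(out_links, core_only, damping)
    return {node: (p[node] - p_core[node]) / p[node] for node in out_links}


# Hypothetical toy graph: two reputable hosts, two farm hosts boosting "target".
graph = {
    "good1": ["good2"],
    "good2": ["good1", "target"],
    "farm1": ["target"],
    "farm2": ["target"],
    "target": [],
}
mass = relative_spam_mass(graph, good_core={"good1", "good2"})
for node in sorted(mass, key=mass.get, reverse=True):
    print(f"{node}: {mass[node]:.3f}")
```

In this toy graph the farm hosts get a relative mass of 1, the reputable hosts get roughly 0, and the target lands in between because most, but not all, of its rank comes from the farm. Flagging pages whose estimate exceeds some threshold would be the natural next step, though the threshold is again a choice the abstract does not specify.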

