A new Yahoo patent application, published today, builds upon a method for finding reputable pages on the web, using it to reduce web spam when ranking pages for search results.
When Combating Web Spam with TrustRank was published back in August of 2004, it caused something of a stir by proposing a way to find reputable web pages based on a couple of simple concepts:
- "Good sites seldom point to bad ones."
- "The care with which people add links to their pages is often inversely proportional to the number of links on the page."
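These two concepts translate fairly naturally into a biased PageRank computation. Trust starts on a few manually chosen seed pages and fades a little at every hop, since good pages seldom link to bad ones and trust should weaken with distance from a seed. It is also divided among a page's outlinks, so a page with many links endorses each one less. The following is a minimal sketch under those assumptions. The function name `trustrank`, the toy link graph, and the defaults (damping 0.85, 20 iterations, typical PageRank-style values) are illustrative choices, not details taken from the paper or the patent application.

```python
def trustrank(links: dict[str, list[str]], seeds: set[str],
              damping: float = 0.85, iterations: int = 20) -> dict[str, float]:
    """Propagate trust from hand-picked seed pages along outlinks.

    `links` maps each page to the pages it links to; `seeds` are pages a
    person has judged reputable. (Hypothetical helper for illustration.)
    """
    # Every page that appears anywhere: as a source, a target, or a seed.
    pages = set(links) | set(seeds) | {q for outs in links.values() for q in outs}
    # Static distribution: trust starts only on the seed pages.
    start = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(start)
    for _ in range(iterations):
        # Each round re-injects (1 - damping) of the seed trust.
        nxt = {p: (1.0 - damping) * start[p] for p in pages}
        for page, outs in links.items():
            if not outs:
                continue  # in this sketch, trust at pages with no outlinks leaks away
            # Dampening: only `damping` of a page's trust flows onward.
            # Splitting: that amount is divided evenly among the page's outlinks.
            share = damping * trust[page] / len(outs)
            for target in outs:
                nxt[target] += share
        trust = nxt
    return trust


# Toy graph (hypothetical): a trusted seed links into a small cluster of
# good pages, while two spam pages link only to each other and to "c".
links = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": ["seed"],
    "spam1": ["spam2"],
    "spam2": ["spam1", "c"],
}
scores = trustrank(links, {"seed"})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:6} {score:.3f}")
```

Because no page reachable from the seed links to `spam1` or `spam2`, both end with a trust score of zero, even though they link to each other and to a good page. Linking out to reputable pages earns a spam page nothing. Only links in from trusted pages do.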
While the TrustRank paper describes a process for finding good pages, it doesn't take the next step of explaining how that process could be used to help rank search results. In the paper's conclusion, we are told that:
In a search engine, TrustRank can be used either separately to filter the index, or in combination with PageRank and other metrics to rank search results
The details of how that would happen weren't included. Today, we're given a glimpse of one possible approach.
Yahoo's patent application, Link-based spam detection, describes a way of filtering spam pages out of search results in combination with PageRank. It presents a largely automated method for separating reputable pages from spam pages, with a little help from people who manually identify reputable seed pages.
Want to comment or discuss? Visit our Yahoo Web Search area of the Search Engine Watch Forums.