Human Input and Algorithmic Search

The subject of human-edited search is hot again. Many factors are driving this, but I believe the basic reason for the resurgence is that we are all getting a bit more sophisticated about the challenges search faces. Recent articles on the topic include a New York Times article by Randall Stross called “The Human Touch That May Loosen Google’s Grip,” “The role of humans in Google search” by Matt Cutts, and a series of posts by Michael Gray about Jason Calacanis’ Mahalo, with the most recent of them being this one.

It’s a really important topic, because all of the existing search engines remain susceptible to spam sites. Many people believe that human input can play a big role in cleaning up the remaining spam problems the search engines have. Certainly, you can imagine ways in which this would be quite effective. But the most scalable technique is to implement social search algorithms, where search engine users vote on content quality, and this then affects site rankings.

But this technique is also subject to wild inaccuracies. You can’t vote on the truth. For example, Wikipedia may be a great resource for researching certain types of information, but don’t rely on it when making major life decisions about your health and finances. You have no idea of the knowledge and background of the people who wrote the articles you are reading there. Or look at the current Mahalo output for a “domains by proxy” search, which reads like a full-page ad for GoDaddy (read the Michael Gray post above for more discussion about that).

So it will go with social search. You just don’t know enough about the authoritativeness or motives of a person’s vote or input when you receive it. You are also subject to mob mentality and unpredictability. My point is that social search can be used in some interesting ways, but it is not an instant solution for improving search results.

I do believe that social search can play a significant role in the search engines of the future, but I also believe that the search engines would be wise to build up an internal staff of human reviewers. Each reviewer would be assigned categories and would be responsible for reviewing the quality of the results in them and detecting patterns of spam. Where necessary, they could make manual corrections, perhaps as a weighting factor fed into an algorithm.
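To make the "weighting factor" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any search engine actually works: the `Result` type, the category names, the example URLs, and the multiplier values are all hypothetical. It assumes the algorithm produces a relevance score per result, and that reviewers record a multiplier only for URLs in the categories they cover.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    category: str
    algo_score: float  # hypothetical relevance score from the ranking algorithm


# Hypothetical: the high-value categories that human reviewers cover.
REVIEWED_CATEGORIES = {"mortgages", "insurance"}

# Hypothetical reviewer judgments, stored as score multipliers per URL.
REVIEWER_WEIGHTS = {
    "spam-example.test/loans": 0.1,     # flagged as spam by a reviewer
    "trusted-example.test/rates": 1.2,  # confirmed as high quality
}


def adjusted_score(result: Result) -> float:
    """Scale the algorithmic score by the reviewer weight, if one applies."""
    if result.category not in REVIEWED_CATEGORIES:
        return result.algo_score  # unreviewed categories are left untouched
    return result.algo_score * REVIEWER_WEIGHTS.get(result.url, 1.0)


def rerank(results: list[Result]) -> list[Result]:
    """Order results by adjusted score, highest first."""
    return sorted(results, key=adjusted_score, reverse=True)


if __name__ == "__main__":
    results = [
        Result("spam-example.test/loans", "mortgages", 0.9),
        Result("trusted-example.test/rates", "mortgages", 0.7),
        Result("hobby-example.test/knitting", "crafts", 0.5),
    ]
    for r in rerank(results):
        print(r.url, round(adjusted_score(r), 2))
```

In this sketch the human judgment never replaces the algorithm. It only scales the algorithm's score, and only inside the reviewed categories, so the review effort stays confined to the categories the staff is assigned.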

They only need to focus on the major categories, because these are the categories where the spammers stand to make the most money if they succeed. Drive them out of these categories, and you will have a huge impact on the spam problem. After all, if the total potential return is driven way down, then the spammers will lose interest. This is, in fact, the big lesson that I think the search engines should learn from the social media sites.

Spam is much less of a problem on social media sites because the potential return from a successful campaign to rank for something is much smaller, whereas success in a search engine can be worth millions of dollars. The way to drive the spam from a search engine is to eliminate the big potential returns. Given that you only need to focus on the high-dollar categories, I might even suggest that this approach is “scalable”.
