Bing’s URL Keyword Stuffing Filter Reduces Traffic to Spam Sites by 75%

Using a relatively new spam filtering mechanism that targets URL keyword stuffing (KWS), Bing says it has filtered out an average of approximately one in 10 URLs per impacted query, or about 3 percent of Bing queries overall. In addition, Bing says roughly 5 million sites with 130 million URLs have been impacted, and the traffic those sites receive from Bing has dropped by more than 75 percent.

To identify these sites, Bing says it looks at a number of signals that suggest possible URL keyword stuffing, such as site size, the number of hosts, and the number of words in host/domain names and paths.
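Bing has not said how it counts the words in a host name such as cheapviagrausa, which contains no separators. One common way to do it is dictionary-based word segmentation. The Python sketch below is a hypothetical illustration of that idea, not Bing's method: the toy vocabulary `VOCAB` and the helpers `segment` and `host_word_count` are assumptions, and the TLD handling is deliberately naive.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical toy vocabulary for illustration only; a real system would
# use a large lexicon, ideally with word frequencies.
VOCAB = {"buy", "cheap", "viagra", "usa", "pharma", "gmail", "login",
         "sign", "signin", "in", "up", "turbo", "tax", "online"}


def segment(label: str) -> Optional[List[str]]:
    """Split a host label into the fewest vocabulary words (a run of digits
    counts as one token). Returns None if the label cannot be segmented."""

    @lru_cache(maxsize=None)
    def best(i: int) -> Optional[Tuple[str, ...]]:
        if i == len(label):
            return ()
        if label[i].isdigit():
            # Treat a run of digits such as "2014" as a single token.
            j = i
            while j < len(label) and label[j].isdigit():
                j += 1
            rest = best(j)
            return None if rest is None else (label[i:j],) + rest
        options = []
        for j in range(i + 1, len(label) + 1):
            if label[i:j] in VOCAB:
                rest = best(j)
                if rest is not None:
                    options.append((label[i:j],) + rest)
        # Prefer the segmentation with the fewest words.
        return min(options, key=len) if options else None

    result = best(0)
    return None if result is None else list(result)


def host_word_count(host: str) -> Optional[int]:
    """Count dictionary words across a host's labels, skipping a leading
    "www" and the last label (naively assumed to be the TLD)."""
    labels = host.lower().split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    total = 0
    for label in labels[:-1]:
        words = segment(label)
        if words is None:
            return None  # unknown fragment, e.g. "ergr"
        total += len(words)
    return total


print(segment("cheapviagrausa"))                      # prints ['cheap', 'viagra', 'usa']
print(host_word_count("www.turbotaxonline2014.com"))  # prints 4
```

Choosing the fewest-words segmentation keeps "signin" as one token rather than "sign" + "in"; a frequency-weighted lexicon would be a more realistic tie-breaker.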

According to Bing, examples of spam sites impacted include www.cheapviagrausa.com, www.cheapviagrapharma.com, www.buyviagracheapviagraergr.com and www.gmailloginsigninup.com.

In a blog post, Igor Rondel, principal development manager of Bing Index Quality, explains that the goal of URL KWS is to manipulate search engines into giving pages higher rankings than they deserve.

In doing so, Rondel writes, the perpetrators assume that search engines use keyword matching and that matching against the URL is especially valuable.

“While this is somewhat simplistic considering search engines employ thousands of signals to determine page ranking, these signals do indeed play a role (albeit significantly less than even a few years ago),” Rondel writes. “Having identified these perceived ‘vulnerabilities,’ the spammer attempts to take advantage by creating keyword-rich domain names. And since spammers’ strategy includes maximizing impressions, they tend to go after high value/frequency/monetizable keywords (e.g. viagra, loan, payday, outlet, free, etc…).”

According to Bing, this kind of spam is important to address because the technique is widely used, has a significant SERP presence, and produces URLs that appear to be good matches for queries, which entices users to click on them.

Bing does not disclose specific details on its detection algorithms because it says spammers are likely to use that knowledge to evolve their techniques. However, in addition to the signals listed above, Rondel says Bing looks at:

  • Host/domain/path keyword co-occurrence (including unigrams and bigrams)
  • Percent of the site cluster composed of top-frequency host/domain name keywords
  • Host/domain names containing certain lexicons/pattern combinations (e.g. [“year”, “event | product name”], http://www.turbotaxonline2014.com)
  • Site/page content quality and popularity signals
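To make the first and third signals more concrete, here is a hypothetical Python sketch of how keyword co-occurrence between a host name and a URL path, and a [“year”, “event | product name”] pattern, might be checked. The lexicon `EVENT_OR_PRODUCT`, the year regex, and all function names are assumptions for illustration; Bing has not published its implementation, and the host tokens are assumed to be already split into words.

```python
import re
from typing import List, Set, Tuple

# Hypothetical lexicon of event/product names; Bing's lists are not public.
EVENT_OR_PRODUCT = {"turbotax", "worldcup", "superbowl", "iphone"}
# Assumed year pattern: four digits starting with 19 or 20.
YEAR = re.compile(r"(19|20)\d{2}")


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def path_tokens(path: str) -> List[str]:
    """Split a URL path into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", path.lower()) if t]


def cooccurrence(host_tokens: List[str], path: str) -> Tuple[int, int]:
    """Count distinct unigrams and bigrams shared by host name and path."""
    p = path_tokens(path)
    shared_unigrams = ngrams(host_tokens, 1) & ngrams(p, 1)
    shared_bigrams = ngrams(host_tokens, 2) & ngrams(p, 2)
    return len(shared_unigrams), len(shared_bigrams)


def year_plus_name(host_label: str) -> bool:
    """Flag a host label that combines a year with a known event/product name."""
    label = host_label.lower()
    has_year = YEAR.search(label) is not None
    has_name = any(name in label for name in EVENT_OR_PRODUCT)
    return has_year and has_name


print(cooccurrence(["cheap", "viagra", "usa"],
                   "/cheap-viagra/buy-cheap-viagra.html"))  # prints (2, 1)
print(year_plus_name("turbotaxonline2014"))                 # prints True
```

In a real filter, counts like these would presumably be one input among many to a trained classifier rather than a rule on their own.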

To amplify these signals, Rondel says Bing tries to cluster sites by various pivots, such as domain and owner, and then looks for patterns of the signals within the same cluster.

“This helps improve detection precision because spammers often create dozens/hundreds of similar looking sites,” he writes.
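The clustering step Rondel describes might look something like the following hypothetical Python sketch. Sites are grouped by an assumed owner field, and each cluster is scored by how much of its host-name vocabulary is taken up by its few most frequent keywords, which is one possible reading of the "percent of the site cluster" signal. The owner pivot, the thresholds, and the function names are illustrative assumptions, not Bing's values.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Each site is (owner, host-name tokens). Both fields are hypothetical inputs;
# Bing has not said how it derives owners or tokens.
Site = Tuple[str, List[str]]


def cluster_by_owner(sites: List[Site]) -> Dict[str, List[List[str]]]:
    """Group sites' host-name tokens by owner (one possible pivot)."""
    clusters: Dict[str, List[List[str]]] = defaultdict(list)
    for owner, tokens in sites:
        clusters[owner].append(tokens)
    return dict(clusters)


def top_keyword_share(cluster: List[List[str]], k: int = 2) -> float:
    """Fraction of all host-name tokens in a cluster covered by its k most common tokens."""
    counts = Counter(tok for tokens in cluster for tok in tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(k)) / total


def flag_clusters(sites: List[Site], min_sites: int = 3,
                  threshold: float = 0.5) -> List[str]:
    """Return owners whose clusters are large and dominated by a few keywords."""
    flagged = []
    for owner, cluster in cluster_by_owner(sites).items():
        if len(cluster) >= min_sites and top_keyword_share(cluster) >= threshold:
            flagged.append(owner)
    return sorted(flagged)


SITES = [
    ("owner-a", ["cheap", "viagra", "usa"]),
    ("owner-a", ["cheap", "viagra", "pharma"]),
    ("owner-a", ["buy", "viagra", "cheap", "viagra"]),
    ("owner-b", ["gmail", "login"]),
    ("owner-b", ["photo", "blog"]),
    ("owner-b", ["recipe", "ideas"]),
]
print(flag_clusters(SITES))  # prints ['owner-a']
```

Requiring a minimum cluster size reflects Rondel's point that spammers tend to build dozens or hundreds of similar sites; a single keyword-heavy domain on its own would not be flagged by this sketch.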
