Comment Spam? How About An Ignore Tag? How About An Indexing Summit!

Bloggers seem increasingly upset at the comment spam they have to deal with, something driven primarily by those who seek higher search rankings by posting links to their sites into comment areas.

To me, the solution seems simple. Why not give designers a tag telling search engines to ignore portions of a web page? Or better yet, how about a coordinated summit among search engines and webmasters to advance the state of site indexing overall?

The solution would help more than bloggers. That's good, because more than bloggers need it. The problem bloggers face has already been an issue for those who run forums, guest books or any other type of venue allowing public contributions. All are -- and have been -- targets of those who want to promote web sites.

For a non-blogger perspective on the problem, check out Mike Grehan's Google PageRank Lunacy article we ran last year in SearchDay. It discusses how guest book spam spoiled a memorial site for a good friend of his. Just like bloggers, people with guest books need help too.

I take my inspiration for an ignore tag primarily from Bruce Clay, who informally proposed a somewhat similar idea for <ad> tags to Google early last year. Bruce's concern was that if he or others want to purchase links, they don't want those links to somehow harm them in search engines.

Believe it or not, there are some people who buy links because of the traffic the links themselves may drive. Bruce's thought was that if publishers such as Search Engine Watch's own JupiterMedia could surround paid links they sell with an ad tag, then search engines could discount those links for ranking purposes.
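
To make Bruce's idea concrete, here's a rough sketch of what such markup might look like. The <ad> element is purely hypothetical; no search engine supports anything like it today:

    <!-- Hypothetical markup: the <ad> wrapper flags a paid link -->
    <p>Visit our sponsor:
      <ad><a href="http://www.example.com/">Example Widgets</a></ad>
    </p>
    <!-- An engine could still index the text and follow the link for
         discovery, but would give the link no ranking credit -->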

Interesting idea. I also like it for another reason. Ever since we opened our Search Engine Watch Forums, we've been liberal about allowing people to link out to relevant resources. But this can be and has been abused. Not much, fortunately, but we occasionally have to police out an irrelevant link or a link hidden in a period or comma.

One solution would be an <ignore> tag. Using this, we could surround any posted links with the tag to prevent them from being indexed. If that became commonplace on forums, it might make them less attractive targets for link spam.
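
As a sketch, a forum template might wrap visitor-posted links like this; again, the <ignore> element is hypothetical:

    <div class="forum-post">
      A member recommends the resources over at
      <!-- Hypothetical: everything inside <ignore> is skipped for indexing -->
      <ignore><a href="http://www.example.com/">this site</a></ignore>
    </div>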

That leads to another inspiration. Six Apart/Movable Type's Brad Choate wished for some type of page-based ignore feature last July in his Restricting Google on my terms post (something he originally asked for back in Feb. 2002). His solution was to cloak his pages using user agent detection, though he didn't realize at the time that this counted as cloaking (check out the comments of that post).

Google, of course, doesn't like cloaking. But since Brad's intent isn't to deceive Google, chances are he's not going to get busted. Even more to the point, as he says, he wouldn't have to do such a thing if Google gave him some alternative.

More broadly, lots of people beyond bloggers, in lots of situations, wouldn't have to do such things if search engines gave us more options. It's not a Google thing. It's not a blogger thing. It's a search indexing thing.

I mentioned the ignore idea to Yahoo at our SES Chicago show and got some interest, so maybe there's hope. The idea poses problems, of course. An ignore tag could be abused. It also means that some good content marked as "ignore" might not get indexed. But perhaps we could have levels. How about a <content> tag authors can use to denote the key body content, a <nav> tag to mark navigation that search engines might not want to index or weight as heavily, or a <public> tag to denote publicly-contributed content that might deserve less weighting?
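
Sketched out, a page using those hypothetical levels might look something like this (all three elements are illustrations, not real HTML):

    <content>
      <!-- The article itself: indexed and weighted normally -->
    </content>
    <nav>
      <!-- Site navigation: indexed lightly or skipped entirely -->
    </nav>
    <public>
      <!-- Visitor comments and links: given little or no ranking credit -->
    </public>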

There are lots of possibilities. What I know is that the last time the search engines came together to help provide coordinated assistance to web site owners on indexing was May 1996, when we got agreement on the meta description and meta robots tags, along with some additional talk on new support for the robots.txt convention.
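
For reference, the two tags that came out of that 1996 agreement are still in use and look like this:

    <!-- A hand-written summary search engines can show in their listings -->
    <meta name="description" content="A short summary of this page">
    <!-- Tells robots not to index this page and not to follow its links -->
    <meta name="robots" content="noindex,nofollow">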

Since then, we've had unilateral advances -- AltaVista's new image indexing tags, Google's robots.txt expansion and no-archiving tag, and the like -- but nothing coordinated to involve web site owners or the search industry as a whole. After nearly 10 years, surely the time is ripe for that type of cooperation now.
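
Google's no-archiving tag, for instance, is a one-line addition to a page's head section:

    <!-- Asks Google not to keep a cached copy of the page in its results -->
    <meta name="robots" content="noarchive">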

At the very least, it might get some of the bloggers who blame Google for the problem off its back. A sampling of the blame, along with other looks at the problem and possible solutions:

  • Why hasn't Google stopped comment spam? which suggests a way to mark a link as passing or not passing credit.
     
  • Comment Spam - Google's Role? is another call that Google should somehow do something.
     
  • Comment Spammers Have Blogs of Their Own from Yahoo's Jeremy Zawodny is another call for some smarter way of deciding what links to credit.
     
  • BTW, this is Google's problem from Dave Winer, a short note on trying to deal with referer spam, which he feels is ultimately a result of Google's reliance on links. In reality, it's a problem caused by all major search engines relying on links, but it's still one they can all help solve.
     
  • The Solution To Blog Spamming from Nick at Threadwatch, looking at various proposals, including having search engines ignore such links, and finding problems with them all. He pushes primarily for better defenses that bloggers themselves can employ.

So what do you think? Time for an indexing summit? Are there indexing changes you'd like to see? Comments of any type? Come discuss in our forum thread: Time For An Indexing Summit?

Postscript: Support has now been officially announced for an ignore-like nofollow attribute. See the Google, Yahoo, MSN Unite On Support For Nofollow Attribute For Links post for more.
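
In practice, the attribute goes on the link itself, so publishers can flag individual posted or paid links:

    <!-- rel="nofollow" tells the engines not to pass ranking credit -->
    <a href="http://www.example.com/" rel="nofollow">a visitor-posted link</a>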