October 29, 2007

Old Time Spam Tactics Still Work (Sometimes)

I still see old-fashioned tricks, such as doorway pages or spammy hidden text, working from time to time. Granted, the sites I see this working for tend to be smaller sites competing for terms that are at best moderately competitive. Still, I have seen it enough times that I finally sat down to think about why this might be.

My best guess is that it relates to the demands it would place on search engine infrastructure to try to detect even the obvious tricks on all web sites. This might mean that what we are really looking at is a post-processing function of some sort that is run on selected sites or selected portions of the index. This function may even be largely driven by the spam reports that the search engines receive.

So whether or not a trick will get discovered becomes a hit or miss proposition. This fact leads to a continued use of these practices, even though they are clearly not for users, and in some cases reduce site usability. I was contacted just today by someone who found a site they were working with was using doorway pages (sorry, no link will be provided because I am not outing anyone), and it appears to be working for them.

Unfortunately for the site owner, this is likely not a situation that will continue forever. Someone who competes with them will report them once they discover it. At that time, they may pay a heavy price for their short term gains. It's just another example of how far the search industry has to go.

Posted by at 12:02 PM | Permalink

March 6, 2006

Hosted Doorway Pages & Paid Links Back At Stanford University

Bouncing ball time. Last April, I wrote about how the Stanford Daily newspaper was selling links for those seeking to rank better on Google, ironic given that Google was born out of Stanford University and is very anti-link selling. Then last May, the newspaper decided to abandon paid links along with doorway pages it hosted for third parties. Today, SEW Forums moderator AussieWebmaster notices that paid links and hosted web pages have come back, such as you'll see at the bottom of the paper's home page here and a hosted page here.

Nope, I don't see the use of nofollow as Google's Matt Cutts recommends, nor is the page banned by robots.txt from being indexed. Far from it, it's ranking well.
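For readers who want to run the same two checks themselves, here is a minimal sketch using only the Python standard library. It reports which links on a page lack rel="nofollow" and whether a robots.txt file blocks a given URL. The function names and the sample HTML and robots.txt text are hypothetical illustrations, not data taken from the Stanford Daily site.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class LinkRelCollector(HTMLParser):
    """Collect (href, has_nofollow) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def followed_links(html):
    """Return hrefs of links that do NOT carry rel="nofollow"."""
    collector = LinkRelCollector()
    collector.feed(html)
    collector.close()
    return [href for href, nofollow in collector.links if not nofollow]


def blocked_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text disallows crawling url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical sample data for illustration only.
SAMPLE_HTML = (
    '<a href="http://example.com/forex">forex</a>'
    '<a rel="nofollow" href="http://example.com/ads">ads</a>'
)
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /hosted/\n"

if __name__ == "__main__":
    print(followed_links(SAMPLE_HTML))
    print(blocked_by_robots(SAMPLE_ROBOTS, "http://example.com/hosted/page.html"))
```

A link that shows up in followed_links and a page that is not blocked by robots.txt is, roughly, the situation described above: nothing tells the search engine to discount the link or skip the page.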

AussieWebmaster -- Frank Watson -- oversees a site for currency trading terms, which is why the Stanford-hosted page came to his attention. It currently ranks fifth out of about 40 million pages that Google has indexed for the term forex.

Well heck, at least the page carries Google's own AdSense ads on it :)

Want to comment or discuss? Visit our Search Engine Watch Forums thread, Paid Links, Hosted Doorway Pages Back At Stanford Daily.

Posted by Danny Sullivan at 2:34 PM | Permalink

February 7, 2006

More European Automaker Sites Do Doorways & Should Search Engines Be Able To Enforce Spam Rules?

Dave Naylor's been doing a tour of European automotive sites and finding others that are doing the doorway page dance that got BMW banned from Google. Meanwhile, there's some concern in the blogosphere about whether people should be worried about Google's spam rules in general. A look at both issues, below.

Dave's found this page over at Porsche Denmark that redirects to the Porsche Denmark home page. Disable JavaScript (use this handy tool for Firefox), and you can see the underlying textual content that's being cloaked.

It's hard to know what exactly is going on, as I don't read Danish. Since you can't get to this page from the Porsche Denmark home page -- and since it redirects to that home page -- it seems designed mainly to capture searchers looking for a particular topic and route them into Porsche. In other words, a classic doorway page operation.

Here's a better example. Look for klassiske porscher on Google, then you get this page, which redirects to the home page. Disable JavaScript, and the redirection stops, showing you the hidden content. A user never sees that. Porsche has no intention for them to see it. They only want Google to see it, to rank the page well and deliver them a user to a completely different page on the site.
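The "disable JavaScript and look again" test can be roughly approximated in code. The sketch below, in standard-library Python, splits a page into its inline script source and the text a script-less client (such as a crawler) would read, then flags scripts that appear to reassign the page location. The redirect patterns are a heuristic of my own choosing, and the sample page is hypothetical rather than copied from any automaker's site.

```python
import re
from html.parser import HTMLParser

# Heuristic patterns for scripts that assign or replace the current location.
# This list is an assumption and is not exhaustive.
REDIRECT_PATTERN = re.compile(
    r"(?:window\.|document\.|self\.|top\.)?location(?:\.href)?\s*=[^=]"
    r"|location\.(?:replace|assign)\s*\(",
    re.IGNORECASE,
)


class ScriptAndTextSplitter(HTMLParser):
    """Separate inline script source from the text a script-less client sees."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts.append(data)
        elif data.strip():
            self.text_parts.append(data.strip())


def inspect_page(html):
    """Return (has_js_redirect, text_visible_without_scripts)."""
    splitter = ScriptAndTextSplitter()
    splitter.feed(html)
    splitter.close()
    has_redirect = any(REDIRECT_PATTERN.search(s) for s in splitter.scripts)
    return has_redirect, " ".join(splitter.text_parts)


# Hypothetical doorway-style page for illustration only.
SAMPLE_PAGE = (
    "<html><head><script>window.location = \"/home.html\";</script></head>"
    "<body><p>classic cars keyword text</p></body></html>"
)

if __name__ == "__main__":
    print(inspect_page(SAMPLE_PAGE))
```

A page that both redirects immediately and carries body text no visitor will ever read matches the doorway pattern described here, though a redirect alone proves nothing about intent.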

In the comments on Dave's post, David Thulin points to this page at Chevrolet Sweden. Use that tool I mentioned above and disable styles. Now the pretty picture of a Chevy goes away, replaced by hidden text. My Swedish is as good as my Danish -- i.e., I can't read this. But it doesn't seem spammy in terms of repetition. Still, scroll to the bottom, and you'll see links to additional doorway pages. Someone clearly realizes search engines don't like the graphical pages they are feeding out, so they've created a series of doorway pages. That degree of savviness also means they should be aware that search engines generally don't like doorways.

Of course, the entire BMW situation has sparked some interesting pushback in new quarters, people who feel like Google in particular shouldn't be pushing "orthodoxy" or their own results on site designers. Google Orwellian at Publishing 2.0 is one example (I left some comments there), Death Penalty, Investigations? Sounds like the FBI... is another and Google Delists BMW-Germany at Slashdot has some similar comments. Jeremy Zawodny has some pushback of his own on the pushback over here: Google vs. BMW, a sanity check.

I think some of the outcry is mistaken. Google is simply doing what all search engines do, enforcing its own rules on what spam is. That's not anything new or Google specific. Sure, it does warrant examination. Then again, it has also been heavily debated in the past. Not everyone agrees with spam rules, but even those who don't understand that if they do something against the rules, they risk getting tossed out. But perhaps the times are a changing...

For those looking to educate themselves on spam issues, here's a reading list:

  • A Bridge Page Too Far? - From 1998, covers one of the earliest outings of a big company using doorway pages, State Farm.  
  • What Are Doorway Pages? - Originally written as a companion to the article above, I last updated this in 2001, and it's still fairly useful. It gives you an idea of how old school some of the spam tactics the automakers are using really are.  
  • FTC Steps In To Stop Spamming - From 1999, covers how the US government stepped in to stop one of the worst cases of search spam, when content is used to mislead people (in this case, searches for things like "kids internet games" lead to porn).  
  • Pagejacking Complaint Involves High-Profile Sites - From 2000, similar to the above, covers the issue of content being stolen from a site, cloaked and used to gain rankings. It was more useful in the days before link analysis, when on-the-page factors counted for more.  
  • Ending The Debate Over Cloaking - From 2003, a very long look at what cloaking is, why not everyone agrees it is necessarily evil despite search engine rules and how the focus probably should be on the content rather than the technical delivery structure.  
  • Spam Rules Require Effective Spam Police - From 2004, revisits how search engines have various spam rules but also how they don't disclose if someone's been yanked from an index, something that would probably help site owners.  
  • The Great Doorway Debate - From 2004, a long debate in particular on whether doorway pages (like those the automakers are using) should be considered spam.  
  • Whitehat vs. Blackhat, It Is All BS - From 2004, a long debate on our Search Engine Watch Forums about what spam is, whether there are bad tactics and so on.  
  • Working With Google Scholar -- And More Approved Cloaking - From 2004, covers how cloaking isn't so bad if Google decides it helps users.  
  • What, Exactly, is Search Engine Spam? - From 2005, short, to-the-point rundown on some of the things search engines frown upon.  
  • Comment Spam? How About An Ignore Tag? How About An Indexing Summit! - From 2005, covers in part how designers are questioning anew why they should worry about what search engines think.  
  • Talking About Search Engine Spam - From 2005, summarizes a discussion on "white hat versus black hat" tactics and how in my view, intent rather than actual tactics may define what's spam. The summary leads to a long review of the session for Search Engine Watch members.  
  • Google Admits To Cloaking; Bans Itself - From 2005, shows that if Google's following orthodoxy, at least it's happy to ban itself for violating that.  
  • Is Cloaking Deceptive Advertising? Not Necessarily - From 2005, looks at why cloaked content doesn't necessarily spoil the "level" playing field some believe exists in search engines.  
  • WordPress Caught Spamming After Enlisting To Fight Spam - From 2005, looks at doorway spam that was on the WordPress site and how large, important sites caught up in spamming tend not to be penalized for very long.  
  • White Hat - Gray Hat - Black Hat - From 2005, summarizes even more articles and forum discussions on what spam is, should search engines enforce rules more strongly, is going against guidelines unethical -- you name it!  
  • Worthless Shady Criminals: A Defense Of SEO - Covers why designers would be foolish to ignore the "third browser" of search engines. You might not like the rules; you might think search engines should somehow magically understand what your all-image web page is about. But you could also complain that radio needs to change because it refuses to play the pictures in your television ad. Rather than trying to work around the rules, first consider if you can build a web site that pleases humans and search engines at the same time. Plenty of people do -- and often end up with more usable web sites, as a result.  
  • Google Testing Notification Of Banning To Webmasters - Covers Google experimenting with warning site owners if they are doing something against the rules.

Need yet more? The SEO: Cloaking and SEO: Spamming categories of the Search Topics area available to Search Engine Watch members take you back for years with articles on these topics. Plus, becoming a member helps support the site and the creation of content like you're reading right now.

Want to comment or discuss? Please visit our Search Engine Watch Forums thread, Google Removes BMW Germany For Spamming.

Posted by Danny Sullivan at 9:32 AM | Permalink

April 5, 2005

Google Ranking Itself Tops For Britney Spears & The Need For Better Categorization

Spotted via InsideGoogle, Stephan Spencer argues in Does Google deserve a top 10 spot for “britney spears”? that Google is borderline spamming to come up in the top results for britney spears on its own site. I disagree. The content is relevant, and the changes suggested aren't necessarily a solution to this "problem." C'mon along for a journey into the issues at hand.

One solution Spencer suggests is that Google should tweak its algorithm to favor pages with more "topical relevance" about Britney. Sounds reasonable, but in practice, it's not so clear cut.

What's Relevant For Britney

The fact that so many people don't know how to spell Britney's name IS relevant to the topic of her. In fact, it's long been a talking point that MSN Search used to use to highlight its relevancy.

By the same argument, the Britney Spears Guide To Semiconductor Physics should be dropped from the top results. Believe me, having watched this query over the years, that site is a long-time rank holder on Google and elsewhere that has nothing to do with Britney other than to use her as part of an educational parody for explaining semiconductors.

When the physics site first started ranking well, I felt the same way as Spencer. What's the deal with this non-Britney page being there? But it is sort of related, in that her fame has extended into people using her for parodies.

Similarly, I don't see that Spencer's installed any type of meta robots tag or robots.txt file to prevent his article about the Google-Britney situation from itself ranking well on Google. So when he says:

In the meantime, I think it would be in good form for Google to add a rel="nofollow" href attribute to the Britney Spears link on their Job Opportunities page and let some other, more relevant Britney fan site have that #7 slot.

Then the same should apply to his article. Shouldn't he be blocking his content from being indexed, to ensure some more relevant Britney fan site isn't bumped out if he somehow starts ranking well?

How About Showing Some Topics, Not Pages

The reality is every search on any search engine will have some irrelevant results. Ideally, what you'd want for a popular and broad query on Britney is to get a better classification of types of results you can see: official sites, fan sites, sites about her film career, Britney as a part of popular society and so on. Since everything has some relevancy, such groupings help ensure you get into a particular area related to Britney that you're interested in.

For example, consider if you searched on Yahoo Directory, where you could see all directory categories like this:

  • Rock and Pop > Britney Spears
  • Rock and Pop > Anti-Britney Spears
  • Britney Spears Concert Tickets
  • Britney Spears > Lyrics

See how the "topical relevancy" of all things Britney is divided into four major areas? How about the 208 topics that Clusty finds, which include:

    Sadly, the demise of human-powered directories on major search engines has all but killed such categorization from really being shown to searchers. But what about Ask? It clusters! It groups! Yeah, but sometimes not very well. Here's what we get for Britney:

    Sure, everything may be related to Britney in some way, but that's a far cry from actually grouping and refining topics that are specifically about her.

    Did Google Really Make This Happen?

    How about Spencer's claim that "the sheer weight" of Google's own link from its job page to its page about Britney misspellings gave that page a top ranking? Hard to say.

    Google lists over 100 pages that are linking to that page, such as The Guardian mentioning the page about Britney or this site commenting on the page back in 2003. Google, of course, doesn't show all the links it knows about. So heading over to Yahoo, we see there are nearly 2,000 pages linking to that page, such as this one from Wired back in 2002. Google has certainly indexed some of those links that Yahoo has also found, even if it doesn't show them.

    I have no doubt Google's own link helped. But it also links on that same page to its Google AdWords page with the words "advertising products." But when I search for that on Google, I don't get the AdWords page. Why not? Because the sheer weight of that link on that page doesn't appear to be weighty enough.

    As for the page being a "dead-end" for users, I agree with Spencer here, inasmuch as the page is obviously getting visitors and could be made more useful to those interested in Britney but who don't want to work at Google. And sure, maybe Google should add a nofollow link for the PR value in saying it's trying to minimize its own impact on search rankings. However, I think that's a difficult path to follow.

    Overall, I'm going to end up hoping that if a page is deemed so irrelevant by Google searchers, they'll tell Google directly via the "Dissatisfied? Help us improve" link at the bottom of every search result page.

    Disagree and perhaps think Google is indeed spamming itself? Well heck, they've banned themselves from cloaking before: Google Admits To Cloaking; Bans Itself. You can use the Report a Spam Result page at Google to report the page.

    Of course, the page might easily return at any point, if Google feels whatever was in error has been fixed. Google released WordPress's home page from its penalty after less than a day. But from what I can tell, Google's own page that was banned remains so nearly a month after it was penalized.

    Posted by Danny Sullivan at 10:21 AM | Permalink

    March 31, 2005

    WordPress Caught Spamming After Enlisting To Fight Spam

    Back in January, blogging software provider WordPress was one of several vendors that signed on to support the new nofollow attribute designed to stem blog and search spam. That's why it was so ironic when it emerged yesterday that WordPress has been spamming search engines itself.

    There's quite a debate that has since emerged over whether WordPress was really spamming and if so, should it have been deemed OK because the aim was to help support the open source blogging platform that many bloggers use.

    Was It Spam? Yes It Was!

    Let's clear up the spamming question right away. This was spam of the search variety. As I've written before, the search engines themselves are the ultimate arbiters of what's search spam. Google has declared the pages so here in comments from GoogleGuy (and yes, Google confirms to me it was the real GoogleGuy):

    There definitely appear to be hidden links on the root page of wordpress.org using CSS, e.g. "text-indent: -9000px; overflow: hidden". That's clearly against our quality guidelines at http://www.google.com/webmasters/guidelines.html#quality

    What's more, it looks like the company responsible for doing this (hotnacho.com) is also responsible for creating duplicate content in the form of posting the articles in multiple places, as you can see with this url: http://tinyurl.com/3omjj (these duplicate pages probably won't last long).

    Yahoo says the same in the Wordpress Article Spam Being Removed post from Tim Mayer, Director of Product Management for Yahoo Search:

    We are in the process of removing the WordPress article spam.

    What Did They Do?

    Wordpress Website's Search Engine Spam from Andrew Baio at Waxy.org broke the news yesterday of how he discovered nearly 200,000 pages of low quality content designed to attract people from search engines and hopefully get them to click on Google AdSense ads, generating revenue for the site. A screenshot helps explain the situation more:

    This is the top of one of the pages in question. You can try to view it yourself here, but there's a good chance it will be removed shortly. Most of the other pages have been removed from the site.

    I've added all the colored boxes. The big red one at the top highlights the AdSense ad on the page. That's the goal -- get someone to come to this page from a search engine, then hope they'll click on one of those ads (or the four that were at the bottom of the page). Do that, and the site earns money.

    On first glance, the content doesn't sound bad. It does give you basic information about mesothelioma. But it's like junk food content, not really saying anything of real substance that fills you up.

    Fingerprints Of Spamming

    More important, the act of hosting all these pages shows all the fingerprints of content designed primarily to attract search engines, rather than to please humans. Note the relatively high repetition of the word "mesothelioma," a sign that the page is trying to do well for this term. Notice how the word "mesothelioma" is always a link, as I've illustrated with the blue boxes. That's an attempt to help search engines believe the pages being linked to are about that word.
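Two of these fingerprints, keyword repetition and the keyword always appearing as a link, are easy to measure. The following is a minimal standard-library Python sketch: it reports what fraction of a page's words are the target keyword and what share of those occurrences sit inside a link. The class and function names and the sample markup are hypothetical, and no particular threshold is implied; search engines do not publish the ones they use.

```python
import re
from html.parser import HTMLParser


class KeywordLinkCounter(HTMLParser):
    """Count words overall, and keyword hits inside versus outside links."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.anchor_depth = 0
        self.total_words = 0
        self.keyword_hits = 0
        self.keyword_hits_in_links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        for word in re.findall(r"[a-z0-9']+", data.lower()):
            self.total_words += 1
            if word == self.keyword:
                self.keyword_hits += 1
                if self.anchor_depth > 0:
                    self.keyword_hits_in_links += 1


def keyword_profile(html, keyword):
    """Return (keyword density, share of keyword hits that are link text)."""
    counter = KeywordLinkCounter(keyword)
    counter.feed(html)
    counter.close()
    density = (
        counter.keyword_hits / counter.total_words if counter.total_words else 0.0
    )
    linked_share = (
        counter.keyword_hits_in_links / counter.keyword_hits
        if counter.keyword_hits
        else 0.0
    )
    return density, linked_share


# Hypothetical snippet for illustration only.
SAMPLE_SNIPPET = (
    '<p>Coping with <a href="/m1">mesothelioma</a> is hard. '
    '<a href="/m2">Mesothelioma</a> facts.</p>'
)

if __name__ == "__main__":
    print(keyword_profile(SAMPLE_SNIPPET, "mesothelioma"))
```

A high density combined with a linked share near 1.0 is the pattern the blue boxes illustrate, though ordinary pages can occasionally produce similar numbers.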

    Most important is the fact that this page has no relevancy to be on this site at all. The WordPress home page gave human visitors no idea that hidden within its bowels was a resource area about mesothelioma. Instead, the site seems to be all about the WordPress software itself. This content was not being openly promoted to visitors. That's because it was instead hoped that it would be found only by search engines themselves.

    Hidden Links

    How did the search engines get to find the content? Down at the bottom of the WordPress home page were (and still are at the time of this writing) these hidden links:

    Sponsored Articles on Credit, Health, Insurance, Home Business, Home Buying and Web Hosting

    They were hidden through the use of a style attribute that kept them from being seen by anyone using a fairly modern browser. But a search engine sees things generally like old-style browsers, which means the links were visible. You can see an example of how this was so by making use of the Lynx Viewer here to imitate how a search engine crawler might have viewed the page.

    As the links weren't hidden to search engines, they found the special "articles" area of the WordPress site -- http://wordpress.org/articles/ -- and indexed the content inside there, thousands and thousands of pages.
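To make the difference between the two views concrete, here is a minimal sketch in standard-library Python. It lists every link in the markup, which is roughly what a style-ignoring crawler would find, and marks the ones an inline style hides from browser users. The first pattern matches the large negative text-indent that GoogleGuy quoted; the other patterns, the names and the sample markup are my own assumptions. The sketch only handles inline styles on reasonably well-nested markup, not external stylesheets.

```python
import re
from html.parser import HTMLParser

# Inline-style fragments that push or clip content out of view.
# Only the text-indent case comes from the quoted comment; the rest are
# common variants included as assumptions, not a complete list.
HIDING_PATTERNS = [
    re.compile(r"text-indent\s*:\s*-\d{3,}", re.IGNORECASE),
    re.compile(r"display\s*:\s*none", re.IGNORECASE),
    re.compile(r"visibility\s*:\s*hidden", re.IGNORECASE),
]


def style_hides_content(style):
    """Return True if an inline style string matches a hiding pattern."""
    return any(pattern.search(style) for pattern in HIDING_PATTERNS)


class HiddenLinkFinder(HTMLParser):
    """List every link, noting whether an enclosing inline style hides it."""

    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self._hidden_stack = []  # one bool per open element
        self.links = []          # (href, is_hidden) pairs

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        hidden_here = style_hides_content(attr_map.get("style") or "")
        inherited = bool(self._hidden_stack) and self._hidden_stack[-1]
        hidden = hidden_here or inherited
        if tag == "a" and attr_map.get("href"):
            self.links.append((attr_map["href"], hidden))
        if tag not in self.VOID_TAGS:
            self._hidden_stack.append(hidden)

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS and self._hidden_stack:
            self._hidden_stack.pop()


def find_links(html):
    """Return (href, is_hidden) for each link a text-only crawler would see."""
    finder = HiddenLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


# Hypothetical markup for illustration only.
SAMPLE_FOOTER = (
    '<div><a href="/about">About</a></div>'
    '<div style="text-indent: -9000px; overflow: hidden">'
    '<a href="/articles/">Sponsored Articles</a></div>'
)

if __name__ == "__main__":
    print(find_links(SAMPLE_FOOTER))
```

Both links appear in the output because the parser, like a crawler, never applies the style; the flag simply records which ones a person using a modern browser would not have seen.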

    Targeting High Priced Terms

    If all these fingerprints weren't enough to tell you that the site was involved in trying to grab search traffic, you need only look at the topic being targeted. Advertisers regularly pay extremely high per click fees to rank well for "mesothelioma," because attorneys hope lawsuits involving this cancer will bring high settlements. The top spot for that word is currently going for $52.08 per click on Overture right now.

    Indeed, as I've written before, the high earnings that ads for that term can bring is one reason another blog site recently started up, specifically to generate content that's hoped will earn money off mesothelioma ads. The author of that site was upfront about his motivation, and the content is certainly better than the junk food search fodder hosted on WordPress. But nevertheless, as I wrote, the quest for AdSense money in that case created new content we might really not need and which possibly might push out better content from top search listings.

    Did It Work?

    So back on WordPress, the content in question was spam. We don't actually know whether it was successfully bringing in search traffic, much less AdSense revenue. No one I've seen has posted any top ranking examples for these pages -- and now that both Google and Yahoo have removed the pages, it's even harder to check. I did a few queries last night on things like "mesothelioma" and "coping with mesothelioma" and didn't spot them ranking well. Nevertheless, with nearly 200,000 entries in the search engine lottery, they probably pulled some traffic related to that term or for a myriad of other topics that were targeted, such as "web hosting" or "diabetes."

    The person who leads WordPress, Matthew Mullenweg, turns out to be traveling at the moment so hasn't been able to respond to the current debate. We do have his response from when questions about the content were first raised on a WordPress support forum thread back in mid-February, however:

    The content in /articles is essentially advertising by a third party that we host for a flat fee. I'm not sure if we're going to continue it much longer, but we're committed to this month at least, it was basically an experiment. However around the beginning of February donations were going down as expenses were ramping up, so it seemed like a good way to cover everything. The adsense on those pages is not ours and I have no idea what they get on it, we just get a flat fee. The money is used just like donations but more specifically it's been going to the business/trademark expenses so it's not entirely out of my pocket anymore.

    An Innocent Mistake? Hard To Believe

    Some have argued the statement above suggests Mullenweg didn't realize this content would be seen as spamming the search engines, nor apparently that hiding links would be a no-no, either. Perhaps, but you'd think he would have had some inkling they might not like this. He'd already signed on to the nofollow comment spam fighting initiative. You'd expect he'd make some connection that doing funny things with links might be seen as bad by search engines.

    In addition, last month WordPress was part of a web spam summit that was held, also described here. Since that summit covered the problem of "fake weblogs" or "spam blogs" designed to capture search engine traffic just to make money, you'd think some similarity between those and these pages would have rung a bell. True, these pages weren't blog posts. Still, they had many of the same basic goals as fake blogs.

    What Was The Punishment?

    However the content got there, innocent mistake or not, two major search engines have deemed the content spam and removed it from their indexes. That doesn't mean the WordPress site has gone, however. All that appears to have been specifically removed are the spam pages.

    The WordPress home page does appear to have been penalized at Google, probably as a result of the hidden links it had and still has. The home page no longer shows a score in the Google Toolbar PageRank meter, whereas yesterday it scored an 8 out of 10. That's almost certainly a penalty that's been applied. But other pages in the site still have high scores, such as the About page, so this isn't a site-wide penalty.

    Also yesterday, a search for blog software I did brought the home page up in the top 10 results on Google. Today, it's not in the top results. That's another sign that a penalty has been applied to that page. In fact, a search for WordPress itself doesn't bring up the home page on Google (it does on Yahoo still, and it was first on Google last night).

    That's something that won't last. It hurts Google's relevancy for people not to get the WordPress.org home page when they do a search for the company (WordPress.com, which is now first, appears to be run by someone other than WordPress). After a short period of time, WordPress's home page will undoubtedly find its ban lifted. After all, do a search for WhenU, and you get that company's home page tops in Google despite it having been banned for cloaking last year. After 42 days, it was back in.

    WordPress Users Need Not Fear

    I've seen some comments worrying that because the WordPress home page has been penalized, anyone using WordPress might be banned on Google or Yahoo. That's not a concern, I'd say. This isn't an issue as with the SearchKing case where people using WordPress might be seen as part of a network of sites to be penalized.

    On the flipside, plenty of people running WordPress now have links from their blogs to the site. Is WordPress now a "bad neighborhood," something search engines say not to link to lest you be penalized? Possibly, but I doubt it.

    If you want to be absolutely safe, then ironically make use of the nofollow attribute. It never was going to be a complete solution to comment spam, nor has it been. But as I wrote before, it is a perfect way to link to other sites without worry that you'll be penalized by doing so with search engines. More about this in my past article, More On Link Condom & Blogger Worries Over Nofollow.

    Spamming, But For All The Right Reasons?

    The links to WordPress are also fueling a debate over whether those who have done so to show their support have now been duped. I'll leave that for those in the WordPress community to argue. I've used WordPress, liked it, have recommended it before and still recommend it despite what's happened. It's good software. But that doesn't entitle it to some of the excuses I've seen some make on its behalf, to justify the spamming.

    Just because WordPress is an open source project, asks for donations and needs more support doesn't entitle it to free rein to spam search engines, "experimentally" or not. If it wants to spam, then it pays the same price anyone else pays if they want to be aggressive with search engine optimization and get caught breaking rules.

    Given this, seeing a comment like this really annoyed me:

    Hot Nacho is a company that supports open-source software, specifically WordPress. All the web geeks need to remember that there are worse companies out there than those that try to "screw with Google" for PageRank, etc. It's fun to say "spammers are scum" and I certainly don't like them, but get some perspective, there is worse evil in this world.

    All that said, I don't have a big problem with what Matt did, he said it wasn't something he wanted to do long term, but if it could help bootstrap the community it would be nice.

    Search engine guidelines against spam don't say something like, "Don't spam us, unless you're just trying to make a start and help other people, then it's OK." They don't say, "Spam us for a little bit, then you can stop when you've earned enough." They say don't spam, period. If you don't want to follow those rules, fine. That's a risk you can take, and others do as well. But don't expect to be let off for free, if you're caught.

    Here's another comment I disliked:

    I don't begrudge someone earning money from something they have put a great deal of effort and time into. Particularly when it seems to be putting back into the product and to the benefit of the community.

    Well, I do begrudge someone earning money if it's screwing up the quality of my search results. Fair to say, the searching community (anyone who searches for information) is a little broader than the WordPress community.

    Misleading Spam, An Important Tangent

    I've written about search spam many times and generally try to cover various viewpoints and illustrate how tricky defining what "spam" is can be. But as said, my view is the search engines are the ultimate arbiters of what they consider spam, for banning purposes.

    Beyond that, we individually decide what we consider spam. I come across search spam all the time -- which to me is irrelevant content that overtly attempts to get a good ranking. I dislike it immensely when I hit this type of content, because I know exactly what the person has done to be misleading. Here's a recent example.

    I wanted the phone number of a chicken place near our home. I typed king chicken amesbury into Google, then saw this promising "Amesbury Business Directory" page in the top listings. The page wasn't a real directory at all. It came up because it was generically designed to work for a variety of cities and topics. All these cities were named on the page:

    Amesbury Box Bradford-on-Avon Calne Chippenham Corsham Cricklade Devizes Downton Durrington Hawthorn Highworth Malmesbury Market Lavington Marlborough Melksham Mere Pewsey Ramsbury Salisbury Sherston Swindon Tisbury Trowbridge Warminster Westbury Wootton Bassett

    They worked in combination with what were called keywords related to Amesbury:

    1 litre 1 ton 100% funding 16 18 22mm 16 bit 1-6 people 16bit 1880 clu 2 litres 2 post 2 ton 2.4ghz 200 kilos 24 hour 24 hr 3 day 35mm aps 3d 3g models 4 post 5 to 1 5:1 sf 500 kilos 50s 5-1 sf 60s 68 briefs 6ixty 8ight 6mm to 25mm 7 8 9.5 mm 7 day 7 day opening 70s 76 cm 7650 games 91cm 99cm a la carte ab1 ab2 ab3 abrasive abs pp academy acce access control accessories accident accidents accommodation account accountancy accountant accountants accounting accounts accurate acerbis acrylic acrylics acryllic acton adams adapters additional address adhesives admin administration administrative adsl advance adventure adverse adverse debt advice advice advisor adviser advisers

    I'm not printing the entire list. It goes on and on, and the few lines above make the point. This wasn't a relevant page. This did nothing to satisfy my query. This was simply created in hopes of getting me to the page and clicking on some of the ads. It wasted space in Google, and it wasted my time.

    So let's bring it back to WordPress. The content it had existed solely to make money, not really to inform or help. It took away space and resources for good content I'd rather see. At least sophisticated spammers would have ensured that if they got a top ranking, I would have been delivered to something with far more useful content. That's just a prerequisite to ensure people don't end up reporting your content as spam.

    Yeah, Your Search Spam Did Contaminate!

    Another comment that caught my eye was this:

    Spamming is unsolicited. All of these posts are on a sanctioned area of WordPress and don't exist anywhere else. It'd be different if these posts were dropped into blogs and wikis all over the place but they aren't. Linking them in off-screen content is a little bit of trickery but there isn't any leeching there.

    It's similar to what Jonas Luster of WordPress argued here:

    Let's get the first response over with - please, please, please stop calling it "Spamming". Regardless of how you stand towards the deeper issue at hand, diluting a word by mixing pretty much everything into the basket of spamming is not a good idea. Yes, the postings were made to improve the Google rank of someone else, yes there was a financial transaction involved, and yes, the postings were not topical to the wider sense of the site, but it's not spam. Spam involves other, involuntary, carriers. No comment boxes were contaminated, no mailboxes, no Usenet forums, and certainly no one spent a single byte of extra bandwidth (with the exception of the links from Wordpress.Org) on it. It's not spam.

    Honestly, statements like that are simply frightening. Spam isn't only something that happens if you drop comments or trackbacks on blogs. Neither is it some new term we've suddenly co-opted for SEO. I've personally been using it since I started writing about search engines in 1996.

    Push misleading or irrelevant content into a search engine overtly just to get traffic, and I call that spam. Break the rules a search engine sets out, and they call that spamming. The search industry has been using the terms "spam" and "spamming" for nearly a decade. Heck, even legal cases have cited spam in relation to search engines. Trying to redefine the term as it applies to search to put a better spin on the situation at WordPress isn't going to help things.

    But it didn't really hurt anyone! That's sort of the tone of this unreasonable justification:

    Matt could have put out announcements asking for donations. He could have plastered flashing advertisements all over the WordPress sites. He could have used every available opportunity to "pass the cup". Instead he chose an avenue which was out-of-sight. And instead of perceiving this as "polite", people have chosen to view it as "sneaky". "Et tu, Brutè?"

    I see. It was polite to get nearly 200,000 low content pages into the search engines, where they consumed crawler time in being found and regularly revisited, time that might have been spent on other pages. It was polite that people hitting these unsolicited pages via search engines wasted time having to go back and seek again the solid information they really wanted. Thanks for that. Next time, just put the ads up on some real content. Or yes, do tell people you need money.

    In the end, the big deal really isn't that WordPress was caught spamming. People get caught for spamming all the time. But we have never, ever had a situation I can recall where someone was caught spamming at the same time they were supposedly working with the search engines to prevent spam!

    The creation and rallying of industry support around the nofollow attribute was unprecedented. We never before had any unified effort among search engines in that way to fight spam, much less having other parties like WordPress cooperate with that. Yes, nofollow was designed to combat link spam. What WordPress did was content spam, a different tactic. But the aim of both tactics is the same -- get more traffic from search engines by trying to aggressively manipulate them. WordPress ultimately did the very thing it was supposedly fighting against. That was a very big deal indeed.

    For more on this, check out Wordpress gettin' Slammed for Spamming? at Threadwatch, for a lot of good comments especially from aggressive SEO types. Over at InsideGoogle, WordPress Caught Spamming has some nice links at the end to a collection of comments from various blogs on the situation. Want to comment or discuss what I've written or the situation in general? Then join the forum thread I've started, WordPress In Spamming Uproar.

    Postscript: As mentioned, Mullenweg is currently on vacation and generally without internet access. He's posted a brief note here on having just now seen the concerns over the spamming, and further posts will likely eventually follow on the home page of his blog.

    Postscript 2: In mid-April, I heard from Mullenweg on some questions I sent across to him. He responded that he hadn't realized what was presented to him as "advertising" was a form of "web spam," saying:

    My mindset in terms of spam is very focused on the type I deal with and fight on a daily basis, I did not think of things in terms of what search engines such as Google deal with because I've never been in that position. I'm not going to argue semantics, but that sort of artificial content, hosted or otherwise, is not something I would ever participate in again.

    Hidden links were also an issue. Mullenweg said:

    • They were wrong and shouldn't have been done.

    • He added the hidden links himself.

    I never got an answer to the final follow-up question of why the links were hidden in the first place.

    Posted by Danny Sullivan at 9:16 PM | Permalink

    February 22, 2005

    Tiscali SEO Service Accused Of Spamming

    Doorway pages are fine, says Tiscali South Africa, about a new search engine optimization service that the internet access provider and Google partner is offering to customers. Launched earlier this month, the press release about the service raised the ire of some local SEO firms with this description:

    The service includes re-writing of an organisation's home page in meta tags and hidden words. Doorway pages are created that target specific search engines to improve search engine rankings.

    One of the search engines the service submits to is Google, whose guidelines go directly against the pitch:

    • Avoid hidden text or hidden links.  
    • Avoid "doorway" pages created just for search engines, or other "cookie cutter" approaches such as affiliate programs with little or no original content.

    Tiscali defends its search engine service from ITWeb explains more on how local SEO firms are surprised and upset that Tiscali seemingly is providing a service that goes against at least one search engine's guidelines.

    The issues are tricky, of course. Doorway pages with little useful content are often seen as spam. But create a new page with some real content, and that might be acceptable. Simply calling doorway pages something else like "information pages" isn't a solution, if the pages lack useful content.

    Despite the press release, Tiscali now says that hidden words are not used. Perhaps the company meant this as a synonym for meta tags that it also named -- since meta tags are effectively hidden from users. Perhaps not. Perhaps it was confused. One thing's for certain -- the confusion doesn't inspire a lot of confidence in the service.

    Need a recent rundown on what's deemed spam? Check out our recent SearchDay article, What, Exactly, is Search Engine Spam?, for some opinions.

    Posted by Danny Sullivan at 1:12 PM | Permalink

    December 2, 2004

    Behind The Scenes Of Google's Tech

    The magic that makes Google tick from ZDNet has a look at technical details behind delivering Google searches. But, I've got a few quibbles:

    • It's 8 billion pages that Google now claims to index, of course!  
    • "You may think we don't need to know about those but that's not true," said Google engineering VP Urs Hölzle. Yes, leveraging links can help with finding those company home pages that fail to use text. But it can also lead to strange things, such as link bombs, as with the miserable failure search.  
    • The PageRank algorithm is said to be "relatively" spam resistant. Hmm. Wander over to Google, do a search for viagra, and you'll see a site about "generic viagra online" at the bottom. Yep, it does lead you to a place that apparently sells Viagra. But this is almost certainly not the best of all the sites you could reach for this query. More important, you only reach this site after a fast redirect. Why? Because a low-quality doorway page is being used to help get a good ranking on Google. Here you go -- check out the cached copy yourself. Perhaps it's not spam, but it's hardly high-quality material. Ironically, it's probably link analysis that's also helping this page get there.

    OK, enough with the quibbles, which in fairness I could raise about Google's competitors as well. See the rest of the article for some technical details on Google data centers, the fact that there's not been a complete system failure since February 2000 and more.

    Posted by Danny Sullivan at 12:56 PM | Permalink

    October 17, 2004

    Search Engine Watch Forum's 101 Threads

    Last week, one of our most energetic forum moderators, Nacho Hernandez, started a thread called Search Engine Marketing 101. In it, he leads off with a variety of resources useful for those getting started with search engine marketing. Comments and further contributions follow.

    Nacho also kicked off a theme. Orion, one of our newest moderators, followed up with Block Analysis 101. That looks at the concept of search engines breaking up a page into "blocks," to better understand which particular content or links within that content should be given greater or less weight.

    Member Nick W's now dived in to look at the often controversial issue of cloaking: Cloaking 101 - Questions and Answers. Some previous good threads and debate on this topic include The Great Doorway Debate and How Do I Spot Cloaked Sites? You might also look over an article I did last year, Ending The Debate Over Cloaking.

    Returning to Nacho, he's compiled a great list of Google Sandbox 101-style resources in Sandbox - IN or OUT? The sandbox concept relates to the idea that new pages, new links or new sites might not be allowed to do well in Google until a certain period of time has passed. The Filthy Linking Rich thread touches on this, as well.

    Posted by Danny Sullivan at 11:24 AM | Permalink | Comments (0)
