TechCrunch recently had a post lamenting the fact that Barnes and Noble's new how-to site, Quamut, is being spammed by SEO guys looking for some free link juice. The B&N site wasn't adding nofollow to its external links, so it's been open season for SEOs. (Before you get all excited, they've now changed the links to nofollow.)
To many people, that SEO spamming may look like a bad thing. I think it's the best thing that could ever happen to Quamut.
Unlike other types of spam, good link spam carries with it a wealth of benefits for the site being spammed:

1. It brings users. When a new social site debuts, especially a "me-too" site like Quamut, getting users is tough. Unless you offer some special incentive, or your site provides something necessary that other sites don't, you have to fight a tough battle for users. If someone wants to add link spam to your site, they need to sign up. The thousands of SEO spammers out there can quickly become thousands of new members of your site. And when the spammers sign up under multiple accounts, they can quickly become tens of thousands of new members.

2. It adds content. It might not be the best content ever written, but SEO spammers do know how to write content that, at the very least, is unique, keyword-rich and geared to any user who might stumble upon it. Contrary to popular belief, SEO spammers are not interested just in backlinks, but also in filling up the SERPs. If they can get a page on your site to rank by combining their content with the strength of your site, and then convince the user to move on to their site, the bottom line stays the same.

3. It raises stature. When your brand spanking new social network has 10,000 members and 50,000 UGC articles after only one month, your site starts to get noticed, even if most of those members are spammers and that content is primarily spam. There's a reason companies like MySpace and YouTube didn't crack down on spammers, and even explicitly allowed spam in their original Terms of Service. If you want to grow, and grow fast, no one will help as much as spammers.
SEO spammers helped pad out Wikipedia; for every great article written as a vehicle for a spammy link, Wikipedia got another great article. They helped get YouTube to critical mass; for every YouTube embed done to get a YouTube backlink, YouTube got more video views. SEO spammers keep MySpace growing. Do you still know anyone with a MySpace account? Can you tell me how its growth keeps skyrocketing? Check the inbox of your old MySpace account and you'll see how.
In short, SEO spammers are helping the internet continue to grow. As each once-spammed site grows big on the shoulders of spammers, it introduces methods to lock the spammers out, and the spurned SEOs move on to new sites. The cycle continues, and with it, innovation in the social and user-generated content fields.
If popular sites are suffering under a flood of spam, I sympathize with their decision to add nofollow to their links and put up barriers to stop spammers, as long as they don't forget who made them popular to begin with.
Posted by at 10:22 AM | Permalink | Comments (4)
Microsoft to Fight Search Spam by Analyzing Email
Here's a story I missed when it broke. On March 25, Microsoft was awarded a patent, applied for nearly four years ago, for fighting search spam based on external elements such as "electronic documents," or email. The prevailing theory is that similar indicators will show up in spammy emails, spammy blog comments and other SEO spam.
Given the resurgence of spam from SEO companies, Microsoft may also want to use the spam filters built into Outlook to highlight potential SEO spammers, working on the theory that spammers are spammers, in any and all fields. This approach may well be open to some abuse, but given the number of people using Office, it's unlikely that subscribing to your competitor's newsletter and then tagging it as spam would really affect them. SeoByTheSea wants to take it even further, and suggests that the URLs in spam emails get tagged as SEO spam as well.
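To make SeoByTheSea's suggestion concrete, here is a minimal sketch, assuming you already have a pile of emails users have marked as spam and a batch of blog comments to check. It collects the hostnames linked from the spam emails and flags any comment that links to one of them. The loose URL regex, the hostname-only matching, and all of the sample data are illustrative assumptions, not a description of Microsoft's patent.

```python
import re
from urllib.parse import urlparse

# Loose URL pattern; good enough for a sketch, not a full RFC 3986 parser.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def domains_in(text):
    """Return the set of lowercase hostnames linked from a piece of text."""
    hosts = set()
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname  # .hostname is already lowercased
        if host:
            hosts.add(host)
    return hosts

def flag_comments(spam_emails, comments):
    """Flag comments that link to any domain seen in emails marked as spam."""
    spam_domains = set()
    for body in spam_emails:
        spam_domains |= domains_in(body)
    return [c for c in comments if domains_in(c) & spam_domains]

# Hypothetical data for illustration only.
emails = ["Cheap pills at http://pills.example.net/buy now!"]
comments = [
    "Great post! Visit http://pills.example.net/ for deals",
    "I disagree with point 2, see https://blog.example.org/reply",
]
print(flag_comments(emails, comments))
```

Matching on exact hostnames keeps the sketch simple; a real system would probably group subdomains and weigh this signal alongside others rather than treat it as proof.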
But before we get all that excited about what direction Microsoft (or MicroHoo) can take this innovation, we need to remember how poorly Microsoft has used this to deal with SEO spam in the past 4 years. A Google or MSN group with any keyword in the title will still rise, almost automatically, to the top of Live.com SERPs, regardless of its relevance. Let Microsoft fix that loophole first, and then go after email/SEO spam convergence.
Posted by at 7:59 AM | Permalink
In a paper entitled Spam Double-Funnel: Connecting Web Spammers with Advertisers, researchers at Microsoft and the University of California, Davis show the path by which the ads of legitimate web site owners come to be shown on spam pages. The paper, reported on Monday in a New York Times story, Researchers Track Down a Plague of Fake Web Pages, is to be delivered in May at the 16th International World Wide Web Conference in Banff, Alberta, Canada. The paper's methodology, findings and conclusions are of interest to search marketers.
For this paper the researchers focused on redirection spam, in which Web pages redirect browsers to known spammer-controlled domains. Many of these redirection spam pages use pay-per-click advertising and frequently display ads from reputable advertisers. Many research papers on search spam are essentially descriptive, seeking to categorize the various forms of search spam. This paper goes further, showing not just how these redirection schemes work but who is involved in them.
To unravel these redirection schemes and identify the sources, the researchers simply “followed the money,” analyzing the end-to-end redirection paths (for more on the methodology and how you can use similar tactics, see Strider Search Ranger). In the paper they outline the methodology they used to analyze the tens of thousands of spam links they found for this research. To describe their findings they created a five-layer double-funnel model that includes:
- Doorway pages
- Redirection domains
- Aggregators
- Syndicators
- Advertisers
Spammers control the doorway pages and redirection domains; aggregators buy traffic from the spammers and sell it to the syndicators, who in turn are paid by the advertisers to display their ads. The system works in both directions: traffic flows from the spammers toward the advertisers, and advertiser money flows back the other way.
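The "follow the money" step boils down to recording each redirect hop from a doorway page until it lands somewhere that serves ads. The sketch below assumes those hops were already captured (the researchers used their own Strider tooling for that, which this does not reproduce) and simply walks a table of hops, stopping on loops. Every URL and hostname here is hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical redirect hops already observed for each URL (URL -> next URL).
REDIRECTS = {
    "http://doorway.example/spam-post": "http://redirector.example/r?k=ringtones",
    "http://redirector.example/r?k=ringtones": "http://aggregator.example/feed?id=42",
    "http://aggregator.example/feed?id=42": "http://syndicator.example/ads?q=ringtones",
}

def follow_chain(start, redirects, max_hops=10):
    """Return the end-to-end list of URLs visited, stopping at loops or max_hops."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # redirect loop; stop rather than spin forever
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain

def hosts(chain):
    """Collapse a chain of URLs to the hostnames involved, in order."""
    return [urlparse(u).hostname for u in chain]

path = follow_chain("http://doorway.example/spam-post", REDIRECTS)
print(" -> ".join(hosts(path)))
```

Aggregating many such chains by hostname (or by IP block) is what lets you see which middle layers are shared across thousands of doorway pages.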
For their study, the researchers used 1,000 keywords spread across ten spammer-targeted categories, with spammer-targeted keywords in one set and the most-bid advertiser keywords in a second. Predictably, the categories included:
- Drugs
- Adult
- Gambling
- Ringtones
- Money
- Accessories
- Travel
- Cars
- Music
- Furniture
The results of the analysis and the conclusions are of particular interest:
Layer #1, the doorway domains: the free blog hosting site blogspot.com was responsible for one in every four spam appearances in the top search results. At least three in every four unique blogspot URLs that appeared in the top 50 results were spam. (Aside: this is not new news to most search marketers, but it is nice to see real hard data on it.)
Layer #2, the redirection domains: the spammer domain topsearch10.com figured prominently, and the IP block where it resided (209.8.25.150~209.8.25.159) hosted multiple domains responsible for 22-25% of all spam appearances.
Layer #3, the aggregators: the authors believe this layer is a bottleneck and therefore presents the best target for attacking search spam. Two IP blocks, including 66.230.128.0~66.230.191.255, are responsible for the 100,000 spam ads in the sample. (Aside: talk about a bad neighborhood.)
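If you want to check whether a host you're dealing with sits inside one of the ranges quoted above, Python's standard ipaddress module can turn the "first~last" notation into CIDR networks and test membership. This is a sketch using only the ranges as they appear in this post (only one of the two layer-3 blocks is listed here); it tests IP addresses, it does not resolve domain names.

```python
import ipaddress

def block_to_networks(first, last):
    """Convert a 'first~last' style address range into a list of CIDR networks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)))

# Ranges as quoted in the post.
FLAGGED = (
    block_to_networks("209.8.25.150", "209.8.25.159")
    + block_to_networks("66.230.128.0", "66.230.191.255")
)

def in_flagged_block(ip):
    """True if the given IPv4 address falls inside any of the flagged networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in FLAGGED)

print([str(n) for n in FLAGGED])
print(in_flagged_block("66.230.150.7"), in_flagged_block("8.8.8.8"))
```

The 66.230.128.0~66.230.191.255 range happens to be exactly one /18, while the smaller layer-2 range needs two networks (a /31 and a /29) to cover it.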
Layer #4, the syndicators: just a handful of ad syndicators serve as middlemen for the majority of the spammers.
Layer #5, the advertisers: this layer includes many well-known, reputable advertisers whose ads garner traffic funneled through the system. It is advertiser money that fuels the entire system.
The authors hope that their paper will help search engines strengthen their ranking algorithms and will provide impetus for advertisers to carefully scrutinize their involvement with syndicators and traffic affiliates.
Posted by Amanda Watlington at 1:06 AM | Permalink
Yesterday, on Thanksgiving, The Register reported that a search at Yahoo Images for "franchise" returned very offensive and disturbing images. I will not describe the images, but I saw them myself, and as soon as I did, I emailed my contacts at Yahoo. Soon after, the images were pulled from the search results. It seems to me that someone figured out a way to easily insert pornographic images into Yahoo Images results for a search term, even with safe search on. The Register has blurred and censored screen captures of the first line of results.
Posted by Barry Schwartz at 8:47 AM | Permalink
The other day I reported on Microsoft Banning Sites from Live.com For Link Exchanges, where I uncovered an email sent to a Webmaster. The email stated that a particular site was removed from the Live.com Search index because the site was "acquiring links through posting to or exchanging links with sites unrelated to your site content." The email also said that these types of links are "spam links," and that they were the reason the site was delisted from the index.
It struck me that this is why Google and Yahoo remain very vague when telling Webmasters why their sites are deindexed or penalized. Simply put, people may read this email and conclude that exchanging links with friends is a bad thing. If you have a personal blog about your life and you want to link to your dad's dental practice web site, there is nothing wrong with that. But if you run huge link exchanges, then you need to be worried. The email sent to this Webmaster might not explain the difference clearly enough, and could get other Webmasters worried unnecessarily.
Posted by Barry Schwartz at 9:28 AM | Permalink
Social Search Manipulation: Case Study
Niall Kennedy has one of the most thorough write-ups on why search spam exists in his article "The Spam Farms of the Social Web." The article explains how he stumbled upon a spam site, researched it exhaustively, guesstimated how much money its owners can make, and looked at the services that help such a site rank well. This includes a look at blogs, digg, del.icio.us, other social sites, link building tactics, directory inclusion, content writing, and more.
Posted by Barry Schwartz at 8:54 AM | Permalink
Boogybonbon.com has revealed how you can potentially de-list a competitor's site from Microsoft's search engine. In short, most sites return a 200 (OK) status code when you request a URL like domain.com/index.html?test=test or domain.com/index.html?test=test1234, serving the same page each time. You can play on that by convincing Microsoft that a particular site has hundreds or thousands of duplicate pages, and at some point Microsoft may hit the site with a duplicate content penalty, de-listing the site and its home page. That is the short story; for the long write-up, visit Boogybonbon.com.
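One way a site owner might blunt this (a sketch, not Microsoft's advice) is to canonicalize incoming URLs: drop any query parameters the site doesn't actually use, then 301-redirect or 404 anything whose canonical form differs from the requested URL. The ALLOWED_PARAMS set below is a hypothetical example; you'd replace it with the parameters your own pages really use.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical: the only query parameters this site actually uses.
ALLOWED_PARAMS = {"page", "id"}

def canonical(url):
    """Drop unrecognized query parameters and sort the rest, so junk variants collapse."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
                  if k in ALLOWED_PARAMS)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path,
                       urlencode(kept), ""))

variants = [
    "http://domain.com/index.html?test=test",
    "http://domain.com/index.html?test=test1234",
    "http://DOMAIN.com/index.html",
]
print({canonical(u) for u in variants})
```

In a request handler, comparing canonical(requested_url) with the requested URL tells you when to send a redirect instead of a 200, so a crawler never sees thousands of distinct URLs serving the same page.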
Postscript: Other coverage at Threadwatch and Search Engine Watch Forums.
Posted by Barry Schwartz at 9:32 AM | Permalink
Tim Converse, the "spam fighter" at Yahoo, has a fun post he named Search engine optimization (SEO) from black to white. He breaks SEO down into nine shades running from black to white. For example, a "dark gray" SEO is one that "collects (aka steals) random text from other sites, and uses it to create thousands (or millions) of pages targeting particular queries. The pages have nothing original of value, but do have ads." The nine shades are: Dark inky black, Charcoal, Dark gray, Slate gray, Gray, Light gray, Off-white, White, and Luminescent pearly white.
Posted by Barry Schwartz at 9:12 AM | Permalink
Threadwatch reports that United Press International (UPI) is selling links based on PageRank values. If you visit its advertising section, specifically the text links page, you will clearly see UPI marketing those links as a way to manipulate rankings rather than to build direct traffic.
The benefits they list include "increasing Page Rank" and "improving search engine results." UPI also lists its pages along with their current PageRank values and backlink counts, as a selling point.
I have posted a screen capture of the full page at Flickr to document it before Google removes all value from the links on this site. :-)
Posted by Barry Schwartz at 9:19 AM | Permalink
John Battelle has a short interview with Google spam fighter Matt Cutts. The most interesting part, I found, was the news that the W3C has added a meta nofollow tag to its page with paid links, which Matt seems to say is the same as the completely different nofollow attribute, and thus an acceptable approach for those selling links who fear the wrath of Google.
Let's back up. You can put a meta robots tag on your pages with the value of "nofollow." This tag, about 10 years old now, long predates any concerns about link selling skewing search results, and long predates the nofollow attribute. It is supposed to tell a search engine not to follow any of the links on a page for indexing purposes.
In other words, you've got a page with 20 links leading to other pages in your web site. Put nofollow into a meta robots tag, and you're telling the search engine not to follow the links on that page to those other pages.
An important note: using nofollow alone doesn't keep those other pages from being indexed. If there are any other links pointing at them from anywhere on the web, search engines will follow through to them that way. So if you don't want them indexed, you need to use a meta noindex tag or a robots.txt file to specifically block them.
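If you go the robots.txt route, Python's standard urllib.robotparser is a handy way to sanity-check which URLs your rules actually block before you deploy them. The robots.txt contents and the www.example.com URLs below are hypothetical. Keep in mind that robots.txt governs crawling; a meta noindex tag is the page-level way to ask for a page to stay out of the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that wants /private/ kept away from crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse from a string instead of fetching over HTTP

for path in ("/private/report.html", "/public/index.html"):
    url = "http://www.example.com" + path
    print(path, "allowed" if rp.can_fetch("*", url) else "blocked")
```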
Now on to the nofollow attribute. Created in January 2005, it was a way to flag particular links to search engines as ones a site owner doesn't explicitly approve of. It was never defined as a means of telling search engines not to actually "follow" the link; it was more a way to say that you don't endorse the link. In fact, to my knowledge, Yahoo and perhaps others will still "click on" or follow links even when they carry the nofollow attribute.
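The difference is easy to see in the markup itself: the meta robots tag is one page-level directive in the head, while the nofollow attribute rides on individual links. The sketch below uses Python's standard HTMLParser to report both separately for a made-up page. It only shows where each signal lives; whether Google treats the two as equivalent is exactly the open question here.

```python
from html.parser import HTMLParser

class NofollowAudit(HTMLParser):
    """Collect page-level meta robots directives and per-link rel="nofollow" flags."""

    def __init__(self):
        super().__init__()
        self.meta_robots = set()   # e.g. {"nofollow", "noindex"}
        self.links = []            # (href, has_rel_nofollow)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            content = a.get("content") or ""
            self.meta_robots |= {d.strip().lower() for d in content.split(",") if d.strip()}
        elif tag == "a" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            self.links.append((a["href"], "nofollow" in rel))

# Hypothetical page: a meta nofollow for the whole page, plus one link-level nofollow.
page = """<html><head><meta name="robots" content="NOFOLLOW"></head><body>
<a href="http://sponsor.example/">Sponsor</a>
<a href="http://friend.example/" rel="nofollow">Friend</a>
</body></html>"""

audit = NofollowAudit()
audit.feed(page)
audit.close()
print(audit.meta_robots)   # page-level directive
print(audit.links)         # per-link attribute
```

Note that the "Sponsor" link carries no nofollow attribute of its own; it is covered only by the page-level meta tag, which is the situation the W3C page is in.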
Now to the W3C. W3C Selling PageRank Or Thanking Supporters? covers how some have felt the W3C has effectively been selling links without using the nofollow attribute, which Matt Cutts in particular has urged link sellers to use, lest they be penalized by Google.
In Matt's interview, we read that using nofollow in the meta robots tag might be seen as the same thing as a nofollow attribute, at least in Google's eyes. That's a completely new thing to me. I've commented on Matt's blog post about the interview, to see if he'll clarify more.
Aside from nofollow, the interview also gets into some interesting discussion of whether Google should do more to use humans in refining results.
Posted by Danny Sullivan at 7:42 AM | Permalink
Tim Converse, the web spam fighter at Yahoo Search, wrote a very interesting blog entry explaining aggregation spam. In short, aggregation spam is a form of content spam where you scour the web for matches on a specific keyword phrase, then compile a page of content with snippets and chunks of content found containing that keyword phrase and related keywords around it.
Tim offers up this extreme analogy:
Imagine that you get home one night to find a stranger leaving your house with a sack containing your TV, cell phone, jewelry. You might misunderstand, until we explain that he's actually an aggregator - he's just aggregating your belongings.
Tim explains that it is hard for the search engines to draw a line in the sand between high-quality aggregation that should be included in the search engines and aggregation that should not. But one thing he personally believes is that "the bar for inclusion ought to be pretty high."
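Tim doesn't give an algorithm, but one crude signal for aggregation spam is how much of a page is lifted verbatim from elsewhere. The sketch below, assuming you already have the text of a candidate page and of other pages on the web, computes the fraction of the candidate's sentences that appear word-for-word in those other pages. The sentence splitting is naive and the sample text is invented; where to set the "pretty high" bar is left as a judgment call.

```python
import re

def sentences(text):
    """Split text into normalized sentences (lowercased, whitespace collapsed)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(p.lower().split()) for p in parts if p.strip()]

def borrowed_fraction(page, other_pages):
    """Fraction of a page's sentences that appear verbatim on some other page."""
    seen = set()
    for other in other_pages:
        seen.update(sentences(other))
    own = sentences(page)
    if not own:
        return 0.0
    return sum(s in seen for s in own) / len(own)

# Hypothetical source pages and a page stitched together from them.
sources = [
    "Blue widgets last for years. They are easy to clean.",
    "Widget prices fell this spring. Buyers were pleased.",
]
aggregated = "Blue widgets last for years. Widget prices fell this spring. Buy widgets here!"
print(round(borrowed_fraction(aggregated, sources), 2))
```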
Read Tim's personal thoughts on aggregation and search at his blog.
Posted by Barry Schwartz at 8:08 AM | Permalink
SEOMoz has some excellent examples of government sites that are susceptible to cross-site scripting (XSS) HTML injection, something that can also happen to any site. Let me first do my best to explain what this means in layman's terms (hope I get it right).
In the examples shown at SEOMoz, they were able to add a link that looks like "<h1><a href="http://www.example.com">Look, I made a link</a></h1>" to the HTML of a new page hosted on a .gov site. The page is brand new and dynamically generated, because the HTML itself is injected via the URL, whose query string may look something like:
textQuery=%3Ch1%3E%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%22%3ELook%2C+I+made+a+link%3C%2Fa%3E%3C/h1%3E
The examples are still live; here is one of the twenty, an epa.gov link.
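Decoding that query string shows exactly what markup gets echoed back, and why HTML-escaping user input before writing it into the page closes the hole. The two handler functions below are hypothetical stand-ins for a site's search results page; html.escape and parse_qs are standard Python.

```python
import html
from urllib.parse import parse_qs

# The query string from the example above.
QUERY = ("textQuery=%3Ch1%3E%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%22%3E"
         "Look%2C+I+made+a+link%3C%2Fa%3E%3C/h1%3E")

def render_results_unsafe(query):
    """Hypothetical vulnerable handler: echoes the decoded search term verbatim."""
    term = parse_qs(query)["textQuery"][0]
    return "<p>You searched for: " + term + "</p>"

def render_results_safe(query):
    """Same handler, but HTML-escapes the term so injected tags render as plain text."""
    term = parse_qs(query)["textQuery"][0]
    return "<p>You searched for: " + html.escape(term) + "</p>"

print(render_results_unsafe(QUERY))  # contains a live <a href=...> link
print(render_results_safe(QUERY))    # the tags come out as &lt;...&gt; text
```

Once the term is escaped, a crawler sees only text on the page, so there is no link for the .gov domain to pass weight through.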
Now, if the search engines index this page (and they will, if enough links point to it), they may give extra weight to the links on it because it sits on a .gov site, which benefits the injected links.
This exploit was first made public in mid-June. It can happen to almost any site or server. Google itself is not immune; it suffered from this exploit in early July. I also had a vulnerability in one of the tools at rustybrick.com that people began exploiting.
I personally commend SEOMoz for posting the details on the 20 government sites with this exploit. Those sites should make sure they don't have this vulnerability, and having someone point it out publicly should encourage them to do something about it.
Posted by Barry Schwartz at 8:21 AM | Permalink
Threadwatch reports that Business.com has added the nofollow attribute, a method of telling search engines not to count particular links as a "vote," to many of its outbound links. Aaron Wall discusses how the use of nofollow in this sense "muddies their credibility" by saying they have links in their directory that they don't trust. But it appears that only those that pay Business.com for a directory listing get a link without the nofollow added to it. Everyone else who is accepted into the directory is tagged as untrusted. That's the exact opposite of how Google's Matt Cutts has said he thinks nofollow should work.
Postscript: Business.com - Use of "No Follow" Tags Explained has Business.com explaining why it uses nofollow in some cases and not in others. Postscript 2: Business.com's "No Follow" Policy Revision has Business.com changing how it uses nofollow.
Posted by Barry Schwartz at 8:41 AM | Permalink
A BBC News front-page article named Google to stay focused on search brings the issue of search spam to the public. The article explains that seventy percent of Google's focus is on Web search and then spends several paragraphs on how search spam is a huge issue. The article quotes Douglas Merrill, of Google engineering, saying, "Spam is an arms race," and explaining that "spammers are highly motivated. There is a lot of money at stake."
Posted by Barry Schwartz at 9:38 AM | Permalink
This morning I uncovered two threads at WebmasterWorld that provide information on MSN, from spam defense to when its search index gets updated. The first is named MSN Asks Webmasters, What is Spam? where MSNdude provides some insight into how MSN determines what is spam and what are junk pages, and into the "hierarchy of spam." The second is named MSN Won't Do a Search Index Update on Fridays, Saturdays or Sundays, where we see MSNdude posting that MSN normally will not conduct a search index update on Saturdays or Sundays, and is also unlikely to conduct one on Fridays, because it may affect their weekends.
Posted by Barry Schwartz at 9:43 AM | Permalink
Thomas Bindl does what I was hoping someone would do -- make a countdown clock for when Google's Matt Cutts is returning from his vacation, spotted via Threadwatch. I've seen a number of posts in various places suggesting that Google has been having its recent spam and indexing problems because Matt's finally taken a nice, long break. Bull. Matt's great, a huge resource to Google, but the problems going on seem far more fundamental than Matt being away. If they really are due to him being gone, then Google has even bigger issues to deal with. Still, plenty of us will be happy to see him return and jump back into the search conversation.
Posted by Danny Sullivan at 10:46 AM | Permalink
Threadwatch reveals some more examples of issues Google is having. They note that a search on "queer forum" returns CraigsList 97 times in the top 100 results. That's not all: a search on "wedding forum" returns about 50 of the top 100 results from CraigsList's site; just scroll down to number 50 and you will see.
Is CraigsList spamming? No! Is Google suffering? :) Google is clearly having issues with sub-subdomains. Continued coverage of Google's public index issues.
Postscript From Danny: Comments at Threadwatch also note Yahoo has the same issue. MSN is not as bad (but that could be the result of spidering fewer pages), and Ask looks very good.
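One general idea for keeping a single site from flooding a results page (sometimes called host crowding) is to cap how many results any one site may contribute, while grouping all of its subdomains together. The sketch below is only an illustration of that idea, not a description of what Google, Yahoo, MSN or Ask actually do. It guesses the "site" from the last two hostname labels, which is a simplifying assumption (a real system would consult the Public Suffix List), and the ranked URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def site_key(url):
    """Naive site guess: the last two hostname labels.
    Assumption: wrong for suffixes like .co.uk, where the Public Suffix List is needed."""
    labels = urlparse(url).hostname.split(".")
    return ".".join(labels[-2:])

def crowd(results, per_site=2):
    """Keep at most `per_site` results from any one site, preserving rank order."""
    counts = Counter()
    kept = []
    for url in results:
        key = site_key(url)
        if counts[key] < per_site:
            kept.append(url)
            counts[key] += 1
    return kept

# Hypothetical ranked results dominated by sub-subdomains of one site.
ranked = [
    "http://forums.sfbay.classifieds.example/a",
    "http://forums.newyork.classifieds.example/b",
    "http://forums.boston.classifieds.example/c",
    "http://www.example-forum.org/",
    "http://forums.chicago.classifieds.example/d",
]
print(crowd(ranked))
```

Grouping by the registered domain rather than the full hostname is the key design choice: if each sub-subdomain counted as its own site, the cap would never kick in.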
Posted by Barry Schwartz at 8:19 AM | Permalink
I covered a DigitalPoint thread that uncovered several domains that were able to rank billions of pages at the top of the Google results within a couple of weeks. The methods deployed to rank the pages seemed to include excessive use of subdomains, cloaking, content scraping (theft), Alexa traffic boosting and blog comment spam. I listed the documented steps in that post. Some suspect that Google's new URL handling with the Big Daddy update allowed "old school" cloaking to work again.
A Threadwatch post shows screen captures of the spam and also has a comment from Google representative Adam Lasnik. Adam responds directly to reports of over 5 billion pages from this domain being indexed, saying:
We have noticed that some site: queries are showing bizarre results and it's turned out to be tied to a bad data push. We're fixing it now.
Yes, we are aware of the site command issues (Google's mentioned them itself). That may mean it is far less than 5 billion pages indexed in this case -- but still, plenty of pages got through.
Whether or not the site command is the issue, this still points to other substantial problems plaguing Google that have been making the rounds on discussion boards and blogs lately.
Posted by Barry Schwartz at 9:09 AM | Permalink
Peter Da Vanzo has posted information on XSS Redirects & SEO. Peter linked to two documented methods of exploiting comments and links at blogs and other sites. The two links are XSS and Redirection Attacks, which makes for an interesting, educational read, and Moveable Type Backlink Exploit, which makes me a little depressed (running MovableType and all). The point? The nofollow attribute, created to slow down link spam, has not worked, IMO. I actually had to pull comments and trackbacks completely from my blog after three years of having them enabled. Sad.
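Assuming the redirection attacks in question abuse a site's own redirect script (an "open redirect" that will bounce visitors, and link weight, to any URL passed to it), a common defense is to only honor destinations that are same-site paths or on an allowlist. This is a minimal sketch; the ALLOWED_HOSTS set and the fallback path are hypothetical and not taken from the linked write-ups.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts this site is willing to redirect to.
ALLOWED_HOSTS = {"www.example.com", "blog.example.com"}

def safe_redirect_target(target, fallback="/"):
    """Return target only if it is a same-site path or an allowlisted http(s) host."""
    if "\\" in target:            # some browsers treat "\" like "/", so reject outright
        return fallback
    parts = urlsplit(target)
    if not parts.scheme and not parts.netloc:
        # Relative URL: accept only absolute paths on this site, such as "/about".
        return target if target.startswith("/") else fallback
    if parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS:
        return target
    return fallback               # everything else, e.g. "//spammer.example/"

for t in ("/about", "http://blog.example.com/post", "http://spammer.example/"):
    print(t, "->", safe_redirect_target(t))
```

Using urlsplit's parsed hostname (rather than a substring check) matters: it correctly sees that "http://www.example.com@spammer.example/" points at spammer.example.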
Posted by Barry Schwartz at 8:53 AM | Permalink
Seth Jayson has written an interesting piece, "How Google is killing the internet," over at The Motley Fool. It's a lengthy analysis that takes as part of its premise the idea that web authors are so desperate to get visitors to click on their Adsense links that they're creating pages of junk without any useful content. As a result, the content returned by a search (not just on Google but on its competitors' websites as well) is valueless. I'm rather ambivalent about this, but the implications for search are interesting to say the least.
In common with Jayson I've run searches that return very little useful content, or, almost as irritatingly, have visited a page with good data that has been spread over 4 or 5 pages to maximise the number of adverts I have to look at. Despite SEO claims that the best way to get a good ranking in Google is to have really good content, some pages that rank highly in the results have got there through dubious methods such as cloaking or link farms. The argument runs that although Google should clamp down on activities such as this, there is little incentive for them to do so because Adsense brings in so much of their revenue.
Well, yes and no. Obviously Google wants to make money, but equally the only way that they will achieve this is if people continue to use their resources. If the average searcher becomes disenchanted with Google, they do have other options available to them, with Yahoo, Microsoft and Ask already trying to get rather more than a foot in the door. Although Google is constantly releasing new utilities in order to get people to use their entire raft of products, their key focus, according to Marissa Mayer, is still all about search.
As a searcher what I (and everyone else) wants from a search is an authoritative answer from a trustworthy source. What any search engine needs to do is give me a good reason to visit any particular website that is returned in the results. While I trust Google to do that, the key is not to trust it too much. If the searcher can retain a skeptical viewpoint with respect to the information that is returned to them they're not going to go too far wrong. Searchers need a blended approach, combining robot powered solutions but also resources created by human beings; indexes, virtual libraries, gateways and swickis for example.
So I don't think that Google is killing the internet; that really is a statement too far. If Jayson is correct and search results are getting clogged up at Google, it is going to have the opposite effect: the more dissatisfied people are with the results they get, the more likely they are to explore alternative methods of getting the information that they need. Indeed, as Jayson does actually point out towards the end of his article, other search engines are constantly striving to surpass Google and there are plenty of examples where this is already happening. The limitations of Google as reflected in poor results give greater scope for other search engines and other search solutions, which has to be a healthy situation.
Posted by Phil Bradley at 9:23 AM | Permalink
Chris Boggs over at the Search Engine Roundtable wrote an item named Which Came First: the Content or the Plagiarism? which discusses the challenge search engines face when it comes to determining the original source of a particular piece of content.
For example, the content I am writing right now may be picked up within a matter of seconds by another site that wants to "borrow" or steal the content. So now we have two (probably a lot more than two) sources with identical content. A search engine can say, hey, I found source A before I found source B with this particular content, so source A must be the original source. But if you think about that, since spiders don't work in real time, a search engine may visit the source that "borrowed" the content prior to visiting the original source of that content.
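To make the crawl-order problem concrete, here is a tiny sketch of the naive "first seen wins" heuristic. Both timestamp tables are hypothetical: first_seen is what a crawler knows, while published is the ground truth it usually doesn't have. Because the scraper's copy happened to be crawled first, the heuristic picks the wrong page.

```python
from datetime import datetime

# Hypothetical crawl log: when the spider first fetched each copy of an article.
first_seen = {
    "http://original.example/post": datetime(2006, 3, 1, 10, 5),
    "http://scraper.example/copy": datetime(2006, 3, 1, 9, 50),  # crawled earlier
}

# Hypothetical publication times, which the search engine does not directly know.
published = {
    "http://original.example/post": datetime(2006, 3, 1, 9, 0),
    "http://scraper.example/copy": datetime(2006, 3, 1, 9, 1),
}

def guess_original(timestamps):
    """Naive heuristic: the URL with the earliest timestamp is 'the original'."""
    return min(timestamps, key=timestamps.get)

print("first-seen guess:", guess_original(first_seen))
print("actual original: ", guess_original(published))
```

Anything that shrinks the gap between publishing and crawling, such as the Sitemaps suggestion below, narrows the window in which this mistake can happen, but it doesn't eliminate it.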
Chris offers two suggestions. The first is to watch Google's crawl cycles for your site and post content just before the next crawl is due. That is not really feasible, as Chris knows, because there is no way to know exactly when Google will crawl your site, and news must be posted as soon as possible, so waiting is normally not an option. Chris uses this example to make a point, I believe. The other option Chris suggests is to use Google Sitemaps, so Google can see you as a trusted source and be fed the information sooner rather than later.
But what do you think is the algorithmic solution? I personally do not know. There are people discussing the fundamental challenge at Search Engine Roundtable Forums.
Posted by Barry Schwartz at 9:29 AM | Permalink
Googlebowling is a term used to describe the method of knocking out a page from the Google search results. Googlebowling is conducted by linking to a particular site from sites within bad neighborhoods. Rand over at SEOMoz.org posted recent information he learned about Googlebowling while at SES London a week ago.
To successfully deploy Googlebowling, Rand writes that you need to "use patterns that would show that the site has 'participated' in [a spammy linking] program."
Specifically, this means you would point spammy links at the places the site you are targeting links to. If this is implemented properly and the site you are targeting is not a super authority, the site may be penalized for a long time. Note that the advice here is given not to encourage Googlebowling but to help people understand how it might be possible to impact their own sites.
Rand continues to explain that if a site is Googlebowled, you most likely will want to start fresh and drop the site that was penalized completely. I have discussed Googlebowling a few times at the Search Engine Roundtable. Two entries I would like to point out are:
+ Google Bowling For Dollars by Chris Boggs
+ Google Bowling Supporters Thread by myself
So can other people hurt your rankings? Can other links hurt you? Some think they can, but some, including people at Google itself, say they cannot.
Posted by Barry Schwartz at 10:15 AM | Permalink
The "failure" GoogleBomb had become well-known enough that Marissa Mayer posted a response on the Google company blog last September. I first heard the phrase "Reputation Management" as applied to search from Heather Lloyd-Martin during a private conversation a long time before this. It was obvious Heather was on to something, because we've all seen search results that produce unexpected listings. David Dalka recently posted his frustration that Googling his name could confuse searchers into thinking he is a millionaire. This may be a personal example, but what if you have a bona fide saboteur?
Heather recently related to me her experience with a client where a saboteur took the client company name, mixed it with adult content, and auto-generated unsavory posts published across the Web in numerous blogs and forums. Needless to say, search results for that company started looking really bad, and at times, the whole set of results was flooded with what looked like adult listings.
Heather now regularly points out examples of big brands that could use reputation management for their search listings. She presents screen shots at conferences showing Google queries for "uhaul" and "victorias secret" returning, at number 3 and number 2 respectively, results that read: "UHaul made my move a miserable and stressful experience" and "Victoria's Dirty Secret."
The dirty secret site has an image with an "angel" holding a chain saw. The site makes it sound as if whole forests are regularly depleted because the cataloger lacks environmental awareness. What can you do when this happens?
You certainly have little control over the natural rankings of saboteurs unless they spam. When they use tactics that would get them removed, you can easily hand spammers that pollute your rankings over to search engine quality assurance teams. In the case of the dirty secret site, it appears the other extreme is occurring: the campaign for environmental change at Victoria's Secret may be working. Perhaps Victoria's Secret will establish more earth-friendly contracts with its suppliers.
Another thing you can do is publish pages telling your side of the story, in the hope of earning natural rankings that counteract the negative spin. You needn't wait for natural rankings to appear, either; you can purchase sponsored listings to drive users to the new pages straight away. At least in the meantime, your presence can be felt on the most troubling queries should they begin to affect your image in search results.
Postscript: David's personal example caused him some grief. Consider the amount of grief an "eBay Avenger" causes the young fellow who appears to have fallen victim to an angry buyer who decided to make an example of him. Even if the allegations later prove to be false, and although the eBay avenger has publicly offered to take down the site, SERPs for his name will likely be damaged for a long time to come (at Google, Microsoft and Ask too).
Posted by Detlev Johnson at 4:44 AM | Permalink
The Google Sitemaps team posted to their blog in response to a question at SearchEngineWatch Seattle. Interestingly, they note that links from bad neighborhoods do not harm a site's rankings; only links to bad neighborhoods do. This goes against conventional thinking, since it has long been theorized that links from bad neighborhoods do cause ranking problems.
Link networks often populate quality content sites with paid text links as part of their program. If at all possible, Google obviously wouldn't want to remove quality content from their search engine. One solution is to make outbound links from quality sites that sell links worth nothing towards building rankings for destination sites.
We've heard this from Matt Cutts before: "Link-selling sites can lose their ability to give reputation (e.g. PageRank and anchortext)." If a link from such a site loses its ability to transfer PageRank, it makes sense that it wouldn't harm a site's PageRank either. But that is not a foregone conclusion: the information comes from the Sitemaps team, not Matt Cutts' anti-spam force.
In the above entry, Matt recommends using the "nofollow" link attribute to safely purchase links purely for traffic purposes. This implies that links from bad neighborhoods can indeed harm a site's rankings in Google. Perhaps Matt implies this to deter link buying, but the advice is sound insofar as links from bad neighborhoods also raise the profile of sites that would eventually come under scrutiny by Google. It can also be assumed that text links from bad neighborhoods can harm a site's rankings in major search engines other than Google.
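To make the idea of a site "losing its ability to give reputation" concrete, here's a toy model. It is not Google's algorithm, which isn't public: it's a textbook PageRank power iteration in which pages on a hypothetical `untrusted` list stay in the graph but have their outbound links ignored, as if they had no links at all. The page names and link graph are made up for illustration.

```python
def pagerank(links: dict[str, list[str]],
             untrusted: frozenset[str] = frozenset(),
             damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank computed by power iteration.

    Pages in `untrusted` (a hypothetical "link seller" flag, not a real
    search engine setting) are treated as if they had no outbound links:
    they pass no reputation to the pages they link to, and their rank is
    spread evenly over all pages, the usual handling for dangling pages.
    """
    nodes = sorted(set(links) | {dest for outs in links.values() for dest in outs})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets the same baseline "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            outs = [] if node in untrusted else links.get(node, [])
            if outs:
                # Split this page's damped rank evenly across its outbound links.
                share = damping * rank[node] / len(outs)
                for dest in outs:
                    new_rank[dest] += share
            else:
                # Dangling or untrusted page: spread its rank over every page.
                for dest in nodes:
                    new_rank[dest] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical graph: a link-selling site links to a paying buyer and a news page.
graph = {"seller": ["buyer", "news"], "news": ["seller"], "buyer": []}
normal = pagerank(graph)
discounted = pagerank(graph, untrusted=frozenset({"seller"}))
print(round(normal["buyer"], 3), round(discounted["buyer"], 3))
```

In this toy graph, the buyer's score drops once the seller's links stop counting, while the seller itself stays in the index. That's the distinction Matt draws between removing a site and merely discounting the reputation it passes along.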
Posted by Detlev Johnson at 8:22 AM | Permalink
I reported this morning on a new tool that checks how much duplicate or near-duplicate content you have throughout your site. As many of you know, duplicate content is a major issue for many SEOs today, and this tool will hopefully let you catch duplicate content problems before they become serious. The tool is named Site Wide Duplicate Content Analyzer.
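The announcement doesn't say how Site Wide Duplicate Content Analyzer works under the hood. One common technique for this kind of check is to break each page into overlapping word sequences ("shingles") and compare pages by Jaccard similarity. The sketch below assumes that technique rather than describing the tool's actual method; the page paths, sample text and 0.8 threshold are all hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages: dict[str, str], threshold: float = 0.8,
                    k: int = 3) -> list[tuple[str, str, float]]:
    """Compare every pair of pages and report pairs at or above the threshold."""
    signatures = {url: shingles(text, k) for url, text in pages.items()}
    urls = sorted(signatures)
    pairs = []
    for i, first in enumerate(urls):
        for second in urls[i + 1:]:
            score = jaccard(signatures[first], signatures[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Hypothetical pages from one site.
pages = {
    "/a": "cheap flights to paris and rome",
    "/b": "cheap flights to paris and rome today",
    "/c": "about our company history",
}
print(near_duplicates(pages))
```

Comparing every pair is fine for a small site; larger collections usually need a sampling or hashing scheme to avoid the quadratic number of comparisons.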
Posted by Barry Schwartz at 9:18 AM | Permalink
Nathan Weinberg spots a tool named Search engine spam detector. The tool looks at a particular URL and classifies what elements on the page may raise a spam flag at a search engine. So let us test it out on the SEW Blog, shall we? :)
According to the tool, this site is completely spam free. The tool hasn't found any invisible text, the tool has not detected any unnatural text, the tool hasn't found any significant keyword stuffing tendency in HTML code, and the tool hasn't found any doorway farm.
So the SEW Blog is more white hat than SEroundtable.com, which the tool flagged for invisible text.
Posted by Barry Schwartz at 8:49 AM | Permalink
This morning, I reported on a tool that allows you to check if you are banned in Google. The tool is a desktop application that searches Google using a site: command and also checks sites that link to you, to see if they are banned as well. You can check out the tool by clicking here. Keep in mind, Google also can notify you of some site penalties with Google Sitemaps.
Posted by Barry Schwartz at 9:38 AM | Permalink
SEO Black Hat reports that Expedia France appears to be spamming the search engines. These appear to be spam pages hosted on the expedia.fr domain name. If you search Google for buy viagra, you will currently notice that buyviagra.blog.expedia.fr is the 2nd result. There are many other examples of these pages; in fact, my blog has been blocking comment spam from all sorts of Expedia France subdomains, including homeequitylineofcredit.blog.expedia.fr. This may just be some sort of Expedia hack, where spammers buy the subdomain from Expedia to do what they want with it.
Posted by Barry Schwartz at 10:01 AM | Permalink
BMW Ad With Google over at Google Blogoscoped gives me a chuckle for a variety of reasons. It shows an ad from BMW saying "The Search For Yourself Doesn't Run On Google." The irony! It comes after BMW was briefly banned by Google, and after Pontiac tapped into the Google brand to sell cars. Specifically:
Welcome Back To Google, BMW -- Missed You These Past Three Days covers how Google banned BMW Germany for spamming, knocking it out of the index for a few days. I'm sure the ad is just a coincidence, but it's sort of funny to see a pseudo-slam against Google following on this.
TV Commercial "Googles" Pontiac covers how Pontiac embraced the Google name, with Google's permission, to help push its cars by tapping into the Google brand positively. Now here's BMW using the Google brand, negatively I'd say, to push its motorcycles. And you've got to wonder whether they got (or needed) permission to use the Google name (only the name is used, not the trademarked logo).
Posted by Danny Sullivan at 7:12 AM | Permalink
I just came out of the Meet the Crawlers session, where Google announced new features and a new layout for Google Sitemaps. The Sitemaps blog just posted the details as well. One huge feature is that Google tells you if your site is in the index or not and if it is not, they won't tell you why.
Here is a breakdown of the new features:
+ New verification method
+ Indexing snapshot
+ Notification of violations of the webmaster guidelines
+ Reinclusion request form
+ Spam report
+ New webmaster help center
+ More about our new look
+ Adding a Sitemap
+ Navigating the tabs
The full feature list is at the Sitemaps blog.
Postscript: Matt Cutts just pinged me to let me know he has posted an entry named Notifying webmasters of penalties. That entry explains that the Google Web Search Team and the Google Sitemaps Team are working together to notify "some (but not all)" webmasters of Google site penalties.
Posted by Barry Schwartz at 1:59 PM | Permalink
PubCon has been happening out in Boston, while Search Engine Strategies is going on in Japan. Here's a round-up of some coverage of search-related sessions:
Want to comment or discuss? Visit our SEW Forums thread, PubCon Boston 2006.
Posted by Danny Sullivan at 8:35 AM | Permalink
A bit of catch-up: Aaron Wall of SEO Book notes that the case filed against him by Traffic-Power.com was tossed out of court on jurisdictional grounds. Traffic-Power has 30 days to appeal, but Aaron's hopeful this means the case is over. He also notes that the case against TrafficPowerSucks has yet to be resolved. For background on the Traffic-Power suits against both TrafficPowerSucks and SEO Book's Aaron Wall, see these past posts:
Postscript: Actually, Aaron writes to clarify that the appeal period has already passed. The case was tossed out on February 13, so the 30-day window for appeal has elapsed.
Posted by Danny Sullivan at 8:44 AM | Permalink
Matt Cutts of Google posted a funny entry where he notes that he will be on the program committee for AIRWeb, Adversarial Information Retrieval on the Web, this year. He sarcastically asks search spammers to submit their tricks and ideas on how to spam search engines. If you really want to submit your techniques, the call for papers can be found here.
Posted by Barry Schwartz at 8:50 AM | Permalink
I still occasionally encounter someone who thinks that SEO overall is somehow wrong to do, or something the search engines frown upon. Yahoo!, MSN & Ebay recruiting - SEO hits the big time is an example of why this isn't so. It covers how Yahoo, MSN and eBay in the UK are all recruiting internal SEO people to help promote their own sites.
Such hirings aren't new. We've long had search companies themselves trying to rank well in other search engines, to the point of hiring people internally or externally to make it happen. But it's a nice reminder for everyone to keep in mind.
Personally, I got a chuckle out of the breakdown Threadwatch did of the MSN UK recruitment ad. Wanted: Spammer-in-chief for MSN over there highlights some of these key success metrics for MSN UK's SEO person:
As for Yahoo, I found these points interesting:
Note the part I bolded. Nice to see that Yahoo UK wants to ensure no one suddenly accuses it of spamming itself or another search engine. Nah, such things never happen. Wait a minute: Google Admits To Cloaking; Bans Itself. That was from last year, but to be fair, it was pretty much an accidental thing.
Posted by Danny Sullivan at 9:30 AM | Permalink
Bill Slawski has an excellent write-up on web spam through the eyes of patent applications and published papers. During his research, Bill found PageTurner by Microsoft, which not only looks at how to establish a crawl frequency for specific Web pages, but also identifies "duplicate and near duplicate content on web pages." In one of the papers Bill references in the post, he notes the phrase "crafty porn." That leads him to a patent application we referenced last week, named content evaluation, by Microsoft. Anyway, Bill really digs deep into these algorithms and patent applications, with links, abstracts and video presentations. Read the full blog entry, entitled Fighting web spam with algorithms.
Posted by Barry Schwartz at 9:09 AM | Permalink
ClickZ links to a great undercover project by the Wall Street Journal named Our Columnist Creates Web 'Original Content' But Is in for a Surprise. The article is written by a columnist who went undercover and was hired by Web "publishers" who want so-called "original content" to rank well in search engines. The writer explains how he was hired to write 50 articles, each 500 words long, for a total of $100. In the end, the "publisher" wanted plagiarized copy for his 50 articles.
To make a long story short, he spent days researching and writing one article and sent it to the client, who said it was well written but wanted him to break it up into smaller, more keyword-phrase-specific articles. The client sent the WSJ columnist an example article, which the columnist noticed was plagiarized not from one site but from several well-respected sites, including the World Health Organization, New Scientist and WebMD sites. The client wanted the columnist to flip from site to site, copy pieces of content from popular sites, and paste them together to make "original content."
All in all, he blames the search engines for allowing this. He offers the following analogy: "In fact, search engines are more like a TV camera crew let loose in the middle of a crowd of rowdy fans after a game. Seeing the camera, everyone acts boorishly and jostles to get in front. The act of observing something changes it."
To be fair, I wrote an article just this morning at the Search Engine Roundtable named Writing Articles That Get Links. In that article, I explain that copywriting for search engines is getting old and will eventually be figured out by the search engines. Today, for articles to get links and rank well, they must be written with soul and emotion. You have to care about what you are writing for people to want to link to it. You can see these patterns happening at Google today. Of course, there are hundreds of examples of plagiarized pages that catch long-tail search terms, but you will notice (1) less and less of this in the future and/or (2) these articles targeting ever longer-tail keywords. Both reduce the likelihood of a searcher finding such articles.
Posted by Barry Schwartz at 9:39 AM | Permalink
Bouncing ball time. Last April, I wrote about how the Stanford Daily newspaper was selling links for those seeking to rank better on Google, ironic given that Google was born out of Stanford University and is very anti-link selling. Then last May, the newspaper decided to abandon paid links along with doorway pages it hosted for third parties. Today, SEW Forums moderator AussieWebmaster notices that paid links and hosted web pages have come back, such as you'll see at the bottom of the paper's home page here and a hosted page here.
Nope, I don't see the use of nofollow as Google's Matt Cutts recommends, nor is the page banned by robots.txt from being indexed. Far from it, it's ranking well.
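For anyone wanting to check this kind of thing themselves, Python's standard `urllib.robotparser` module can test whether a robots.txt file disallows a given URL for a given crawler. The rules and URLs below are made up for illustration; they are not the Stanford Daily's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of a /sponsored/ section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sponsored/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_blocked(ROBOTS_TXT, "http://www.example.com/sponsored/forex.html"))  # True
print(is_blocked(ROBOTS_TXT, "http://www.example.com/news/"))  # False
```

To check a live site, `RobotFileParser` can instead be pointed at the file with `set_url()` and `read()`.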
AussieWebmaster -- Frank Watson -- oversees a site for currency trading terms, which is why the Stanford-hosted page came to his attention. It currently ranks fifth out of about 40 million pages that Google has indexed for the term forex.
Well heck, at least the page carries Google's own AdSense ads on it :)
Want to comment or discuss? Visit our Search Engine Watch Forums thread, Paid Links, Hosted Doorway Pages Back At Stanford Daily.
Posted by Danny Sullivan at 2:34 PM | Permalink
Rand at SEOmoz shows his frustration with Yahoo paid reinclusion. He tells the story of a client who hired him to clean up his site after it was banned from Yahoo. Rand's team did just that, but after the client used the paid Sitematch program to request reinclusion, the site was denied. Rand posts his conversation with a Yahoo representative, which shows that even though the site has been cleaned up, Yahoo has it on a list that doesn't allow it to be reincluded. I see posts and complaints like this arise weekly in various SEO forums, so this is far from an isolated case.
They won't tell Rand if there are issues with the current site, all they can say is that the site does not "meet our [Yahoo] quality guidelines requirements." For the full effect of the "absurdity" in the conversation, read the entry.
So what should you do if you're in Rand's position? First, try following the reinclusion tips Danny worked up back in June 2005. If that fails, I have reported that a little-known Yahoo Second Review Request form works wonders in getting sites reincluded into Yahoo.
Posted by Barry Schwartz at 11:08 AM | Permalink
Google's Matt Cutts has provided official confirmation of a ban on the Traffic-Power domain name and some Traffic-Power client sites. Matt writes about how Google hasn't usually confirmed or denied if a company has been banned in the past, but it's a policy now changing in cases where Google finds it useful to help educate site owners and others. As for Traffic-Power, Matt wrote:
I can confirm that Google has removed traffic-power.com and domains promoted by Traffic Power from our index because of search engine optimization techniques that violated our webmaster guidelines at http://www.google.com/webmasters/guidelines.html.
Matt's post -- which he notes was reviewed by Google's lawyers -- was in reaction to a recent court filing in the case of Traffic-Power versus TrafficPowerSucks. As Threadwatch notes, the filing by Traffic-Power alleges that TrafficPowerSucks has made false and defamatory claims including:
a. Claims that the search engine giant Google has banned and is banning from its search engine listings websites of Traffic-Power.com clients because of the search engine optimization strategies used by Plaintiff.
b. Claims that clients of Traffic-Power.com run the risk of being banned from Google search engine listings if they use Traffic-Power.com services
Fair to say, TrafficPowerSucks now has some pretty powerful evidence to refute the Traffic-Power allegations.
For background on the Traffic-Power suits against both TrafficPowerSucks and SEO Book's Aaron Wall, see these past posts:
Want to comment or discuss? Visit our SEW Forums thread, Traffic Power Files Suit Against SEO Book.
Posted by Barry Schwartz at 9:11 AM | Permalink
I said BMW would be back soon after they got banned on Saturday. Matt Cutts over at Google lets everyone know they are now back in. So, they got a three-day slap on the wrist. It demonstrates once again how effective public spam reports can be, and how major web sites really don't get the "death penalty" when it comes to spamming.
Spam always seems to get removed faster after a big dose of publicity. Back in 2003, I wrote Google Kills eBay Affiliate Spam Quickly, Others Survive for Search Engine Watch members that looked at how an eBay affiliate using doorway pages was quickly removed by Google after public exposure. In contrast, people still complain that nothing happens when they file spam reports with major search engines through official spam reporting feedback forms.
BMW's situation proves once again that the best spam antibiotic is a good topical application of publicity. So did you spot spam? Blog away. Get others to blog, and that will probably help get the spam removed.
Are you spamming? If you're not hiding your tracks well, be forewarned that the publicity monster might roll over you at some point. On the flipside, we'll eventually have so many public spam reports that not all of them will be dealt with.
For example, More European Automaker Sites Do Doorways & Should Search Engines Be Able To Enforce Spam Rules? on the blog yesterday covered spamming spotted at Porsche Denmark and Chevrolet Sweden, but those two automakers remain listed. I expect they probably will remain listed, too. If BMW took a ding for being banned, Google took some hits from those who feel spam removals ought to happen after a warning. Google's probably thinking about ramping up the spam notification program it was testing before wiping out any more big-time sites that might push back on no-warning wipeouts.
Meanwhile, a second spam truism gets proven. Big companies hardly face a "death penalty" on Google. They get back in and fast. Let's do some timings. In the Spam Olympics event of getting back in after being banned, we have....
What if you aren't a big company? Matt covered the timeline on getting back into Google in his prior Filing a reinclusion request post.
How long do you have to wait now? That depends on when Google reviews the request and on the type of spam penalty you have. In the days of monthly index updates it could take 6-8 weeks for a site to be reincluded after a site was approved, and the severest spam penalties can take that long to clear out after an approval. For less severe stuff like hidden text, it may only take 2-3 weeks, depending on when someone looks at the request and if the request is approved.
So while BMW was upset that Google didn't give them a heads-up about being banned, at least they didn't have to wait 2-3 weeks to get back in. Over at Matt's blog post, you can see comments from some people who aren't happy with such express service. Matt responds:
Our main goal has to be to give the most relevant results to our users; there is currently a trade-off between taking action to remove spam from our index vs. removing sites that lots of users look for with navigational queries.
That brings me back to the advice I've long given to those thinking of skirting search engine guidelines. How big do you think you are? If you really think you're running a crucial site, you can sin against Google and gang and probably be forgiven in short order. They do need you. Absolution will be provided. They might put you back in so that you don't rank well for generic searches, but you'll be back in and findable for navigational ones.
Running some small web site that no one's going to miss? Don't expect express treatment nor gamble you'll be reincluded.
Meanwhile, Barry points to a WebmasterWorld thread finding that the same thing that got BMW banned is still happening. Well, not quite. As Philipp at Google Blogoscoped points out, the pages are gone from the live site but Google is still retaining cached copies of them. Those cached pages should be dropped over time.
Want to comment or discuss? Please do! Visit our Search Engine Watch Forums threads Google Removes BMW Germany For Spamming or BMW debacle good for SEO?
Posted by Danny Sullivan at 10:40 AM | Permalink
Dave Naylor's been doing a tour of European automotive sites and finding others that are doing the doorway page dance that got BMW banned from Google. Meanwhile, there's some concern in the blogosphere about whether people should be worried about Google's spam rules in general. A look at both issues, below.
Dave's found this page over at Porsche Denmark that redirects to the Porsche Denmark home page. Disable JavaScript (use this handy tool for Firefox), and you can see the underlying textual content that's being cloaked.
It's hard to know what exactly is going on, as I don't read Danish. Since you can't get to this page from the Porsche Denmark home page -- and since it redirects to that home page -- it seems designed mainly to capture searchers looking for a particular topic and route them into Porsche. In other words, a classic doorway page operation.
Here's a better example. Search for klassiske porscher on Google, and you get this page, which redirects to the home page. Disable JavaScript, and the redirection stops, showing you the hidden content. A user never sees that; Porsche has no intention for them to see it. They only want Google to see it, so the page ranks well and delivers users to a completely different page on the site.
In the comments on Dave's post, David Thulin points to this page at Chevrolet Sweden. Use that tool I mentioned above to disable styles, and the pretty picture of a Chevy goes away, replaced by hidden text. My Swedish is as good as my Danish -- i.e., I can't read this. But it doesn't seem spammy in terms of repetition. Still, scroll to the bottom, and you'll see links to additional doorway pages. Someone clearly realizes search engines don't like the graphical pages they are serving up, so they've created a series of doorway pages. That degree of savvy also means they should be aware that search engines generally don't like doorways.
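Here's a rough sketch of how you might spot this "poor man's cloaking" pattern in raw HTML: pull out inline scripts, look for a few common client-side redirect statements, and count the words a visitor without JavaScript (or a crawler that doesn't run scripts) would see. The regex is a hypothetical heuristic covering only a few common forms; real pages can redirect in many other ways, including from external script files, which this ignores. The sample markup is made up.

```python
import re
from html.parser import HTMLParser

# Heuristic only: a few common client-side redirect forms, far from exhaustive.
REDIRECT_RE = re.compile(
    r"""\blocation(?:\.href)?\s*=\s*['"]"""      # location = "..." or location.href = '...'
    r"""|\blocation\.(?:replace|assign)\s*\(""",  # location.replace(...) or location.assign(...)
    re.IGNORECASE,
)


class PageSplitter(HTMLParser):
    """Separates inline <script> bodies from text a non-JavaScript visitor sees."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: list[str] = []
        self.visible_words = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data  # script source, never rendered as text
        else:
            self.visible_words += len(data.split())


def inspect_page(html: str) -> tuple[bool, int]:
    """Return (has_js_redirect, words_visible_without_javascript)."""
    splitter = PageSplitter()
    splitter.feed(html)
    splitter.close()
    has_redirect = any(REDIRECT_RE.search(script) for script in splitter.scripts)
    return has_redirect, splitter.visible_words


doorway = ('<html><head><script>window.location.href = "/index.html";</script></head>'
           '<body><p>classic porsche classic cars</p></body></html>')
print(inspect_page(doorway))
```

A page that redirects every scripted visitor while still carrying a block of keyword text is the combination worth a closer manual look; either signal alone is common on perfectly legitimate pages.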
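A similarly naive check for hidden text is to flag text inside elements whose inline style hides them. This is an assumption-laden sketch: it only looks at inline `style` attributes for `display: none` or `visibility: hidden` (not external stylesheets, classes, or tricks like positioning text behind an image), and it assumes reasonably well-formed markup. The sample HTML is hypothetical.

```python
import re
from html.parser import HTMLParser

# Inline styles that hide an element from human visitors.
HIDDEN_STYLE_RE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)
# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collects text that sits inside an element hidden by its inline style."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: list[bool] = []  # one entry per open element: is it hidden?
        self._hidden_depth = 0        # number of currently open hidden elements
        self.hidden_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        hidden = bool(HIDDEN_STYLE_RE.search(style))
        self._stack.append(hidden)
        self._hidden_depth += int(hidden)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._stack:
            return
        self._hidden_depth -= int(self._stack.pop())

    def handle_data(self, data):
        if self._hidden_depth and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str) -> list[str]:
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text


sample = ('<div><img src="chevy.jpg"><div style="display: none">new cars used cars</div>'
          '<p>Welcome</p></div>')
print(find_hidden_text(sample))
```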
Of course, the entire BMW situation has sparked some interesting pushback in new quarters, people who feel like Google in particular shouldn't be pushing "orthodoxy" or their own results on site designers. Google Orwellian at Publishing 2.0 is one example (I left some comments there), Death Penalty, Investigations? Sounds like the FBI... is another and Google Delists BMW-Germany at Slashdot has some similar comments. Jeremy Zawodny has some pushback of his own on the pushback over here: Google vs. BMW, a sanity check.
I think some of the outcry is mistaken. Google is simply doing what all search engines do, enforcing its own rules on what spam is. That's not anything new or Google-specific. Sure, it does warrant examination. Then again, it has also been heavily debated in the past. Not everyone agrees with spam rules, but even those who don't agree understand that if they do something against the rules, they risk getting tossed out. But perhaps the times are a changing...
For those looking to educate themselves on spam issues, here's a reading list:
Need yet more? The SEO: Cloaking and SEO: Spamming categories of the Search Topics area, available to Search Engine Watch members, take you back years with articles on these topics. Plus, becoming a member helps support the site and the creation of content like you're reading right now.
Want to comment or discuss? Please visit our Search Engine Watch Forums thread, Google Removes BMW Germany For Spamming.
Posted by Danny Sullivan at 9:32 AM | Permalink
Last week, I wrote about BMW Germany being spotted spamming search engines. Google's Matt Cutts posted on Saturday that the site is now out of Google -- and that Ricoh Germany would also be removed for spamming (it's out now). The move has sparked what I'd call unprecedented coverage of a spam removal by mainstream publications (BBC, Forbes, London Times, Financial Times, Sydney Morning Herald).
The Financial Times and Forbes articles are especially worth reading. BMW criticizes Google for not contacting it first in the Financial Times:
"Google has decided to spread this information which has created this, I'd almost say, media hype," [BMW] said. "They spread it on Saturday, a few days after the pages had been taken off. They hadn't talked to us beforehand which we found a bit surprising."
Hey, how about not having allowed it in the first place? If it wasn't out for the public to see and discuss, BMW wouldn't have an issue.
No doubt smaller companies and individual webmasters will be heartened by the fact that even big companies like BMW can get banned at Google. The reality is, however, that they'll be back in soon. Most big companies that get banned are put back in quickly because searchers expect to find them for navigational queries. I cover that more in our Search Engine Watch Forums thread, Google Removes BMW Germany For Spamming.
Whether BMW will now face a public relations black eye for having spammed Google remains to be seen. I kind of doubt it, but I'm pretty cynical about these things.
Meanwhile, Dave Naylor spots that BMW France apparently has spam issues as well. Somehow, I suspect the talks Google and BMW are having right now mean BMW might escape the axe if the spammy stuff is quickly removed.
Finally, just a chuckle. Matt's reported by Forbes as being "a blogger who purports to work for Google," while the London Times calls him "a blogger claiming to be a Google software engineer."
Posted by Danny Sullivan at 3:31 PM | Permalink
How about a little old skool search spam? OK, Philipp over at Google Blogoscoped points out in BMW's Doorway Pages that the automaker is employing hidden content on its web site in Germany. Search engines see one thing; humans another. It's cloaking, but not IP based, more of the "poor man's" variety that uses JavaScript.
Look at this page. You'll see nice, pretty pictures of BMWs. Philipp then illustrates how, when you turn off JavaScript, you get a page of completely different content, including the phrase "used car" 42 times in what appears to be a gibberish doorway page. That's a page where the sentences may look like they are saying something to a search algorithm but make no sense to a human reader.
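Counting a phrase like that is simple arithmetic. The sketch below counts every position where the phrase starts and computes keyword density as the share of all words taken up by those occurrences. That's one common way to define density, used here as an assumption; search engines don't publish the measures they actually apply. The sample text is made up.

```python
import re


def phrase_stats(text: str, phrase: str) -> tuple[int, float]:
    """Return (occurrences, density) for a multi-word phrase in text.

    Density is assumed to be (occurrences * words in phrase) / total words.
    """
    words = re.findall(r"\w+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    count = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    density = count * n / len(words) if words else 0.0
    return count, density


sample = "Used car deals. Used car prices! Buy a used car."
count, density = phrase_stats(sample, "used car")
print(count, f"{density:.0%}")
```

On a normal page, a phrase rarely fills more than a few percent of the words; a page where one phrase makes up a large share of the text is a strong hint that it was written for a crawler rather than a reader.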
When I disable JavaScript, I actually don't see what Philipp got. Instead, I get a page telling me that I need to have JavaScript to view the site. Part of this seems to be due to some redirection going on. But here's an easy way to see what Google's actually being shown -- this page from the Google cache.
Posted by Danny Sullivan at 12:19 PM | Permalink
It's easy to assume that everyone knows what a scraper site is. Not everyone does -- or at least, some people know what scraper sites are but don't know that's what they're commonly called. Scraper Sites and SE Ambiguity: What is Your Sites Reading Level? from Stuntdubl gives you a nice rundown on how scrapers grab search results to make "content" that's typically host to Google AdSense ads -- and asks the question on the minds of many: why does Google fund this junk?
Posted by Danny Sullivan at 10:04 AM | Permalink
Google Fights Paid Links & Yahoo Defends Paid Links from Barry over at Search Engine Roundtable does a great job of recapping the ironic situation of Yahoo blogvangelist Jeremy Zawodny selling links on his personal blog without using nofollow attributes while the most direct counterpart he has at Google, Matt Cutts, has been urging for months that nofollow should be used on paid links.
While Barry's done the recap, I still wanted to revisit things myself. First, there's the nofollow attribute, which was introduced earlier this year primarily as a way for blog owners to help combat trackback and comment spam. Slap a nofollow on links in these areas, and they don't pass credit for the search engines that support nofollow.
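In markup, nofollow is just a value in a link's `rel` attribute, for example `<a href="http://www.example.com/" rel="nofollow">`. The sketch below uses Python's standard `html.parser` to split a page's links into those carrying nofollow and those that don't; per the description above, only the latter would pass credit at engines that support the attribute. The sample markup and URLs are hypothetical.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Splits <a href> links by whether their rel attribute contains nofollow."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []
        self.nofollow: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel can hold several space-separated values, in any letter case.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def classify_links(html: str) -> tuple[list[str], list[str]]:
    """Return (followed_links, nofollow_links) found in html."""
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollow


page = ('<a href="http://sponsor.example.com/" rel="nofollow">Sponsor</a> '
        '<a href="http://friend.example.com/">Friend</a> '
        '<a href="http://ad.example.com/" rel="external NoFollow">Ad</a>')
followed, nofollowed = classify_links(page)
print(followed, nofollowed)
```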
Todd Friesen dubbed nofollow to be a "link condom" (see Link Condom: The Nofollow Parody), a way to interact through links with other sites safely but not actually touching them, at least as Google, Yahoo and MSN will view it. But far from a joke, I later wrote a follow up on how the link condom parody site was a good jumping off point on how nofollow had many other uses, including as a means for those selling links to tell search engines that they meant no harm.
This was a point I made in my original write up on nofollow, Google, Yahoo, MSN Unite On Support For Nofollow Attribute For Links. Nevertheless, the issue of using nofollow in relation to paid links really exploded when O'Reilly was found to be selling links in August.
O'Reilly In Debate Over Link Selling covers that situation as well as the issue of selling links and influencing search engines by buying them in much greater depth. Nofollow was a solution, I explained, for any publisher not wanting to be accused of doing something wrong.
Soon after, in Text links And PageRank, Google's Matt Cutts urged the use of nofollow as a safe way for people to buy links, along with a warning that sites selling links without doing this might not pass along PageRank.
Not everyone agrees with Matt, as you can see in the comments below his post or in this discussion over at our Search Engine Watch Forums: Google's Matt Cutts On Link Selling: Sites Might Not Pass Reputation; Buyers Might Get Targeted More.
Now skip forward to last week. Zawodny Says No to Link Condoms from Greg Boser covers how he received an email telling him that he could now buy links on Jeremy's site (Dave Naylor got a similar email), and the irony that those links don't use a link condom the way Google would prefer, and likely not the way Jeremy's employer Yahoo would prefer, either. Or maybe not Yahoo, given that, as some of the articles above detail, it has faced accusations that its $300 per year Yahoo Directory is nothing more than a giant link selling network.
I was actually going to drop a note to Jeremy and Matt to get both of their views on all of this before posting, but Sponsored Links from Jeremy over at his blog saves part of that work. In it, he explains his viewpoint on not using nofollow. To stress his main points:
Those are all fine points, but none of them except the last are likely to make Matt over at Google happy. I'll try to channel him, as well as comments on paid links in relation to their impact on search relevancy:
Further down, Jeremy talks about no one up in arms about the Google AdSense links he carries. Yep, and O'Reilly In Debate Over Link Selling has me covering exactly this same point, when it came up at one of our conferences. AdSense are sponsored links -- they're just "safe" sponsored links in terms of search relevancy that Google doesn't mind.
Need more from Matt directly? SES Chicago 2005 I think has some fresh comments from him below the post on link selling, as does "Tell me about your backlinks".
What's going to happen to Jeremy? As Greg notes, he's not going to be yanked from Google. His site is far too important for that. But Google might prevent it from passing along link juice to others. Apparently, I'm told by others (not Google itself) that Google's done the same to Search Engine Watch because of our SEW Marketplace ads that we sell.
If so, Google's just stupid. If it can't figure out that we carry the same sponsored links in the same area and filter out that part, really -- they're dumb. They're even dumber if they have to wipe out the ability of an entire site to help influence its results in a good way. We link to many excellent things -- including things Google wants people to know about. Our links don't carry weight because Google's not smart enough? And Jeremy's site might not carry weight as well? Please.
If you're interested, that O'Reilly In Debate Over Link Selling covers the former paid link program we had here and how ultimately, the SEW Marketplace ads might move to using nofollow down the line. But since none of these were ever sold as ways to help people rank, it's kind of a pain to have to retroactively make that type of move.
Want to comment or discuss? Visit our forum thread, Yahoo's Zawodny In Paid Links & Nofollow Debate.
Postscript: Matt adds his thoughts on the situation over in Text link follow-up which in summary says yes, he still thinks nofollow is the way to go, but Jeremy's free to do as he likes on his site, just as search engines will be free to decide what sites they want to trust based on linkage patterns. But it's more fun to read his actual post, especially because he plays a game of Six Degrees, getting from a paid link to a sex positions web site in two mouse clicks.
Posted by Danny Sullivan at 10:31 AM | Permalink
Greg Boser, in The Truth About Reciprocal Link Networks on his new blog, looks at how a former client built up 7,000 links in a month and a half, a spike in activity that may have blown up on the client in the latest Oct. 2005 Google Jagger update. He digs into the GotLinks link network the client was involved with and comes away unimpressed. He also raises the worrisome issue that links out of your control could potentially harm you.
Posted by Danny Sullivan at 7:21 AM | Permalink
WordPress Spam Scam Explained is an undated article from owner Chad Jones giving the HotNacho side of the WordPress spam saga. It might not be new, but I just heard about it via Aaron's SEO Book blog. It highlights how, while WordPress was back in Google within a day, HotNacho and other sites owned by Jones remain banned.
That's the biggest takeaway, showing exactly what I said in my article about the WordPress case. If a site is important enough, search engines simply cannot ban it despite spamming issues because it will hurt relevancy. If you type in [wordpress], then you want to find the WordPress site. But if you're a nobody site, look out -- spam a search engine, and there's no particular reason for them to let you back in.
Of course, there are going to be people searching for HotNacho on Google and not finding it because of the ban, and that's actually bad relevancy and somewhat troublesome overall for a service that's supposed to be helping organize the world's information.
It's one thing to ban a site for ranking well on non-navigational terms. But if I type [hotnacho] or [hot nacho] into Google, I really ought to be able to find that site as being relevant for that navigational query. Right now, it doesn't come up. It doesn't come up at Yahoo, either, which wasn't impressed with the HotNacho software.
Yeah, it's sucky to include a link at all to something you feel is undermining the quality of your service (and sorry Chad, the pages I saw were sucky, and being semi-automated rather than fully automated in creation doesn't somehow make them better). But for some perspective, what do you think is worse, HotNacho or Nazis? Go search for [nazi] on Google, and it will happily send you off to the American Nazi party. But HotNacho? Oh, no -- now that would be evil.
Postscript: Be sure to see Greg Boser's funny observations here, as well.
Posted by Danny Sullivan at 10:27 AM | Permalink
Here's some great reading, viewing and material for your reference shelf. With web spam continuing to be a very hot topic, I wanted to point out two papers and the slide presentations that accompany each of them. Both do a great job of describing web spam issues and making them understandable. Both papers were presented during the 14th International World Wide Web Conference and AIRWeb05, which took place in May.
Title: Web Spam Taxonomy (9 pages; PDF) Authors: Zoltán Gyöngyi, Hector Garcia-Molina.
Abstract: Web spamming refers to actions intended to mislead search engines into ranking some pages higher than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. This paper presents a comprehensive taxonomy of current spamming techniques, which we believe can help in developing appropriate countermeasures.
Slide presentation (23 slides; PDF).
Title: Web Spam, Propaganda and Trust (9 pages; PDF) Authors: Panagiotis Takis Metaxas, Joe DeStefano
Abstract: Web spamming, the practice of introducing artificial text and links into web pages to affect the results of searches, has been recognized as a major problem for search engines. It is also a serious problem for users because they are not aware of it and they tend to confuse trusting the search engine with trusting the results of a search.
Slide presentation (27 slides; PDF).
Posted by Gary Price at 11:06 PM | Permalink
Moving To Trusted Links & Change The Link Election Model
Thank you, Aaron. That's for taking the research paper (PDF file) about detecting link spam that Gary wrote about earlier and breaking it down in non-technical language (and Jim Boykin summarizes Aaron further here). Aaron finds that the paper says things like: having .edu and .gov links is a good thing, don't worry about having a few spammy links, and the more trusted links you have, the better.
I was thinking last night about how to describe some of the changes, or generational evolution, we've been seeing with counting links, and I thought it might be helpful to break it down this way:
Counting Links / Referendum: Before Google, other search engines made use of links to determine which sites might be important. But this was mainly a counting exercise. The more links the better, regardless of the quality of those links.
Weighted Links / Electing A Congress: Google's PageRank system helped usher in a change to weighting links, recognizing that not all links are as important as others. The system worked to figure out what were the most important links and give sites getting those links more credit -- the authority pages, to use a popular term for this.
Trusted Links / Qualifying Representatives: This is what we've been moving to. When PageRank knew a link was "important," that wasn't the same as trustworthy, despite the authority misnomer. It only meant knowing that some particular link should count for more because the page the link was on had a lot of people "voting" for it. You can scam that type of voting.
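To make the difference between the first two models concrete, here is a minimal Python sketch on a made-up ten-page graph. The page names and links are hypothetical, and the damping factor of 0.85 is the conventional textbook value for PageRank, not a claim about what Google actually uses. Counting alone ranks the link-farmed "spam" page above the "good" page; weighting each vote by the voter's own score, split across the voter's outlinks, reverses that order.

```python
"""Toy contrast between counting links and weighting links (PageRank).

The graph is hypothetical; damping=0.85 is the common textbook value.
"""

# Hypothetical graph: each page maps to the pages it links to.
TOY_GRAPH = {
    "hub": ["good"],            # a popular page that links to "good"
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
    "good": ["hub"],
    "f1": ["spam"], "f2": ["spam"], "f3": ["spam"],  # throwaway farm pages
    "spam": [],                 # dangling page with no outlinks
}


def count_inlinks(graph):
    """Referendum model: every inbound link is one equal vote."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def pagerank(graph, damping=0.85, iterations=100):
    """Weighted model via power iteration: a page's score is split evenly
    across its outlinks, so a vote from a high-scoring page counts for more."""
    n = len(graph)
    scores = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in graph}
        for source, targets in graph.items():
            if targets:
                share = damping * scores[source] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # Dangling page: spread its score evenly over every page.
                for page in graph:
                    new[page] += damping * scores[source] / n
        scores = new
    return scores


if __name__ == "__main__":
    counts = count_inlinks(TOY_GRAPH)
    ranks = pagerank(TOY_GRAPH)
    for page in ("good", "spam"):
        print(f"{page}: inlink count {counts[page]}, PageRank {ranks[page]:.3f}")
```

The dangling-page handling and fixed iteration count are simplifications for readability; the trusted-links stage adds a further question this sketch does not model, namely whether the voters themselves can be trusted.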
That's a rough idea, and I'm playing at refining it more, but I thought I'd share it now. Remember, it's also not just about how much a link counts for but the text that's in and surrounding the link, along with a lot of other factors. Also see Yahoo My Web: An eBay For Knowledge on how search engines hope to tap into trust in ways beyond link analysis to improve results.
Posted by Danny Sullivan at 10:25 AM | Permalink
Black Hats Going White?
A reporter asked me recently if the black hat and white hat branches of SEO are getting further apart these days. I replied that I thought things were coming more together.
More white hats seem to find things they once deemed wrong to be more acceptable, while some black hats are deciding some aggressive tactics might not be worth continuing with. Meanwhile, "bad" techniques like cloaking suddenly don't seem so black hat when Google itself fully cooperates with some sites to allow it. The world of SEO is just getting more gray, to me.
A Whiter Shade of Black from Gord Hotchkiss over at MediaPost is a good piece on this, looking both at how white hats can enjoy the "guilty pleasure" of talking with "these dark magicians" and at how his dark hat dinner companion, conversely, was finding things getting harder and wanting to go "legit."
One point of dispute. While Gord feels the Nov. 2003 Google Florida Update was the biggest blow to spam and dark hats, I have the exact opposite view. In the wake of Florida, many, many people I talked with, or whose forum comments I read, felt like they had been trying to go the good content route.
When Florida hit and Google stayed quiet about the mystery "signals" in place, I felt that made it more of an open season for some people to feel that "anything goes" with Google, not less. Just my take.
Want to comment or discuss? Visit our forum thread, Is SEO Getting More Gray?
Posted by Danny Sullivan at 8:57 AM | Permalink
We've been having quite a discussion about reciprocal links over at the Search Engine Watch Forums in our Reciprocal Linking Dead or Alive? thread. I chimed in to stress that whether a reciprocal link is bad or good can also depend on what exactly you mean by "reciprocal link." From one page to another and back? From one site to another but between different pages? And what about the underlying reason for the link? For search ranking purposes or for your visitors? Today, a gift (ahem) from the email deity arrived in my inbox. An example of a bad reciprocal link plus a bad link request. Yum, double badness to blog. Let's look.
Here's the email. URLs have been broken so as not to benefit the guilty, but you can always cut-and-paste and piece together, if you're really curious.
Dear Webmaster,
We would like to add http://forums.searchenginewatch.com/ to our online directory, by placing a link to it in our site http:// www. bradandjennifer. com/ In return; we would like you to link back to http:// www. myweddingfavors. com our site.
This exchange will create one way links to both our sites, which is beneficial from SEO point of view. This link will remain active as long as the requested link back is active on your site.
Please mail us your link Title, URL & Description & we will immediately place a link to http://forums.searchenginewatch.com/ on http:// www. bradandjennifer. com/ your link shall appear at: http:// www. bradandjennifer.com/ links.htm
Please place a link back to http:// www. myweddingfavors. com using the information below: -------------------------------------------- Link Text: Wedding Favors Description: Elegant wedding favors and unique wedding favor ideas from My Wedding Favors. URL: http:// www. myweddingfavors. com ---------------------------------------------
Thank you for helping both our sites achieve higher rankings, and for becoming part of the http://www. bradandjennifer. com/ family!
Regards, Satyajit
Want to get 10,000 unique visitors per day from organic search engine traffic, like our Yahoo Store?
"DISCLAIMER: If you prefer I not send you future emails, please reply with the word 'REMOVE' in the subject line."
Oh, where to begin? Let's go with the biggest reason why I think this is a bad reciprocal link: there's no benefit to my visitors in adding it. Do they care about this Brad & Jennifer site (and no, it's not that Brad & Jennifer)? They do not! They care about search stuff. If I link to this site and they link to me, sure, we've scratched each other's backs. But that's to benefit each other, to reciprocate, not to help our visitors.
Hey -- what's the deal with those blogrolls you see in a lot of places, even on our blog? Isn't that reciprocation? Sure is! Move along, you joker. You're messing up my presentation.
Seriously -- yes, it's reciprocation, but reciprocation that also helps your visitors. There's some good reason beyond "I wanna top search ranking" for doing those links.
Now what if we go with this old school reciprocal link request, defying all better judgment. Is that enough to get us banned? Yes. Yes, you will be banned for life. Using mental telepathy, I just beamed that question to all my search engine contacts and received back that unified answer, despite the tinfoil on my head.
Nah. What's one link among many? Honestly, even if I end up doing a fair number of these, I'm still not likely to get banned. But neither are they likely to be doing much good for me.
This is because the links coming at me are likely on pages with a lot of other links -- links that clearly aren't related to each other -- which makes it easier to identify such a page as one that perhaps shouldn't be able to transmit much importance to other sites. (Want the science bit? Go read the paper about detecting link spam that Gary posted about yesterday. It's full of detail on just one method of telling good links from bad. Jim summarizes it here, from Aaron's longer summary. Both are better reads for most people.)
Specifically to this example, while they want me to link to their home page, they're going to place my link on their links page, not their home page. If you look at that links page, you'll see it's just a jumble of links. A link to a Bahama vacations site next to one for a satellite TV site which is near a Utah homes for sale site.
The Google Toolbar PageRank meter gives the page a big ole whopping 0. That's zero, with a capital Zed. (That's Zed, which is what my kids think Z is called. It's because of their mom/mum, who's British. And I ask you, if Z is Zed, then why don't Brits call a zebra a zedbra? But I digress badly, madly.) Think you're going to benefit much being linked from that page? Then you'd better go claim that $400 Google's giving away (They're not! It's a scam!).
By the way, notice how the email is a bit confused, even more confused than my critique of it. It starts out about the Brad & Jen site (remember, not THAT Brad & Jen and not even an incredible simulation), but then it starts talking about a wedding favors site. Just a guess, but methinks someone didn't do the cut and paste right before sending out this bulk email. Another sign of a quality request.
What have we learned from all this?
You might also take a look at my post from last year, Thanks For Your Horrible Link Request. In that, I examine not the technical quality of the link request but the style and substance of the request itself -- or lack thereof.
Want to commiserate? Visit our forum thread, Reciprocal Linking Dead or Alive?
Posted by Danny Sullivan at 5:26 PM | Permalink
Matt Cutts Banned On Google? And Oct. 2005 Jagger Update Winds Down
The Oct. 2005 Google Jagger update saga that has sucked the life out of so many (but not all; some are blissfully unimpacted by it) seems to be ending. Indeed, so says Google's Matt Cutts in his Jagger winding down post. But Matt, if the update is over and bugs worked out, why's your blog banned on Google?
The article I just posted for Search Engine Watch members (go on, support the site - become a member and get to read the full story) goes into detail about the situation, but here are the highlights for everyone.
Also, "winding down" doesn't mean the update is already in place everywhere on Google. Matt's post said you'd find it in action if you went to the http://66.102.9.104/ data center. Over the coming days, the changes will migrate to all the Google data centers.
In some related notes, Barry at Search Engine Roundtable points to Update Saga. Part zillion over at WebmasterWorld, where people are wondering if the update has come to an end. It also notes that GoogleGuy has warned of a PageRank/backlink update to happen between now and the end of the year.
Thoughts on Jagger: Recips Got Hammered, Trust Trumps All from Andy Hagans at the Link Building Blog is a nice, short piece summing up what he thinks were the two major changes in the update.
First, reciprocal links don't seem to work as well (What are they? Want to discuss? Check out this SEW Forum thread: Reciprocal Linking Dead or Alive?). Second, sites with authority/TrustRank seem to do better (What's that? Check out Yahoo My Web: An eBay For Knowledge).
Want to discuss or comment? Visit our SEW Forum thread, Oct. 2005 Google Update "Jagger". C'mon by, Matt -- tell us what's going on :)
Posted by Danny Sullivan at 10:04 AM | Permalink
Here's a new 21-page (PDF) technical research paper from the Stanford InfoLab that takes a look at link spam. It might be of interest to some of you.
Title: Link Spam Detection Based on Mass Estimation Authors: Zoltan Gyongyi (Stanford), Pavel Berkhin (Yahoo), Hector Garcia-Molina (Stanford), Jan Pedersen (Yahoo)
Abstract: Link spamming intends to mislead search engines and trigger an artificially high link-based ranking of specific target web pages. This paper introduces the concept of spam mass, a measure of the impact of link spamming on a page's ranking. We discuss how to estimate spam mass and how the estimates can help identifying pages that benefit significantly from link spamming. In our experiments on the host-level Yahoo! web graph we use spam mass estimates to successfully identify tens of thousands of instances of heavy-weight link spamming.
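For readers who want a feel for the spam-mass idea, here is a simplified Python sketch on a hypothetical graph. It reflects my reading of the approach, not the authors' code: compute PageRank once with a uniform random jump, compute it again with the random jump placed only on a hand-picked set of known-good pages (a "good core"), and treat the share of a page's score that cannot be traced back to that core as its estimated relative spam mass. The graph, the core, the damping factor and the iteration count are all assumptions for illustration.

```python
"""Simplified relative spam-mass estimate (illustrative sketch only).

PageRank is used in linear form, p = (1 - d) * jump + d * (link flow),
so running it with jump mass only on a trusted "good core" isolates the
part of each page's score that flows from that core.
"""


def linear_pagerank(graph, jump, damping=0.85, iterations=200):
    """Jacobi iteration for p = (1 - damping) * jump + damping * M^T p.
    Dangling pages simply leak their score, which keeps the system linear."""
    scores = {page: 0.0 for page in graph}
    for _ in range(iterations):
        new = {page: (1.0 - damping) * jump.get(page, 0.0) for page in graph}
        for source, targets in graph.items():
            for target in targets:
                new[target] += damping * scores[source] / len(targets)
        scores = new
    return scores


def relative_spam_mass(graph, good_core, damping=0.85):
    """(p - p_core) / p for each page: the share of its score from outside the core."""
    n = len(graph)
    uniform = {page: 1.0 / n for page in graph}
    # Core-only jump: same 1/n weight per page, but only on trusted pages.
    core_jump = {page: 1.0 / n for page in good_core}
    p = linear_pagerank(graph, uniform, damping)
    p_core = linear_pagerank(graph, core_jump, damping)
    return {page: (p[page] - p_core[page]) / p[page] for page in graph}


# Hypothetical web graph: three trusted pages, one page they endorse,
# and a spam_site propped up by five throwaway booster pages.
SPAM_MASS_GRAPH = {
    "g1": ["g2", "good_site"],
    "g2": ["g3", "good_site"],
    "g3": ["g1"],
    "good_site": ["g1"],
    "spam_site": [],
    **{f"boost{i}": ["spam_site"] for i in range(1, 6)},
}
GOOD_CORE = {"g1", "g2", "g3"}

if __name__ == "__main__":
    mass = relative_spam_mass(SPAM_MASS_GRAPH, GOOD_CORE)
    for page in ("good_site", "spam_site"):
        print(f"{page}: relative spam mass {mass[page]:.2f}")
```

Here spam_site comes out at 1.0, since none of its score is reachable from the trusted pages, while good_site stays well below that because most of its score flows in from the core.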
Want to discuss? Check out this thread in the SEW Forums.
Posted by Gary Price at 2:48 PM | Permalink
How the wheel turns. Back in the 1990s, portals gave away free home pages and with them came a huge amount of search engine spam. Today, portals like Google, Yahoo, MSN and AOL give away free blog space -- and lo and behold, we have blog spam that apparently hit a new high with a blog spam emergency this weekend, as Tim Bray writes. The blogosphere has been buzzing with discussion on the problem.
As for myself, I just continue to shake my head that these types of spam issues with blogs simply weren't expected. The solution? It's likely going to be just like what happened with free web space -- free blog space will get ignored by search engines.
Come along, and we'll do a tour of past and present, plus a look at the issues you get when you try to maintain quality when also ranking search results by time.
1997: Free Home Pages & Spam
First the past. It was around 1997 that free web space for personal home pages seemed to become accessible to many more people. I remember it well, because soon after, I started getting complaints from those making use of these services. They found that search engines weren't finding all of their pages. Some discovered that none of their pages got indexed at all.
It got so bad that eventually, I had to do an article about it. Search Engines And Free Web Pages still floats around in the SEW Archives area for our SEW members. Here's the top to it, which will sound really familiar when I start talking about blog spam later on:
Many people take advantage of free web space provided by their internet access providers. What they don't realize is that search engines have shown a tendency to miss or even ignore certain sites. Complaints have been heard from those using space provided by America Online, CompuServe and other places.
Indeed, AltaVista no longer even accepts submissions from Tripod, a popular web service that provides free space. Why? Search engine spammers were using free space there as a base of operations. It's easy to open up a new account, hit the search engine with bogus pages, then move on once the spamming attempt is detected.
At that time, it was more the internet access providers giving away space, rather than portals. But not long after, portals jumped in themselves, culminating I'd say with Yahoo's $3.6 billion acquisition of GeoCities in 1999.
Search spam hosted on free web space had died down as a problem by that point, however. Why? Search engines were largely ignoring these areas of the web, and because these areas were ignored, they were no longer attractive magnets for spammers. No one wants space that can't be seen.
2005: Free Blogs & Splogs
Now let's skip ahead to today. Well, more specifically to yesterday, when Mark Cuban, who backs blog search engine IceRocket, wrote in his Get Your Blogspot Shit Together Google post:
The blogosphere was hit by a blogspot.com splogbomb. Someone did the inevitable and wrote a script that created blog after blog and post after post.
I'm not talking 100 blogs with a 100 posts each. Im talking what could easily turn into 10s of THOUSANDS of blogs pinging out millions of posts!
Do a search for HDNet on Icerocket.com or any of the other engines and look at all the Splogs there are. And they have URLs like this So google, at least for the time being, we shut out adding new blogspot posts to our index until we clean all the bullshit you dumped on us out of our indexes.
Sound familiar? I mean, just change the names, and the result is the same. Blogs are simply more sophisticated home pages, for many, as I've written. And splogs (spam+blogs) are just more sophisticated home page spamming attempts.
Allow anyone to create content for free or with no real barriers, and surprise, a few people will go to extremes and be abusive. Result? IceRocket no longer trusts the free blog space that Google offers through Blogger/Blogspot, in the same exact way that many search engines stopped trusting the free web space of the GeoCities of the past.
Google's Failure To Police Or Post Barriers
Google: Kill Blogspot Already!!! from Chris Pirillo also went up Sunday. As with Mark Cuban, Chris finds Blogger's Blogspot-hosted blogs the chief culprit.
I don't know what's (specifically) making it so insanely easy for these spammers to get signed into your system, but you need to change that....
Suggestion, Google? As bold as this might sound, you should institute an authentication system - a captcha of sorts - for every single post that gets sent through your Blogger service. This means that there's no more easy rides for the idiots out there who are killing your baby and the blogosphere.
Fair enough -- some barriers to entry would help, either in setting up the free space (captchas, charging a token fee, whatever) in the first place or perhaps even in how people are allowed to post.
Google's certainly winning no points with me on this front. Back in June, I wrote My Encounter With Search Spam On Blogger, where I talked about someone who lifted a description from the Search Engine Watch web site in a misleading manner, the same person who had lots of other splogs going as well. In addition to writing about it, I also went through the formal reporting channels. Nevertheless, there it sits still.
But Barriers Won't Solve Issues For Blog Owners
Of course, it won't help if only Google cleans things up. I haven't checked, so apologies if I'm mistaken, but I'm pretty sure that I can get going with free space over at MSN Spaces and Yahoo 360. Google's Blogger is simply a more well known service. Closing down abuse at Blogger would be great, but I suspect that just means the abuse will move elsewhere.
For potential bloggers, I'm afraid my advice about free home pages from back in 1997 will become just as applicable to free blogging space:
That may seem unfair [search engines ignoring free web pages], but when you use free web space, it's as if you have hundreds of roommates. They can get the entire domain in trouble, and the police, or the search engines in this case, may not care that you are innocent.
Ask your provider if there have been any problems with search engines visiting free web pages. They should know if there are complaints, and they should also be able to help resolve any problems. They have the ability to direct large numbers of people toward the search engines, so it's to the advantage of the search engines to work with the providers.
If it's crucial to be indexed, you may want to consider leaving the free web space and going with a commercial hosting service.
In other words, get your own domain name. It has never, ever, ever, ever, ever, ever been a good idea from a search marketing perspective to make use of someone else's domain name, as you are not in control of your own destiny.
I don't care whether it is Blogger, MSN Spaces, Yahoo 360, Typepad, WordPress or anyone. If you make use of someone else's domain name, you are ultimately leaving yourself open to:
Don't trust me? Don't trust this fundamental bit of advice that I and other search marketers have been giving for years, to have your own domain name? Then usability expert Jakob Nielsen just said the same thing today in Weblog Usability: The Top Ten Design Mistakes. Tip number 10 is not to use a domain name owned by another service. He covers the issue of controlling your own destiny, as well as being seen as an amateur and the problems of moving over to your own domain name down the line.
Splogs & Searching Issues
How about the searcher side of things? Tara Calishain found Google and Feedster most impacted by splogs, Technorati seeming more resistant (probably in part, I suspect, because it actually spiders pages rather than relying on feeds) and Yahoo getting by primarily because of the limited feeds it covers.
Russell Beattie, like Chris Pirillo, found his PubSub feeds getting washed out with spam. I thought the comments below his post were especially interesting, looking at fighting back on the Google AdSense front. It's an issue that's come up before. Not only does Google host a bunch of this junk content, but it also helps fuel it by people earning through AdSense.
Ranking By Time Magnifies Spam
Back to Mark Cuban: his post highlights one of the key issues that blog search faces. Ranking by time magnifies the spam problem.
The major search engines have plenty of spam in their indexes. You simply don't see this as much because searches are sorted by relevancy. What are deemed the best pages across the entire web? Links are used to help calculate this, but textual data on the page and in the links, along with many other factors also come into play.
In contrast, blog search is largely ranked by time. Post something, send it out in your feed, and boom -- you're at the top of the list! That is, until someone else posts and pushes you back down.
How About Some Authority Mixed In?
Solution? How about ranking by time and also limiting matches to only quality blogs. Ah, but you see, that's what PubSub is supposed to be able to do. When you create a feed over there, you can use the Filtering By LinkRank feature to limit matches to the top 1 percent, 2 percent, 5 percent, 10 percent or 25 percent of blogs (or technically, feeds).
I've played with it a bit, and haven't been impressed. I got a feed for [google] and know I've limited it past the default (PubSub unfortunately doesn't show your setting after a feed is made). Nevertheless, most of the current matches are coming from one site, simply because the word "google" appears in the "Ads By Google" links it carries.
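A rough Python sketch of the "rank by time, but only among quality sources" idea follows. PubSub's actual LinkRank computation isn't described here, so the feed scores are made-up stand-ins, and the feed names, post titles and timestamps are hypothetical. The point is only the ordering logic: pure time ranking puts the newest splog post on top, while a top-percentile filter on feed scores removes it before sorting.

```python
"""Time-ranked blog search, with and without an authority filter (sketch)."""
import math
from dataclasses import dataclass


@dataclass
class Post:
    feed: str
    timestamp: int  # seconds since the epoch; larger means newer
    title: str


def time_ranked(posts, query):
    """Pure time ranking: every matching post, newest first."""
    matches = [p for p in posts if query.lower() in p.title.lower()]
    return sorted(matches, key=lambda p: p.timestamp, reverse=True)


def top_percent_feeds(feed_scores, percent):
    """Feeds whose (hypothetical) authority score falls in the top `percent`."""
    ranked = sorted(feed_scores, key=feed_scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * percent / 100))
    return set(ranked[:keep])


def time_ranked_with_authority(posts, query, feed_scores, percent):
    """Same newest-first order, but only posts from top-scoring feeds."""
    allowed = top_percent_feeds(feed_scores, percent)
    return [p for p in time_ranked(posts, query) if p.feed in allowed]


# Hypothetical scores: two established blogs and eight low-scoring splogs.
FEED_SCORES = {"searchblog": 95, "newsblog": 80,
               **{f"splog{i}": i for i in range(1, 9)}}
POSTS = [
    Post("splog3", 300, "google google cheap google"),
    Post("searchblog", 200, "Google tests a new feature"),
    Post("newsblog", 100, "Google update winds down"),
]

if __name__ == "__main__":
    print([p.feed for p in time_ranked(POSTS, "google")])
    print([p.feed for p in time_ranked_with_authority(POSTS, "google", FEED_SCORES, 20)])
```

Filtering before sorting keeps the "newest first" behavior searchers expect; the quality of the result then depends entirely on how good the feed scores are.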
Still, the idea is good, so perhaps it will improve at PubSub or another service (Om Malik's saying forthcoming Sphere will do this. Cool if so, but I haven't played with it yet, and we'll all see).
News Search Is Great Because They Limit Sources
Over at his blog, Robert Scoble's The race to time-based and blog search post last week touches on exactly the problem of mixing time and relevancy together. His view is that search engines in general suck on the time-based aspect:
Let's look at Yahoo, Google, and MSN first so you can see just how bad those three are if you want to find something that was added to the Web yesterday.
We have a great case study. Yesterday Microsoft and Real settled their anti-trust case and announced a new partnership. It was written about on hundreds of blogs and hundreds of "pro" news sources.
We also have today's Apple announcements. So, let's search on both of those...
Robert goes on to be unimpressed with how well they find new stuff. But the reality is that search engines are great at finding new stuff. That's called news search. And news search is great because the sources are limited. Not everyone gets in. It may be that for blog search to be great, you have to have that same type of limitation. More on this in my response to Robert's post, which I've reprinted below:
Let's qualify. You mean how bad they are if you only look at the web search results and ignore the onebox/shortcut displays they have.
In other words, do [video ipod] on Google or Yahoo, and at the top of the pages, they show you plenty of news results. They aren't behind in gathering fresh data. They're simply segregating it into the news area and giving you a heads-up that it is there.
You're either missing it or ignoring it because those top of the page segments don't feel "normal" to you. All I can say is that the search engines are aware of that issue.
If you look at my Invisible Tabs article it talks about how at some point, the search engines need to automatically push the right button or tab or link for you, to give you 10 news results for queries that obviously are news related. Or you do a shopping search and you get all shopping results automatically.
FYI, Technorati's out with some handy fresh numbers, finding that two to eight percent of new blogs are spam, but it notes this weekend's problems may have been perceived as worse simply because spam is targeting the names of people. Bloggers are big ego searchers, so if someone targets your name, blog spam can seem worse.
Want to comment or discuss? Visit our Search Engine Watch Forums!
Posted by Danny Sullivan at 4:22 PM | Permalink
Average CPC & Selling Ad Space from Aaron Wall over at SEO Book looks at how WordPress is now selling text links on its home page for $10,000 per day via AdBrite. What! Isn't selling links bad, and didn't WordPress get busted for doing something like this before? Short answer -- this isn't going to cause any Google or search engine problems like WordPress had before.
AdBrite seems a safe way for WordPress (or anyone) to make some money and not get back in hot water. Let's not have anyone grabbing pitchforks and thrusting them WordPress's way for somehow screwing with search results.
The long answer covering issues of link selling and breaking link love with redirection or nofollow attributes is spelled out for Search Engine Watch members, in this version of the post.
Posted by Danny Sullivan at 2:35 PM | Permalink
Searching (or Not) With Spam Google
Philipp Lenssen of Google Blogoscoped fame has a new parody site online today called Spam Google. The site's slogan is, "Just the Noise." If you click the "Our Technology" link on the Spam Google homepage you'll see that Spam Google utilizes Google's well-known PigeonRank technology. (-:
Lenssen writes on his blog:
At the new SpamGoogle.com, you'll find nothing but... spam! If you're also tired of web pages not trying to sell you something, endless explanatory stuff from university servers, or blogs without Google ads, this is for you. You can learn the latest about cheap viagra, poker parties, or download just the spyware you need... Best of all, sites in the Spam Google index try to rip you off by being relevant to your search. And isn't that what Google is all about, relevancy?
Those are some very tough and critical words from Philipp towards Google, but it goes without saying that spam is not only an issue for Google but one for all general-purpose web engines. They also say something about the commercialization of the web in general.
Back in April, Lenssen unveiled another Google site, Google April Fools.
Postscript: For the techies out there, Philipp shared the following with me about how Spam Google works: Internally, Spam Google makes use of the Google API along with PHP5. For every user query, only a certain subset of "spam" sites is searched. In other words, what you see on the SERPs is a subset of Google results... and a noisy one at that. Also, there are some Easter eggs hidden in the application, some making use of the Yahoo API.
Posted by Gary Price at 12:48 PM | Permalink
Rogue host changing customers' websites over at SEO Forum is an interesting read and a warning to keep an eye on your hosting service. What PhilC describes there is a hosting company that, unbeknownst to its clients, was apparently inserting links at the bottom of client pages to benefit the host. The screenshots here tell the tale much better. Apparently, the tactic was supposed to have stopped but started again.
Moral for anyone? Look at the cached copies of your pages in the major search engines. They'll show you what the search engine spider saw -- including any links you might not have realized were being cloaked, without your permission, to feed to the spiders.
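If you save a cached copy of one of your pages locally, a short script can list every link pointing off your own site, so anything you didn't put there stands out. This Python sketch uses only the standard library's html.parser and urllib.parse. The host names and sample HTML are hypothetical, and the script only parses HTML you already have; it does not fetch cached pages from any search engine. It also reports whether each off-site link carries rel="nofollow".

```python
"""List off-site links in a saved HTML page (illustrative sketch)."""
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects (absolute_url, has_nofollow) for every <a href=...>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. "external nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        self.links.append((urljoin(self.base_url, href), "nofollow" in rel_values))


def unexpected_links(html, base_url, allowed_hosts):
    """Return links whose host is not in allowed_hosts (exact host match only)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    flagged = []
    for url, nofollow in collector.links:
        host = (urlparse(url).hostname or "").lower()
        if host and host not in allowed_hosts:
            flagged.append((url, nofollow))
    return flagged


# Hypothetical saved page: one internal link and two off-site footer links.
SAMPLE_HTML = """
<p><a href="/about.html">About us</a></p>
<div class="footer">
  <a href="http://host-partner.example/cheap-hosting">cheap hosting</a>
  <a rel="nofollow" href="http://ads.example/">sponsor</a>
</div>
"""

if __name__ == "__main__":
    found = unexpected_links(SAMPLE_HTML, "http://www.mysite.example/",
                             {"www.mysite.example"})
    for url, nofollow in found:
        print(url, "(nofollow)" if nofollow else "(followed)")
```

Because hosts are matched exactly, "mysite.example" and "www.mysite.example" count as different hosts; list every host name you actually use in allowed_hosts.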
Want to discuss? Visit our forum thread, Obnoxious cloaking scam.
Posted by Danny Sullivan at 8:43 AM | Permalink
Sites Get Dropped by Search Engines After Trying to 'Optimize' Rankings from the Wall Street Journal (paid sub. required) revisits the Traffic Power case against SEO Book's Aaron Wall -- though this time, diving into complaints about Traffic Power by customers saying they found themselves dropped after the company did SEO work for them.
These complaints all came out last year, so they aren't new. This is just a fresh retelling in light of the lawsuit. For its part, Traffic Power says ranking drops for most clients were due to search engines changing their indexing methods, not something they did.
The story further dives into Traffic Power having an "army" of cold-callers, according to former employees, and that over 100 complaints have been filed against the company with the Better Business Bureau. It also discusses how after the complaints, the company starting using other names such as 1P.com and First Place.
Overall, the article just highlights what I said earlier about the lawsuit against SEO Book. If the intent was to squash criticism, it has backfired, getting Traffic Power much broader and more negative coverage than it was initially concerned about. These types of hits will continue to come as the lawsuit progresses, making me think the smart move would be to drop the suit altogether. Also, Aaron comments on his blog over here about the story.
Want to discuss? Visit the Traffic Power Files Suit Against SEO Book thread in our Search Engine Watch Forums.
Posted by Danny Sullivan at 7:53 AM | Permalink
We've had a debate on our SEW Forums recently (see Matt Cutts Comments On Reputable Sites & Link Selling) over whether publishers should tag paid links so search engines won't count them. Part of that covered the issue of links where payment isn't so readily visible. Help Us Build a Link Farm, Get an iPod Nano? from Traffick is a case in point. Want to win an iPod Nano from them? Just post some links to them. It's an example of how what may seem to be a clear-cut, black-and-white issue isn't. Those who openly pay for links might not get those links counted, but run a savvy scratch-our-back giveaway and that might fly under the radar?
Posted by Danny Sullivan at 8:26 AM | Permalink
Earlier we noted that Google was testing a program to notify webmasters if they'd been banned on Google. Now Google's Matt Cutts has posted information more formally on his blog, in a Q&A format. More here: Alerting site owners to problems.
Posted by Danny Sullivan at 1:50 PM | Permalink
Feedster CTO Comments on Splogs (Spam Blogs)
Scott Johnson, CTO at Feedster, has written a commentary for Media Post about splogs (spam blogs) and what the industry must do to combat them. The commentary is titled, The Newest Front in the Online Wars: Splogs.
A splog is a spam blog--that is, a fake blog that is created for the sole purpose of getting a high search engine "page rank" to reap profits through ad clicks, or to drive customers to an otherwise obscure e-commerce site... To give you an idea of the magnitude of the problem, in the United Kingdom there is a company with over 15,000 spam blogs at last count. There were well over 10,000 spam blogs on BlogSpot alone related to the Triple Crown horse races. Of course, each time a visitor clicks on a paid search term, the advertiser pays for it and the "splogger" gets a revenue share... Blog search companies must maintain an aggressive stance on blog spam, and continue to hone their tools and techniques.
Posted by Gary Price at 9:50 AM | Permalink
Traffic Power Lawsuit Update from Aaron Wall notes that the suit filed against him by Traffic Power over allegedly revealing trade secrets has been moved from Nevada state court to US federal court, through his efforts. The federal case number is CV-S-05-1109-RLH-LRL. For more background on the case, see Traffic Power Suit Could Be Quashed Through Anti-Slapp Motion and SEO Book's Aaron Wall Sued By Traffic-Power Over Revealing "Trade Secrets".
Posted by Danny Sullivan at 12:50 PM | Permalink
Can We Declare Automated Comment & Link Posting To Be Bad?
This week, search engines and blog software vendors are meeting again for a second "summit" on fighting blog spam. That's tipped me over the edge into tossing out a related proposal to search marketers and marketers in general. Can We Agree Automated Comment & Link Posting Is A Bad Thing? has the full details in our Search Engine Watch Forums.
In short, I explain that unlike some other debates over what's spam or what should be acceptable in search marketing, inserting content and links into other people's pages through automation just doesn't seem defensible by anyone. Indeed, even "black hats" get annoyed by it.
So as an industry, or a community, could search marketers unite to say "No!" on this practice? Lots more explanation and thoughts are covered in the forum thread, plus the ability to vote and chime in with opinions. Please stop by.
Posted by Danny Sullivan at 11:59 AM | Permalink
Checking If You're Banned On Google
Now that you know how to get reincluded in Google if banned, thanks to minty fresh advice from Google's Matt Cutts, how do you know if you've been banned at all? Some advice:
In the end, the very best way would be if Google provided a ban checking tool of its own. The current test of sending banned notification emails provides a glimmer of hope that this might come. Google's rejected such things in the past, but recent discussion among Matt, me and others at Threadwatch suggests perhaps it could happen.
Want to discuss? Visit our forum thread, Google Testing Ban Notification -- Could New Webmaster Tools Come?
Posted by Danny Sullivan at 10:38 AM | Permalink
Penalized in Google? Here's What to Do
If you've pushed the line with optimization techniques and have been dinged by Google, Matt Cutts offers advice on how to get back into the search engine's good graces. Matt says it boils down to two basic things:
Fundamentally, Google wants to know two things: 1) that any spam on the site is gone or fixed, and 2) that it's not going to happen again. I'd recommend giving a short explanation of what happened from your perspective: what actions may have led to any penalties and any corrective action that you've taken to prevent any spam in the future.
Follow Matt's guidelines and with luck your site will be reinstated in Google search results in anywhere from 2-8 weeks.
Posted by Chris Sherman at 7:43 AM | Permalink
Nice scoop over at Threadwatch, coming off a thread at Search Engine Forums! Google Pilot New Webmaster Communications Initiative at Threadwatch covers a new Google program where Google is emailing those who run sites where they spot things they think might violate their guidelines. Google's Matt Cutts is going to blog more, but he comments over there:
Google is trying out a pilot program to alert site owners when we're removing their site for violating our guidelines. JavaScript redirects are the first trial, but we've also sent a few emails about hidden text, I believe. This is not targeted to sites like buy-my-cheap-viagra-here.com, but more for sites that have good content, but may not be as savvy about what their SEO was doing or what that "Make thousands of doorway pages for $39.95" software was doing. Personally, I think opening up a line of communication to let webmasters know when we're taking action is a really good thing--a site owner doesn't have to guess about what happened. But again, we're starting with a trial program. I'll blog about it more soon. [Note: Matt's blog is here]
Yeah, communication is great. It's just odd that after years of being told it was impossible or difficult to provide some type of "is my page OK" tool for webmasters to use, now Google's proactively doing it. Such a tool has been dismissed as potentially helping spammers.
The email sent says that a particular URL was removed and lists some reasons why, along with a note that it will be pulled for at least 30 days unless content is changed and a reinclusion request is done.
Posted by Danny Sullivan at 4:03 PM | Permalink
Legal Showdown in Search Fracas from Adam Penenberg at Wired looks at the lawsuit by Traffic Power against SEO Book's Aaron Wall, alleging that he revealed trade secrets. The lawyers Penenberg talks with find the suit seems more about quashing negative opinions Wall had about Traffic Power than about trade secrets. One from the EFF says the suit might be subject to an "anti-SLAPP" motion, which invokes a statute meant to prevent lawsuits that are brought with no merit and intended to silence critics.
The idea that trade secrets are involved gets shot down by citing my past examination of the case, as well as including observations from long-time search marketer Greg Boser, who says there are no secrets to protect because the code is published on a publicly-accessible web server. For more background, see my previous SEO Book's Aaron Wall Sued By Traffic-Power Over Revealing "Trade Secrets" article.
Posted by Danny Sullivan at 11:21 AM | Permalink
In last week's O'Reilly In Debate Over Link Selling, I looked in particular at Google guidelines and how they said not to buy links to boost rankings. Google spam fighter Matt Cutts now expands on that in his Text links And PageRank post. Selling links might not get a site banned, but it might prevent it from passing along reputation. Meanwhile, Google might crack down on those buying them, which has one search marketer wondering if link purchaser Yahoo needs to look out, while Gray Hat Search Engine News jokes that an Intent-O-Meter may be on the way.
For Search Engine Watch members, the longer version of this story takes an expanded look at the points Cutts makes and the Yahoo link buying.
Want to disagree, applaud or comment in some way? Visit our existing O'Reilly In Off-Topic Link Selling Debate in our Search Engine Watch Forums or start a new thread in our Link Building section.
Postscript: See the new Matt Cutts Comments On Reputable Sites & Link Selling thread for much more discussion of the issue, and also check out the comments below Matt's original post.
Posted by Danny Sullivan at 11:11 AM | Permalink
I wrote back in April about how the sale of off-topic links to advertisers looking for search ranking boosts had become well seated within university newspapers, with the Stanford Daily paper as a classic example. My longer piece for Search Engine Watch members went further in depth, examining how links like these even showed up at places like the Washington Post.
Now respected publisher O'Reilly has come under fire for selling off-topic links. It's not something new that they've been doing. Nevertheless, the attention and belated realization that they might be helping people to "game" search engines is causing O'Reilly chief Tim O'Reilly to do some hard thinking. It also raises the broader question of when it is "fair" or "ethical" to sell links.
Let's first talk about what this was not. This was not a case of O'Reilly carrying hidden links, as happened with the Financial Times in June. These links were perfectly visible to anyone and have been present on various O'Reilly sites for at least two years.
Blogger Phil Ringnalda seems to have just discovered them this week, resulting in his O'Reilly Joins The Search Engine Spam Parade post. In it, he talks about the type of links you can see on the O'Reilly OnLamp.com site or XML.com. The screenshot below shows an example from XML.com:
Look on the left-hand side. See the two boxes? The links have little to do with XML. Why would someone buy them on this site? One leading reason is that they may be hoping the anchor text will help them rank well for the terms they've bought -- especially in that they are getting links from a fairly "trusted" site like XML.com.
Perhaps this is helping. A search for canada hotels on Google brings up the canadianhotelguide.com home page, which is exactly what the "Canada Hotels" link on the O'Reilly site leads to. Moreover, a backlink lookup clearly shows that Google is crediting links from XML.com to the Canadian hotel site.
Tim O'Reilly himself admits all this, after having been alerted to the situation now and examined it to some degree. In his Search Engine Spam? post, he writes:
That being said, it's become clear to me on investigation that these folks are indeed paying us for our Google rank, and not just for clickthroughs. We just aren't targeted enough for their ads to be justified on a click-through basis. What's more, using Google's link: keyword to check for top links to these particular advertisers shows that the O'Reilly sites they advertise on are among their chief link sources. They aren't getting independent links from users. In short, these advertisers are using O'Reilly and other highly ranked sites who accept their advertising to improve their chances of being discovered via search engines, rather than in quest of direct click throughs (although those may also provide some value for their ad buy.)
His problem now is what to do. Many in the comments to his post, including Google's spam fighting chief Matt Cutts, have suggested that he add nofollow attributes to these ads if they are going to continue to run, as a way to prove he's not inadvertently messing with the relevancy of search results.
From a public relations standpoint, it's an easy fix. Slap a nofollow on those links, and no one can accuse you of doing anything wrong. From an "ethical" standpoint, it's not so clear cut. Who's to say that you were doing anything wrong in the first place?
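Mechanically, the fix is small if paid text links are generated in one place. Below is a minimal Python sketch, assuming a hypothetical ad system that builds each text link from a URL and anchor text; the `ad_link` name and its parameters are illustrative, not part of any real ad server. The `rel="nofollow"` attribute is the one the search engines announced for marking links that shouldn't pass ranking credit.

```python
from html import escape


def ad_link(url, text, nofollow=True):
    """Build the anchor tag for a paid text link, marked nofollow by default.

    Both the URL and the anchor text are HTML-escaped so that characters
    like & or quotes can't break the surrounding markup.
    """
    rel = ' rel="nofollow"' if nofollow else ""
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'
```

For instance, `ad_link("http://www.example.com/hotels?a=1&b=2", "Canada Hotels")` yields `<a href="http://www.example.com/hotels?a=1&amp;b=2" rel="nofollow">Canada Hotels</a>`. Visitors still see and can click the link, so any click-through value survives; only the reputation signal to search engines is switched off.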
People were buying and selling links before search engines made much use of them for ranking purposes. Just having a link on a page can send traffic, even if it's an "off-topic" link. Heck, just imagine the number of "off-topic" ads you've seen on or offline in various situations.
Links also generate revenue. The search engines helped create an economy that revolves around links. If a site realizes it has valuable real estate, is it unethical to stay in business by selling some of that value, the reputation it can pass to another site? Does it make a difference whether the link you sell is "on topic" or points to another "reputable" site?
Google Recommendation: Buy Yahoo Links
To further muddy the waters, let me take an example of what was asked at our recent Search Engine Strategies conference, in a session on buying and selling links. If buying links is supposedly so bad, then why does Google have no problem telling people to buy them from Yahoo?
What? Google says buy links? Sure, indirectly right on its webmaster guidelines page:
Submit your site to relevant directories such as the Open Directory Project and Yahoo!, as well as to other industry-specific expert sites.
The Yahoo Directory charges for listings, if you want to be in a commercial category. It's $300 per year to be included. Not just evaluated, included. If you don't pay year after year, you don't stay in. And despite using redirection, last time I looked, these links still get counted by Google for link credit.
So it's OK for Yahoo to sell links but not O'Reilly? Therein lies part of the debate. Perhaps it's OK at Yahoo since it's well regarded for classifying web sites by category. By no means are O'Reilly's links meant to be any type of directory-style system.
Then again, just being a directory might not be a safe harbor. There's been an explosion in the number of directories springing up. As webmasters have gone on a quest to gain links, new directories have emerged to satisfy that need. But the Did Google Just Target Directories? thread in our forums looks at whether Google in particular has worked to remove the value of some of these places.
What's A Link Scheme?
Let's go back to those Google guidelines to see what at least one search engine has to say on buying links:
Don't participate in link schemes designed to increase your site's ranking or PageRank. In particular, avoid links to web spammers or "bad neighborhoods" on the web, as your own ranking may be affected adversely by those links.
It's not actually saying specifically not to buy links, as you can see. It is a warning not to be involved with "schemes" that are designed -- in my view -- solely to boost rankings. If O'Reilly feels there's no value other than link boosting to its program, then I suppose that might be a scheme.
In contrast, let me turn the magnifying glass back to Search Engine Watch. Since we've had our forums, the occasional person has had great fun with the notion of not selling links, pointing at the "internet.commerce" links we used to carry.
Used to carry? Yep. I got them removed about a week ago. As many of you know, Search Engine Watch has a new owner. This ad program wasn't one we had to continue to carry, so I asked to have those links removed from our site. If you still see them around on pages here or there, that's just something we haven't caught. They'll also come out of the SearchDay newsletter when that's redesigned in about three or four weeks. But they looked similar to this:
That's off the Internet.com home page. As you can see, the links have little to do with internet issues. Spam city!
Retroactively, You're Bad!
Well, not exactly. As I said, the issue had been raised on our forums. Back in December 2004, I explained a bit more about this particular program. In particular, it predated Google, running since around 1998, I believe. It also isn't sold as a rank boosting system.
Now, I don't handle the ads on the site. Nevertheless, I still get stuck being a spokesperson for things that aren't in my control from time to time. Shortly before nofollow came along, I talked about these ads and how I wished I had an "ignore" tag that could go around them, to ease the PR headache side of things.
My wish was granted. Nofollow arrived. In addition, soon after nofollow came out, someone who had participated in the internet.commerce program had a ranking drop. Her concern was that somehow, because she suddenly gained a ton of links through it, Google had given her a penalty.
As it turned out, that wasn't the case. She'd also done a number of other things to her site that were responsible. But I put both issues on the table to Jupiter: surround those links with nofollow! It removes the PR headache and, more important, it protects you from having some other site owner think you've somehow created a link "scheme" that Google or another search engine wants to penalize.
What I got back was a foot-dragging response. I was told there were some issues with implementing this in the ad system. I also got some resentment that the company was supposed to change a program it had always operated just because, years later, Google in particular thought selling links was a bad thing.
And naturally, no one would buy those links with nofollow on them, right? So the ad department wouldn't want nofollow to go on? Check out our The New Nofollow Link Attribute thread on the SEW Forums, and you can see someone speculating that the links wouldn't sell if that were the case.
Value Beyond Ranking
My response was: if so, too bad. For one thing, I said my assumption was that any decent search engine was already smart enough to see the same link with the same text on every page and not give it more credit than deserved. The links weren't sold as a rank boosting thing and should have value beyond just a ranking boost. And you know what? Apparently they do.
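Here is one way a search engine *could* avoid over-crediting a link repeated on every page of a site. This is a toy Python sketch only; nobody outside the engines knows how they actually weigh such links. It assumes link records arrive as hypothetical `(source_url, target_url, anchor_text)` tuples and simply counts each linking host once per target and anchor text.

```python
from collections import defaultdict
from urllib.parse import urlparse


def distinct_host_votes(links):
    """Count each linking host once per (target, anchor text) pair.

    links: iterable of (source_url, target_url, anchor_text) tuples.
    A site-wide ad box repeated on 10,000 pages of one host counts as 1.
    Anchor text is compared case-insensitively, ignoring surrounding spaces.
    """
    votes = defaultdict(set)
    for source, target, anchor in links:
        host = (urlparse(source).hostname or "").lower()
        votes[(target, anchor.strip().lower())].add(host)
    return {key: len(hosts) for key, hosts in votes.items()}
```

With 1,000 pages on one (hypothetical) host all carrying a "Canada Hotels" link, plus one link from an unrelated blog, the target ends up with 2 votes rather than 1,001. Under a rule like this, a run-of-site link is worth roughly one link, which is why such links would need value beyond ranking to justify their price.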
That link selling session I mentioned at our show two weeks ago? The issue of the internet.commerce links came up. A number of people in attendance had bought them. Going out on a limb, I asked if they'd been pitched by sales people as having rank boosting purposes. Nope, they hadn't, a hands-up type of response verified. Phew! In addition, a number of people found that even though they were supposedly "off-topic" links, they still generated quality traffic independent of search engines. You can read the account of the session over here at Search Engine Roundtable.
That session also burst into applause when I pointed out that the issue of buying links isn't raised if you're talking about Google AdSense. Don't buy links? Please. That's all AdSense is, a program where people can buy and sell links from across the web -- nor are they always on topic to the page, despite the contextual targeting.
The issue, of course, is buying links purely for search ranking gains. The reality is that some publishers are going to have to resort to using nofollow or some type of redirection as a means of "proving" they mean no harm to the search engines. It might also be wise to ensure that some advertiser who takes part in the program doesn't come back with an accusation that they unknowingly were taking part in some bad link "scheme."
Other publishers pressed for money might decide to push back. They might deem selling links and part of their reputation as fair game, especially in an ad downturn. Google and gang don't like it? Let them sort it out.
As for me, I'm glad to be rid of the internet.commerce box. We still have our SEW Marketplace ads, of course. Those are certainly on target for our site, with ads from search engines and search marketers. But I suppose the next issue for us is, for those ads that don't have redirection, should we be adding nofollow to their links? Probably, because again it solves a potential PR problem.
Postscript: See also Google's Matt Cutts On Link Selling: Sites Might Not Pass Reputation; Buyers Might Get Targeted More
Posted by Danny Sullivan at 3:58 PM | Permalink
The Polarization Of Search Forums?
The First SEO Republic Forumed from Jim Hedger at Search Engine Guide looks at the idea of search engine forums shaping up as an almost "horseshoe shaped congress" with black hatters on one end, white hatters on the other and some in the middle (including our SEW Forums). He finds the polarization ugly, political and producing negative repercussions for the industry.
My view is completely different. I think some forums can be described as leaning more "white hat" or "black hat," but not entirely. And if anything, I've felt many of the discussions on the issue of white versus black hat have been far more productive than in the past.
I know our own forums the best, of course, and I know we work really hard to ensure people get along and respect each other even when their views differ.
The main thing I've noticed over the past year is that people do seem to be understanding other viewpoints, at least around our place. The idea that things are all black or white does seem to be slowly going away, which I think is a good thing. I've also seen plenty of people with completely opposite views nevertheless agree to disagree, rather than having friendships torn apart.
It bears reminding that we've long had a wide-spectrum of political-like opinions about SEO and search marketing, as I covered way back in my Desperately Seeking Search Engine Marketing Standards piece of 2001. We even had this spectrum before the Google "monoculture" days Jim argues we're now emerging from. Yep, renewed attention is being paid to the other Bradys Of Search -- but we've always had them and did have them even more strongly in the 1995-2000 time frame.
Want to comment or discuss? Start a thread over in the Search Industry Growth & Trends section of our forums. Want to read some past White Hat/Black Hat discussions? White Hat - Gray Hat - Black Hat has a bunch of them.
Posted by Danny Sullivan at 11:06 AM | Permalink
Google's Blogger team has unveiled two new features that could potentially help slow the amount of spam coming from Blogger/Blogspot hosted blogs.
First up is the new "flag" feature that allows the Blogger community to identify potentially "objectionable" blogs. This post offers more details and clarifies that simply having a blog listed as "objectionable" doesn't mean the blog content will be removed.
In the cases where objectionable content has been identified, the most common action is for the support team to "delist" the blog. This simply means that the blog is not promoted in areas of blogger.com like Recently Updated - but it's still viewable on the web. The content is not blocked or removed in anyway when the blog is delisted.
As noted above, objectionable blogs will not be removed from the system, but I was unable to find any info about whether these blogs will be indexed/reindexed differently or whether being marked as "objectionable" will influence how a blog ranks.
I think it would be useful for the Blogger team to provide access to a list of "objectionable" blogs as well as more info about what criteria they're using (aside from letting people vote) to assess "what the community has noted as potentially objectionable." Also, who makes the final determination that a blog is "objectionable," and what happens if a blog is incorrectly delisted? Can it be appealed? Don't get me wrong, I'm all for getting rid of blog spam, but at the same time it's important for Google/Blogger to more clearly explain the delisting process.
The second new feature that has been introduced to hopefully reduce blog comment spam is an option for a Blogger user to require word verification for comments people post on a Blogger powered blog. In other words, a person who leaves a comment will need to enter a word or letters into a box to get the comments to post.
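The server-side logic behind word verification is simple. Here is a minimal Python sketch of issuing and checking a challenge word. It is not Blogger's implementation: it assumes the word is stored somewhere between the form being shown and submitted, and it leaves out rendering the word as a distorted image, which is the part that actually makes it hard for bots. All names here are hypothetical.

```python
import secrets
import string

# Uppercase letters minus I and O, which are easy to confuse once distorted
# (an assumption for this sketch, not a known Blogger rule).
ALPHABET = "".join(c for c in string.ascii_uppercase if c not in "IO")


def new_challenge(length=6):
    """Pick a random word for the commenter to retype."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected, answer):
    """Accept the comment only if the typed word matches.

    Case and surrounding whitespace are ignored. Comparing bytes with
    compare_digest avoids errors on non-ASCII input.
    """
    return secrets.compare_digest(
        expected.upper().encode("utf-8"),
        answer.strip().upper().encode("utf-8"),
    )
```

A typical flow would call `new_challenge()` when the comment form is rendered, keep the word server-side, and publish the comment only if `check_answer(stored_word, submitted_text)` returns `True`.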
Note to Blogger: If you want to let more of your users know about these new services, how about mentioning them on the Blogger home page, login page, and elsewhere? When I checked earlier today, I couldn't find any info about these new services.
Posted by Gary Price at 1:34 PM | Permalink
Mark Cuban is mad that blog spam is causing bad results at Ice Rocket, the blog/web engine he owns. So, Cuban has vowed he's going to do something about it. What? No specifics, but they're looking at a variety of options. (-: Cuban has also launched SplogReporter.com, a site where anyone can report spam blogs. To submit a URL, you'll need to supply an email address. However, I couldn't find any privacy policy or any note on the site about what will become of these email addresses. Good luck, Mark, you're going to need it. Of course, this outcry has also done something else. What? Free publicity and increased name recognition for Ice Rocket. More in the eWeek article: Blog Search Engine Threatens Ban of Blogger Blogs. Btw, Ice Rocket announced a few weeks ago that they're planning to launch a blog-only engine named BlogScour.
Posted by Gary Price at 2:44 PM | Permalink
Today guest writer Marcela De Vivo wraps up her look at search engine penalties, describing those applied by Yahoo and MSN. Yahoo's penalties are similar to Google's, but can be applied in different circumstances, and in some cases are harder to remove. Read on for more in today's SearchDay article, Search Engine Penalties at Yahoo & MSN.
Posted by Chris Sherman at 2:07 AM | Permalink
Search engines have rules for what's acceptable—and not acceptable—for content, linking, and many other factors that are used to calculate relevance. Some guidelines are clear and public, but other policies are known only to the search engines themselves, and if you step over the line, your site may be dinged with a penalty that decreases your rankings or worse, eliminates your site from search results altogether.
Even though many of these policies are not public, observant search engine optimizers have recognized and described many tactics that can draw a penalty to a web site. In today's SearchDay article, Coping with Search Engine Penalties, guest writer Marcela De Vivo describes many of these gotchas, and offers advice and guidance for dealing with penalties.
Posted by Chris Sherman at 8:00 AM | Permalink
It's Q&A madness out there. Below are links to Q&As with a black hat on SEO, a recovering PR meister on search and public relations, an information research professional on search marketing, Bloglines, PriceGrabber, Digital Point and MSN Search to come.
Posted by Danny Sullivan at 4:02 PM | Permalink
White Hat - Gray Hat - Black Hat
White Hat Now In The Cesspool? over in our Search Engine Watch Forums has members debating whether staying within search engine guidelines can still be effective in gaining rankings. Meanwhile, Is Black Hat SEO A Form Of Hacking?, also in our forums, has members debating whether going outside those guidelines is the equivalent of hacking a search engine. Love these types of debates? Then you'll love this list of other material, which I also posted in the Black Hat thread:
Posted by Danny Sullivan at 7:47 AM | Permalink
Spotted via SEO Book, Hidden Links on The Financial Times Website from Linking Matters spots how the Financial Times was carrying hidden paid links on its pages.
Screenshots show how a link to moneysupermarket.com was placed under visible links in the "Partner Sites" section of pages at the FT. Hidden links like this are something search engines such as Google warn against doing, lest they be considered search spam.
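We don't know exactly how the FT's link was hidden. One common technique is an inline style such as `display:none` or `visibility:hidden` on the link or an element around it. Below is a rough Python sketch that flags links hidden that way. It assumes reasonably well-formed markup and checks inline styles only; links hidden via stylesheet classes, tiny fonts or same-colored text would slip past it. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def _hides(attrs):
    """True if the element's inline style hides it."""
    style = (dict(attrs).get("style") or "").replace(" ", "").lower()
    return "display:none" in style or "visibility:hidden" in style


class HiddenLinkFinder(HTMLParser):
    """Flags <a href> tags inside an element hidden by an inline style."""

    def __init__(self):
        super().__init__()
        self.stack = []          # one bool per open element: is it hidden?
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        # Hidden if this element hides itself or any ancestor is hidden.
        hidden = _hides(attrs) or (bool(self.stack) and self.stack[-1])
        if tag == "a":
            href = dict(attrs).get("href")
            if href and hidden:
                self.hidden_links.append(href)
        if tag not in VOID_TAGS:
            self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


def find_hidden_links(html):
    finder = HiddenLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_links
```

Fed a "Partner Sites" box where one link sits inside `<span style="display: none">`, it returns only that link's URL; the visible partner links are ignored.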
An update to the story provides some comments from the FT, saying it had a relationship with moneysupermarket.com and that the links had been made visible (I didn't see any links when I looked today, visible or hidden).
Interestingly, the FT had no comment about why the links were hidden originally. That reminds me of WordPress, which was found earlier this year to be carrying hidden links that led to search spam. While those involved admitted they were wrong, why the links were hidden in the first place was never answered.
Newspapers selling links isn't new. Stanford Daily Removes Paid Links from our blog last month looks at how the student paper at Stanford University recently removed paid listings after coming under fire again that they were being sold in a way that worked against search relevancy.
Stanford University's Student Paper & Selling Links for Search Engine Watch members in April looked at the issue in depth, and at how other student papers, not to mention major media outlets like the Washington Post and CBS News, also sell links. Our forum thread Stanford Daily Selling Links also has discussion on the issue.
In none of these cases, however, were the links hidden from view. People can and do sell links for any reason they want, and in fact they have been doing so since before search engines depended heavily on links to rank pages.
A search engine might view off-topic links as an attempt to spam it and perhaps even penalize the site and those involved, as happened with SearchKing. Then again, it might not. Was the main reason the link was sold to try to influence search rankings? If so, the search engine would be more likely to consider it an attempt to spam. A hidden paid link pretty much has only one reason to be hidden: to influence search engines.
Postscript: The New York Times has a short brief on the story with a quote from the FT on not wanting to clog its page "real estate" with an "overt link."
Posted by Danny Sullivan at 5:17 PM | Permalink
My Encounter With Search Spam On Blogger
With 100 million billion trillion new blogs springing up every day, some of them are going to be spam. And with Google operating one of the biggest blogging services, it's not surprising that Blogger will be host to a few search spam sites. Indeed, I've seen complaints about this in the past from others. But today, Blogger spam got in my face.
Check out this blog, which is about watches, or Seiko watches or wait a minute -- about me! From the blog's description:
Search Engine Watch: Tips About Internet Search Engines & Search ... Danny Sullivan's comprehensive coverage of the search engine world. Forums, reviews, articles, ratings, and frequent newsletters. Paying members receive access ...
Nice. Sounds like someone did a Google search for watch, saw that Search Engine Watch gets listed first (not my fault -- blame Google, not me) and scraped the description showing at that time to use on their blog.
Not nice, Jim. Jim? Jim's the blogger who created the blog. I tried to contact Jim using his profile page, but he didn't leave an address. He was probably too busy to do that, what with creating other high-quality blogs on topics such as:
Not able to reach Jim, I figured I'd drop Blogger itself a line. Surely it doesn't tolerate spamming its parent, Google. Off to the Blogger help page I went.
Where to go? The Blogger Terms of Service link seemed helpful. It even eventually got me to the actual terms. Anything about not using Blogger for search engine spamming purposes? Not that I could spot when skimming. Here are the most relevant areas:
The Service makes use of the Internet to send and receive certain messages; therefore, Member's conduct is subject to Internet regulations, policies and procedures. Member will not use the Service for chain letters, junk mail, spamming or any use of distribution lists to any person who has not given specific permission to be included in such a process.
--and--
You agree to not use the Service to: (a) upload, post or otherwise transmit any Content that is unlawful, harmful, threatening, abusive, harassing, tortious, defamatory, vulgar, obscene, libelous, invasive of another's privacy, hateful, or racially, ethnically or otherwise objectionable; (b) harm minors in any way; (c) impersonate any person or entity, including, but not limited to, a Pyra official, forum leader, guide or host, or falsely state or otherwise misrepresent your affiliation with a person or entity; (d) upload, post or otherwise transmit any Content that you do not have a right to transmit under any law or under contractual or fiduciary relationships (such as inside information, proprietary and confidential information learned or disclosed as part of employment relationships or under nondisclosure agreements); (e) upload, post or otherwise transmit any Content that infringes any patent, trademark, trade secret, copyright or other proprietary rights of any party; (f) upload, post or otherwise transmit any material that contains software viruses or any other computer code, files or programs designed to interrupt, destroy or limit the functionality of any computer software or hardware or telecommunications equipment; (g) interfere with or disrupt the Service or servers or networks connected to the Service, or disobey any requirements, procedures, policies or regulations of networks connected to the Service; (h) intentionally or unintentionally violate any applicable local, state, national or international law, including, but not limited to, regulations promulgated by the U.S. Securities and Exchange Commission, any rules of any national or other securities exchange, including, without limitation, the New York Stock Exchange, the American Stock Exchange or the NASDAQ, and any regulations having the force of law; (i) "stalk" or otherwise harass another; (j) collect or store personal data about other users; (k) promote or provide instructional information about illegal activities, promote physical harm or injury against any group or individual, or promote any act of cruelty to animals. This may include, but is not limited to, providing instructions on how to assemble bombs, grenades and other weapons, and creating "Crush" sites;
Hmm. Can I suggest that if spamming in terms of email isn't allowed, Blogger might specifically say search engine spamming isn't allowed, either?
In case I missed something, I thought I'd report the blogs to Blogger anyway for potential violations of the terms. There's even a link at the bottom of the terms page:
VIOLATIONS Please report any violations of the TOS via the Blogger Support home page.
Sadly, while I easily found advice such as:
I found nothing on how to report to Blogger about a terms of service violation. There was, at least, this section:
Can't find what you're looking for in Blogger Help? First check Blogger Status and our known issues page, then write Blogger Support and we'll see what we can do.
Annoyingly, to report anything, you have to have a Blogger account. I do, but that's not so nice for someone who doesn't.
Next, the report form has these options:
Nothing about reporting terms of service violations or search spam. Well, I reported the spam I spotted using the "Something is Broken" option and pointed them over to this post, in case they wanted to know more.
Posted by Danny Sullivan at 3:30 PM | Permalink
One of the great things about wikis is that anyone can easily contribute. At the same time, the ability for anyone to contribute can also cause problems. While reviewing a page on the Yahoo Developer Network wiki this morning, I noticed that sometime over the weekend some "wiki spam" with links to purchase drugs (like Viagra) appeared on the page. Of course, it just might be that some YDN member is working on a way to search Yahoo and take care of "other problems" at the same time. (-: (-: (-:
Update: The spam has been removed. Here's a screen cap of what I saw earlier.
Posted by Gary Price at 11:11 AM | Permalink
Companies subvert search results to squelch criticism from the Online Journalism Review looks at how companies are trying to squash bad public relations in search engines.
From the article:
It's not illegal, but it's SEO gone bad. Companies such as Quixtar are using Google-bombing, link farms and Web spam pages to place positive sites in the top search results -- which pushes the negative ones down.
and
"I don't have any problem with search engine optimization, and businesses have every right to do it. But my complaint is that this is something that you don't want everybody to know about, because you know that it's deceitful, and it's not about providing value for people. It's not about providing a great information resource that will be the #1 site on the Web. It's about flooding the Web with crap, and in that sewage, [they're] going to bury everyone else. That's my main concern. The implications go across to other businesses like Scientology." -- Eric Janssen, proprietor of Quixtar Blog and online creative manager for the Memphis Commercial Appeal's site
Posted by Gary Price at 8:25 PM | Permalink
The Stanford InfoLab has just posted a new 22 page technical report titled, "Link Spam Alliances." It might be of interest to some of you.
From the abstract: Link spam is used to increase the ranking of certain target web pages by misleading the connectivity-based ranking algorithms in search engines. In this paper we study how web pages can be interconnected in a spam farm in order to optimize rankings. We also study alliances, that is, interconnections of spam farms. Our results identify the optimal structures and quantify the potential gains. In particular, we show that alliances can be synergistic and improve the rankings of all participants. We believe that the insights we gain will be useful in identifying and combating link spam. The paper includes a focus on how "link spammers manipulate PageRank scores."
Full Text: Link Spam Alliances (PDF). NOTE: The Stanford site is currently offline so I've linked to my personal copy. I will change the link back when the Stanford site becomes available.
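For readers who want a feel for why interconnecting pages in a "spam farm" can pay off, here is a rough illustration only: a textbook power-iteration PageRank in Python, run on a tiny made-up web with and without a hypothetical five-page booster farm. This is not the paper's model or its optimal farm structure, and every page name is invented; it simply shows how pages that exist only to link to a target can lift that target above a legitimately linked page.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over a dict of page -> outlinks."""
    pages = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Score held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not graph.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in graph.items():
            if targets:
                # Each page splits its damped score equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank


# A tiny hypothetical "honest" web: home links to a news page and the target.
honest = {
    "home": ["news", "target"],
    "news": ["home"],
    "target": ["home"],
}

# The same web plus a hypothetical farm: five booster pages that link only
# to the target, which links back to each of them.
boosters = [f"booster{i}" for i in range(1, 6)]
farm = dict(honest)
farm["target"] = ["home"] + boosters
for b in boosters:
    farm[b] = ["target"]

plain_rank = pagerank(honest)
farm_rank = pagerank(farm)
```

In the honest web, "home" outranks "target"; once the booster pages are added, "target" ends up ahead of "home" even though no legitimate page gained a link to it.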
Posted by Gary Price at 12:44 PM | Permalink
Wall Street Journal On Search-Funded Spam
I've written before on the ironic situation of Google AdSense funding some of the spam Google has to eliminate from its index. Syndic8 Gets Outed for Spamming and WordPress Caught Spamming After Enlisting To Fight Spam are two close-up examples, as well. Spotted via John Battelle, Web sites that exist only to sell advertising has Wall Street Journal writer Lee Gomes earlier this month feeling fed up with the situation as well. It's not an in-depth look but more a personal commentary -- and a sign that the issue is becoming more noticeable.
Posted by Danny Sullivan at 7:10 AM | Permalink
In April, I wrote about how Stanford University's student newspaper and many other student papers were selling offtopic links to advertisers almost certainly trying to increase their search rankings. Blake Ross came across the situation this week and noted it on his blog. In a new twist, he seems to have also found WordPress-like doorway page spam, as well. The Stanford Daily and search spam from Silicon Beat has fresh comments from the Stanford Daily, with them saying they'd review the situation. That review seems to have happened, as Ross's blog notes the ads appear to have been removed. My previous summary of the situation is here: Stanford University's Student Paper & Selling Links. If you're a Search Engine Watch member, the longer version of that article goes into depth about the situation, with screenshots and a summary of what has been happening at other universities. Want to discuss? Visit our forum thread, Stanford Daily Selling Links
Posted by Danny Sullivan at 10:33 AM | Permalink
Dictionary spam is when someone creates a web page that seemingly contains every word in the dictionary, in hopes of capturing someone who might be searching for anything. Now here's some number spam. Google Search by Number Spam from Barry Schwartz at Search Engine Roundtable describes how he searched for a package number on Google. Yep, he got the Google shortcut that leads to the UPS site. However, he also got a page promising adult content that came up because it lists a ton of various number combinations. Note to package tracking companies: if tracking numbers have no spaces, it will make it hard for this type of thing to work.
Posted by Danny Sullivan at 9:10 AM | Permalink
Remember the WordPress spamming search engines story (some might use stronger terms) from a few weeks ago? Via Google Blogoscoped we read that RSS directory Syndic8 is now being dinged for doing something similar.
Charles Coxhead and Andy Baio say RSS directory Syndic8.com are using sub-domains with "junk articles" which serve no other purpose than to lure searchers to their Google ads. As opposed to other similar cases, Syndic8, "the place to come to find RSS and Atom news feeds on a wide variety of topics," openly links to these sub-domains, albeit only in the footer of their homepage in small-print. Syndic8.com is a large directory, and they could use their linking power to boost specific sites.
Postscript: I Was Really Stupid, and Greedy Too has a response/apology from Syndic8.
Posted by Gary Price at 2:23 PM | Permalink
SEO Inc Tries To Silence Google Blogoscoped Over Rankings
Last month, SEO Inc apparently fell out of the top rankings for the term "search engine optimization" at Google. I felt it was a non-story then. That's changed now that the company issued a cease-and-desist notice against Google Blogoscoped, implying that Philipp Lenssen there may have trade libeled them. More details and a copy of the letter from Philipp here: SEO Inc Sent Me a Cease & Desist.
Wow. What did he say? John Battelle has a reprint over here, but the key passage is this:
It's kind of ironic that SEOInc.com, a search engine optimization company which for a while was on the Google number 1 spot for the highly competitive query "search engine optimization", is now nowhere to be found in the Google results. This is likely due to the recent PageRank update and even more algorithm tweaks implemented by Google. Enter "SEOinc" into Google.com, and SEOInc.com is nowhere in the top 10; and the SEOInc.com PageRank has dropped to "none". Only by entering "site:seoinc.com" into Google will you see the site is still indexed in some way.
And while a low or non-existent Google ranking is bad enough for sites outside the SEO industry, it hits everyone in the SEO business twice as hard: not only are SEOInc not being found with search engines anymore, they've also lost their biggest proof their services are worth paying for.
Of course, the fact this site has seen the Google death penalty hints that they've overoptimized using "black hat" search engine optimization (such as linkfarms, for example).
Who is Philipp to say that SEO Inc lost the biggest proof that their services were worthwhile? Actually, SEO Inc. made this suggestion. Until recently, it had these claims on its web site, which Philipp's article led off with:
"Search Engine Optimization Inc. uses our proven Search Engine Placement techniques to rank more sites in more top positions than anyone in the business. Our cutting-edge strategies are currently used by companies including AT&T Broadband, IGN, Sierra Trading Post, and Microsoft. (...)
The title of Certified Advanced Search Engine Marketing Strategist from the Academy of Web Specialists is your assurance that SEO Inc Search Engine Optimization incorporates highly effective, ethical and proven methods of gaining you top positioning."
Those are now gone, though in a new development, the company appears to have recently become a member of the W3C. From its home page:
Search Engine Optimization Inc is the FIRST and Only search engine marketing firm to become a member of the (W3C) World Wide Web Consortium. Read Article here.
As said, I thought the company's drop in placement for "search engine optimization" was a non-issue when I heard about it a few weeks ago. I wouldn't have reported them as being "good" for having any type of placement, since placement for a term doesn't necessarily mean good conversions.
In addition, top rankings can be meaningless. Was the term competitive or not? In other words, does anyone actually search on it? And if you were top ranked, how long for? On which search engines? Ones people actually use? These are the types of reasons why I simply ignore any claims based on rankings.
Want to discuss or learn more? Check out these forum discussions:
That last thread we actually pulled from our forums back in mid-April. No, not because of a cease-and-desist letter or any message. Instead, our forums have a policy about public spam reporting. We don't allow it, unless a site is incredibly well-known or the issue has become discussed in a variety of public forums. Ironically, with the many blog comments now about the cease-and-desist, the thread that previously was pulled now qualifies for restoration.
Posted by Danny Sullivan at 1:58 PM | Permalink
Been up to no good with your SEO? Your logs might give you a clue before your pages disappear! David Naylor's Things You Don't Want To See post gives a brief example.
Posted by Danny Sullivan at 8:56 PM | Permalink
CNN Accused Of Blog & Search Spamming To Improve PR
CNN gets accused of blog spam and search engine spamming. CNN on the Spam Attack? from Wired explains how blogger Nick Lewis spotted what he felt was a strange post that was pushing CNN programs. He spotted similar posts on other blogs.
That's the blog spam part, and pretty easy to see why you could think that's a guerrilla PR campaign going on, though CNN denies this in the Wired article. But the search engine spamming part? That's a bit more tricky.
Lewis claims that along with comments were a string of repetitive keywords, which he shows in his explanation of what was spotted. For example:
blog blog blog blog cnn cnn cnn blog blog cnn cnn cnn
He suspects this was placed on his blog to make it seem like he was keyword stuffing, so that his pages would attract a spam penalty. The idea is that by doing this, his page would get knocked out of "the first hundred results for the google search 'CNN Blogs'"
Frankly, if he wasn't in the first 10 results, CNN wouldn't even care. No one would -- he's virtually invisible to anyone doing that search. More important, while such a tactic MIGHT possibly work, it would be far easier simply to fire up 10 official CNN blogs and do optimization and link building to push whatever anti-CNN sites you disliked out of the top results.
Posted by Danny Sullivan at 6:35 PM | Permalink
Gaming Local Search Reviews: Part 2
Earlier this month, I pointed out an article about local search and the merchant review systems often offered in them. Will we see these abused or gamed? Yahoo! Local Reviews Biased has Barry Schwartz at Search Engine Roundtable doing exactly that, giving his company a positive review in a test of how the system works. Barry then immediately turned himself in to Yahoo, and to its credit, the review was promptly removed. But down the line, will we see campaigns to skew reviews that aren't easily spotted?
Posted by Danny Sullivan at 2:57 PM | Permalink
The Stanford Daily Selling Links thread at our Search Engine Watch Forums (and see also this from Feb) looks at the ironic situation of the student newspaper at Stanford University -- the birthplace of Google, Yahoo and owner of the PageRank patent -- selling off-topic links to advertisers who almost certainly are trying to get better rankings on search engines.
For Search Engine Watch members, the extended version of this post goes into more depth about why these links have raised eyebrows, given that some would view them as hurting search relevancy. It also looks at why university newspapers have become a popular source for those wanting to buy links to influence search engines, including a short tour of how papers beyond Stanford are selling links.
The extended post also covers how link selling has become commonplace and why publishers wary of a public relations problem might wish to sell only redirected or nofollow links.
The story concludes with comments from the Stanford Daily itself. The paper said it was unaware that there might be any search spam issues involved with selling links and emailed:
Distorting search results is not and has never been our intention. Our intention has been to make up needed income from classified, subscription and display ad sales lost to the internet through a new, legitimate method of advertising.
Posted by Danny Sullivan at 10:29 AM | Permalink
Yesterday's post about the revised research paper that will be presented at the WWW2005 Conference next month reminded me that I need to begin compiling links to some of the search papers that will be delivered at the conference. I plan to get it done in several installments. So, consider this installment number one. Here we go. More coming soon.
+ An Analysis of Factors Used in Search Engine Ranking (AIRWeb Workshop) by Albert Bifet, Carlos Castillo, Paul-Alexandru Chirita and Ingmar Weber.
+ Web Spam, Propaganda and Trust (AIRWeb Workshop) by Panagiotis T. Metaxas and Joseph DeStefano. UPDATE: A slide presentation is now available.
+ Identifying Link Farm Spam Pages by Baoning Wu and Brian D. Davison. Note: A new tech report from Stanford's Zoltan Gyongyi and Hector Garcia-Molina: Link Spam Alliances, will not be presented at the WWW2005 conference but might be of interest.
+ A Personalized Search Engine based on Web-snippet Hierarchical Clustering by Antonio Gulli. Note: You can check out the engine described in the paper here. A new personalized version is also available. Antonio Gulli is the Director, Advanced Products at Ask Jeeves. His personal homepage is home to lots of interesting reading and demos including this one for ComeToMyHead, a news search tool (more than 2000 sources) that also offers images, personalization, and classification. In other words, what I'll be checking out this weekend. (-:
+ Pagerank Increase under Different Collusion Topologies (AIRWeb Workshop) by Ricardo Baeza-Yates, Carlos Castillo and Vincente Lopez.
Posted by Gary Price at 1:06 PM | Permalink
A week ago, Chris blogged about the First International Workshop on Adversarial Information Retrieval on the Web that will be part of the WWW2005 Conference next month in Japan.
One of the papers that will be presented at the conference, Web Spam Taxonomy, by Zoltan Gyongyi and Hector Garcia-Molina from the Stanford Database Group, has been updated and is now available full text (9 pages; PDF) online.
It's a very interesting read.
From the abstract: Web spamming refers to actions intended to mislead search engines into ranking some pages higher than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. This paper presents a comprehensive taxonomy of current spamming techniques, which we believe can help in developing appropriate countermeasures.
Posted by Gary Price at 7:09 PM | Permalink
Spotted via InsideGoogle, Stephan Spencer argues in Does Google deserve a top 10 spot for britney spears? that Google is borderline spamming to come up in the top results for britney spears on its own site. I disagree. The content is relevant, and the suggested changes aren't necessarily a solution to this "problem," either. C'mon along for a journey into the issues at hand.
One solution Spencer suggests is that Google should tweak its algorithm to favor pages with more "topical relevance" about Britney. Sounds reasonable, but in practice, it's not so clear cut.
What's Relevant For Britney
The fact that so many people don't know how to spell Britney's name IS relevant to the topic of her. In fact, it's long been a talking point that MSN Search once used to highlight its relevancy.
By the same argument, the Britney Spears Guide To Semiconductor Physics should be dropped from the top results. Believe me, having watched this query over the years, I can say that site is a longtime rank holder on Google and elsewhere, yet it has nothing to do with Britney other than using her as part of an educational parody for explaining semiconductors.
When the physics site first started ranking well, I felt the same way as Spencer. What's the deal with this non-Britney page being there? But it is sort of related, in that her fame has extended into people using her for parodies.
Similarly, I don't see that Spencer's installed any type of meta robots tag or robots.txt file to prevent his article about the Google-Britney situation from itself ranking well on Google. So when he says:
In the meantime, I think it would be in good form for Google to add a rel="nofollow" href attribute to the Britney Spears link on their Job Opportunities page and let some other, more relevant Britney fan site have that #7 slot.
Then the same should apply to his article. Shouldn't he be blocking his content from being indexed, to ensure some more relevant Britney fan site isn't bumped out if his article somehow starts ranking well?
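For anyone unsure what that blocking would involve: a robots.txt file keeps compliant crawlers from fetching a path, while a meta robots tag such as <meta name="robots" content="noindex"> in the page itself asks engines not to index it. Here is a minimal sketch of the robots.txt route, using Python's standard urllib.robotparser to check how a compliant crawler would read the rules. The site and the article path are hypothetical, invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that wants one article kept away from
# compliant crawlers; the path below is made up for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /articles/google-britney/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch() answers: may this user agent crawl this URL under these rules?
blocked = not parser.can_fetch(
    "Googlebot", "http://www.example.com/articles/google-britney/")
allowed = parser.can_fetch(
    "Googlebot", "http://www.example.com/articles/other/")
```

Note that robots.txt only stops crawling; the noindex meta tag is the more direct way to keep an already-crawlable page out of results.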
How About Showing Some Topics, Not Pages
The reality is every search on any search engine will have some irrelevant results. Ideally, what you'd want for a popular and broad query on Britney is to get a better classification of types of results you can see: official sites, fan sites, sites about her film career, Britney as a part of popular society and so on. Since everything has some relevancy, such groupings help ensure you get into a particular area related to Britney that you're interested in.
For example, consider if you searched on Yahoo Directory, where you could see all directory categories like this:
See how the "topical relevancy" of all things Britney is divided into four major areas? How about the 208 topics that Clusty finds, which include:
Sadly, the demise of human-powered directories on major search engines has all but kept such categorization from being shown to searchers. But what about Ask! It clusters! It groups. Yeah, but sometimes not very well. Here's what we get for Britney:
Sure, everything may be related to Britney in some way, but that's a far cry from actually grouping and refining topics that are specifically about her.
Did Google Really Make This Happen
How about Spencer's claim that "the sheer weight" of Google's own link from its job page to its page about Britney misspellings gave that page a top ranking. Hard to say.
Google lists over 100 pages that are linking to that page, such as The Guardian mentioning the page about Britney or this site commenting on the page back in 2003. Google, of course, doesn't show all the links it knows about. So heading over to Yahoo, we see there are nearly 2,000 pages linking to that page, such as this one from Wired back in 2002. Google has certainly indexed some of those links that Yahoo has also found, even if it doesn't show them.
I have no doubt Google's own link helped. But it also links on that same page to its Google AdWords page with the words "advertising products." But when I search for that on Google, I don't get the AdWords page. Why not? Because the sheer weight of that link on that page doesn't appear to be weighty enough.
As for the page being a "dead-end" for users, I agree with Spencer here, inasmuch as the page is obviously getting visitors and could be made more useful to those interested in Britney who don't want to work at Google. And sure, maybe Google should add a nofollow link for the PR value of saying it's trying to minimize its own impact on search rankings. However, I think that's a difficult path to follow.
Overall, I'm going to end up hoping that if a page is deemed so irrelevant by Google searchers, they'll tell Google directly via the "Dissatisfied? Help us improve" link at the bottom of every search result page.
Disagree and perhaps think Google is indeed spamming itself? Well heck, they've banned themselves from cloaking before: Google Admits To Cloaking; Bans Itself. You can use the Report a Spam Result page at Google to report the page.
Of course, the page might easily return at any point, if Google feels whatever was in error has been fixed. Google released WordPress's home page from its penalty after less than a day. But from what I can tell, Google's own page that was banned remains so nearly a month after it was penalized.
Posted by Danny Sullivan at 10:21 AM | Permalink
You know local search is maturing when local merchants are ensuring that clients give them positive online reviews. As Local Search Grows, So Does Temptation To Post Shill 'Reviews' from MediaPost takes a look. I think it's hardly "semi-guerilla marketing" at all for merchants to ask to be reviewed. That's commonplace for a lot of online sellers at eBay, Amazon and elsewhere. The concern is really more about whether companies will fake good reviews for themselves and bad ones for their competitors. Of course, they will eventually, and it will be interesting to see how local search players respond.
Posted by Danny Sullivan at 10:02 AM | Permalink
Back in January, blogging software provider WordPress was one of several vendors that signed on to support the new nofollow attribute designed to stem blog and search spam. That's why it was so ironic when it emerged yesterday that WordPress has been spamming search engines itself.
There's quite a debate that has since emerged over whether WordPress was really spamming and, if so, whether it should be deemed OK because the aim was to help support the open source blogging platform that many bloggers use.
Let's clear up the spamming question right away. This was spam of the search variety. As I've written before, the search engines themselves are the ultimate arbiters of what's search spam. Google has declared the pages spam here, in comments from GoogleGuy (and yes, Google confirms to me it was the real GoogleGuy):
There definitely appear to be hidden links on the root page of wordpress.org using CSS, e.g. "text-indent: -9000px; overflow: hidden". That's clearly against our quality guidelines at http://www.google.com/webmasters/guidelines.html#quality
What's more, it looks like the company responsible for doing this (hotnacho.com) is also responsible for creating duplicate content in the form of posting the articles in multiple places, as you can see with this url: http://tinyurl.com/3omjj (these duplicate pages probably won't last long).
Yahoo says the same in the Wordpress Article Spam Being Removed post from Tim Mayer, Director of Product Management for Yahoo Search:
We are in the process of removing the WordPress article spam.
Wordpress Website's Search Engine Spam from Andrew Baio at Waxy.org broke the news yesterday of how he discovered nearly 200,000 pages of low quality content designed to attract people from search engines and hopefully get them to click on Google AdSense ads, generating revenue for the site. A screenshot helps explain the situation more:
This is the top of one of the pages in question. You can try to view it yourself here, but there's a good chance it will be removed shortly. Most of the other pages have been removed from the site.
I've added all the colored boxes. The big red one at the top highlights the AdSense ad on the page. That's the goal -- get someone to come to this page from a search engine, then hope they'll click on one of those ads (or the four that were at the bottom of the page). Do that, and the site earns money.
On first glance, the content doesn't sound bad. It does give you basic information about mesothelioma. But it's like junk food content, not really saying anything of real substance that fills you up.
More important, the act of hosting all these pages shows all the fingerprints of content designed primarily to attract search engines, rather than to please humans. Note the relatively high repetition of the word "mesothelioma," a sign that the page is trying to do well for this term. Notice how the word "mesothelioma" is always a link, as I've illustrated with the blue boxes. That's an attempt to help search engines believe the pages being linked to are about that word.
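To make "relatively high repetition" concrete, here is a crude keyword density measure in Python: the share of word tokens on a page that equal a given term. It is a minimal sketch for illustration only. Search engines don't publish how (or whether) they use such a ratio, the sample sentence is invented rather than taken from the WordPress pages, and the tokenizer is deliberately simple.

```python
import re
from collections import Counter


def keyword_density(text, keyword):
    """Fraction of word tokens in `text` that exactly match `keyword`."""
    # Lowercase and split on anything that isn't a letter or apostrophe.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


# Invented sample text, not copied from the pages discussed above.
sample = ("Mesothelioma is a cancer. Coping with mesothelioma means "
          "learning about mesothelioma treatment.")
density = keyword_density(sample, "mesothelioma")
```

Here one word in four is "mesothelioma." The same function applied to the "blog blog blog blog cnn cnn cnn ..." string from the CNN story above gives "cnn" a density of one half.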
Most important is the fact that this page has no relevance to this site at all. The WordPress home page gave human visitors no idea that hidden within its bowels was a resource area about mesothelioma. Instead, the site seems to be all about the WordPress software itself. This content was not being openly promoted to visitors. That's because it was instead hoped that it would be found only by search engines themselves.
How did the search engines get to find the content? Down at the bottom of the WordPress home page were (and still are at the time of this writing) these hidden links:
Sponsored Articles on Credit, Health, Insurance, Home Business, Home Buying and Web Hosting
They were hidden through the use of a style attribute that kept them from being seen by anyone using a fairly modern browser. But a search engine generally sees pages the way old-style browsers do, which means the links were visible. You can see an example of how this was so by making use of the Lynx Viewer here to imitate how a search engine crawler might have viewed the page.
As the links weren't hidden to search engines, they found the special "articles" area of the WordPress site -- http://wordpress.org/articles/ -- and indexed the content inside there, thousands and thousands of pages.
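A text-only crawler simply collects every link in the markup, whatever CSS says. The Python sketch below does the same with the standard html.parser module and additionally flags links sitting inside an element whose inline style pushes text off-screen with a large negative text-indent, the trick GoogleGuy quoted. It's a minimal sketch under stated assumptions: it only inspects inline style attributes (not external stylesheets), assumes reasonably well-nested markup, and the sample page and the "/download/" path are invented; only "/articles/" comes from the story.

```python
import re
from html.parser import HTMLParser

# Inline styles that shove text far off-screen, e.g. "text-indent: -9000px".
OFFSCREEN = re.compile(r"text-indent\s*:\s*-\d{3,}px", re.IGNORECASE)

# Elements that never get a closing tag, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class LinkAudit(HTMLParser):
    """Record every <a href> a text-only crawler would see, flagging links
    inside an element whose inline style hides it off-screen."""

    def __init__(self):
        super().__init__()
        self.hidden_stack = []  # one flag per open element: hidden or not
        self.links = []         # (href, hidden) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A child of a hidden element is hidden too.
        inherited = bool(self.hidden_stack) and self.hidden_stack[-1]
        hidden = inherited or bool(OFFSCREEN.search(attrs.get("style") or ""))
        if tag == "a" and attrs.get("href"):
            self.links.append((attrs["href"], hidden))
        if tag not in VOID:
            self.hidden_stack.append(hidden)

    def handle_endtag(self, tag):
        # Assumes well-nested markup; messy real-world HTML needs more care.
        if tag not in VOID and self.hidden_stack:
            self.hidden_stack.pop()


def audit_links(html):
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.links


# Invented sample page modeled loosely on the trick described above.
page = """<p>Powered by <a href="/download/">WordPress</a></p>
<div style="text-indent: -9000px; overflow: hidden">
<a href="/articles/">Sponsored Articles</a>
</div>"""
links = audit_links(page)
```

Both links are found, just as a crawler would find them, but only the second is flagged as hidden from people using a modern browser.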
If all these fingerprints weren't enough to tell you that the site was involved in trying to grab search traffic, you need only look at the topic being targeted. Advertisers regularly pay extremely high per click fees to rank well for "mesothelioma," because attorneys hope lawsuits involving this cancer will bring high settlements. The top spot for that word is currently going for $52.08 per click on Overture.
Indeed, as I've written before, the high earnings that ads for that term can bring is one reason another blog site recently started up, specifically to generate content that's hoped will earn money off mesothelioma ads. The author of that site was upfront about his motivation, and the content is certainly better than the junk food search fodder hosted on WordPress. But nevertheless, as I wrote, the quest for AdSense money in that case created new content we might really not need and which possibly might push out better content from top search listings.
So, back to WordPress: the content in question was spam. We don't actually know whether it was successfully bringing in search traffic or not, much less AdSense revenue. No one I've seen has posted any top ranking examples for these pages -- and now that both Google and Yahoo have removed the pages, it's even harder to check. I did a few queries last night on things like "mesothelioma" and "coping with mesothelioma" and didn't spot them ranking well. Nevertheless, with nearly 200,000 entries in the search engine lottery, they probably pulled some traffic related to that term or for a myriad of other topics that were targeted, such as "web hosting" or "diabetes."
The person who leads WordPress, Matthew Mullenweg, turns out to be traveling at the moment so hasn't been able to respond to the current debate. We do have his response from when questions about the content were first raised on a WordPress support forum thread back in mid-February, however:
The content in /articles is essentially advertising by a third party that we host for a flat fee. I'm not sure if we're going to continue it much longer, but we're committed to this month at least, it was basically an experiment. However around the beginning of February donations were going down as expenses were ramping up, so it seemed like a good way to cover everything. The adsense on those pages is not ours and I have no idea what they get on it, we just get a flat fee. The money is used just like donations but more specifically it's been going to the business/trademark expenses so it's not entirely out of my pocket anymore.
An Innocent Mistake? Hard To Believe
Some have argued the statement above suggests Mullenweg didn't realize this content would be seen as spamming the search engines, nor apparently that hiding links would be a no-no, either. Perhaps, but you'd think he would have had some inkling they might not like this. He'd already signed on to the nofollow comment spam fighting initiative. You'd expect he'd make some connection that doing funny things with links might be seen as bad by search engines.
In addition, last month WordPress took part in a web spam summit, also described here. Since that summit covered the problem of "fake weblogs" or "spam blogs" designed to capture search engine traffic just to make money, you'd think some similarity between those and these pages would have rung a bell. True, these pages weren't blog posts. Still, they shared many of the same basic goals as fake blogs.
However the content got there, innocent mistake or not, two major search engines have deemed the content spam and removed it from their indexes. That doesn't mean the WordPress site has been removed entirely, however. All that appears to have been specifically removed are the spam pages.
The WordPress home page does appear to have been penalized at Google, probably as a result of the hidden links it had and still has. The home page no longer shows a score in the Google Toolbar PageRank meter, whereas yesterday it scored an 8 out of 10. That's almost certainly a penalty that's been applied. But other pages in the site still have high scores, such as the About page, so this isn't a site-wide penalty.
Also yesterday, a search I did for blog software brought the home page up in the top 10 results on Google. Today, it's not in the top results. That's another sign that a penalty has been applied to that page. In fact, a search for WordPress itself doesn't bring up the home page on Google (it still does on Yahoo, and it was first on Google last night).
That's something that won't last. It hurts Google's relevancy for people not to get the WordPress.org home page when they search for the company (WordPress.com, which now ranks first, appears to be run by someone other than WordPress). After a short period of time, WordPress's home page will undoubtedly find its ban lifted. After all, do a search for WhenU, and you get that company's home page tops in Google despite it having been banned for cloaking last year. After 42 days, it was back in.
I've seen some comments worrying that because the WordPress home page has been penalized, anyone using WordPress might be banned on Google or Yahoo. That's not a concern, I'd say. This isn't an issue like the SearchKing case, where people using WordPress might be seen as part of a network of sites to be penalized.
On the flipside, plenty of people running WordPress now have links from their blogs to the site. Is WordPress now a "bad neighborhood," something search engines say not to link to lest you be penalized? Possibly, but I doubt it.
If you want to be absolutely safe, then ironically make use of the nofollow attribute. It never was going to be a complete solution to comment spam, nor has it been. But as I wrote before, it is a perfect way to link to other sites without worry that you'll be penalized by doing so with search engines. More about this in my past article, More On Link Condom & Blogger Worries Over Nofollow.
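Applying the attribute is just a matter of adding rel="nofollow" to the link's anchor tag. Here's a minimal sketch in Python; the nofollow_link helper is hypothetical, purely to illustrate the markup, and isn't taken from WordPress or any other blog software.

```python
from html import escape


def nofollow_link(url: str, text: str) -> str:
    """Build an anchor that search engines supporting nofollow
    should not count as a ranking vote for the target page."""
    # quote=True escapes double quotes so the URL can't break out of the href attribute;
    # the visible link text is escaped too.
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'


print(nofollow_link("http://wordpress.org/", "WordPress"))
# <a href="http://wordpress.org/" rel="nofollow">WordPress</a>
```

Visitors still see and can click the link as usual; the attribute only tells participating search engines not to pass ranking credit through it.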
Spamming, But For All The Right Reasons?
The links to WordPress are also fueling a debate over whether those who added them to show their support have now been duped. I'll leave that for those in the WordPress community to argue. I've used WordPress, liked it, have recommended it before and still recommend it despite what's happened. It's good software. But that doesn't entitle it to the excuses I've seen some people make on its behalf to justify the spamming.
Just because WordPress is an open source project, asks for donations and needs more support doesn't give it free rein to spam search engines, "experimentally" or not. If it wants to spam, then it pays the same price anyone else pays if they want to be aggressive with search engine optimization and get caught breaking rules.
Given this, seeing a comment like this really annoyed me:
Hot Nacho is a company that supports open-source software, specifically WordPress. All the web geeks need to remember that there are worse companies out there than those that try to "screw with Google" for PageRank, etc. It's fun to say "spammers are scum" and I certainly don't like them, but get some perspective, there is worse evil in this world.
All that said, I don't have a big problem with what Matt did, he said it wasn't something he wanted to do long term, but if it could help bootstrap the community it would be nice.
Search engine guidelines against spam don't say something like, "Don't spam us, unless you're just trying to make a start and help other people, then it's OK." They don't say, "Spam us for a little bit, then you can stop when you've earned enough." They say don't spam, period. If you don't want to follow those rules, fine. That's a risk you can take, and others do as well. But don't expect to be let off for free, if you're caught.
Here's another comment I disliked:
I don't begrudge someone earning money from something they have put a great deal of effort and time into. Particularly when it seems to be putting back into the product and to the benefit of the community.
Well, I do begrudge someone earning money if it's screwing up the quality of my search results. Fair to say, the searching community (anyone who searches for information) is a little broader than the WordPress community.
Misleading Spam, An Important Tangent
I've written about search spam many times and generally try to cover various viewpoints and illustrate how tricky defining "spam" can be. But as I've said, my view is that the search engines are the ultimate arbiters of what they consider spam, for banning purposes.
Beyond that, we individually decide what we consider spam. I come across search spam all the time -- which to me is irrelevant content that overtly attempts to get a good ranking. I dislike it immensely when I hit this type of content, because I know exactly what the person has done to be misleading. Here's a recent example.
I wanted the phone number of a chicken place near our home. I typed in king chicken amesbury into Google, then saw this promising "Amesbury Business Directory" page in the top listings. The page wasn't a real directory at all. It came up because it was generically designed to work for a variety of cities and topics. All these cities were named on the page:
Amesbury Box Bradford-on-Avon Calne Chippenham Corsham Cricklade Devizes Downton Durrington Hawthorn Highworth Malmesbury Market Lavington Marlborough Melksham Mere Pewsey Ramsbury Salisbury Sherston Swindon Tisbury Trowbridge Warminster Westbury Wootton Bassett
They worked in combination with what were called keywords related to Amesbury:
1 litre 1 ton 100% funding 16 18 22mm 16 bit 1-6 people 16bit 1880 clu 2 litres 2 post 2 ton 2.4ghz 200 kilos 24 hour 24 hr 3 day 35mm aps 3d 3g models 4 post 5 to 1 5:1 sf 500 kilos 50s 5-1 sf 60s 68 briefs 6ixty 8ight 6mm to 25mm 7 8 9.5 mm 7 day 7 day opening 70s 76 cm 7650 games 91cm 99cm a la carte ab1 ab2 ab3 abrasive abs pp academy acce access control accessories accident accidents accommodation account accountancy accountant accountants accounting accounts accurate acerbis acrylic acrylics acryllic acton adams adapters additional address adhesives admin administration administrative adsl advance adventure adverse adverse debt advice advice advisor adviser advisers
I'm not printing the entire list. It goes on and on, and the few lines above make the point. This wasn't a relevant page. This did nothing to satisfy my query. This was simply created in hopes of getting me to the page and clicking on some of the ads. It wasted space in Google, and it wasted my time.
So let's bring it back to WordPress. The content it had existed solely to make money, not really to inform or help. It took away space and resources for good content I'd rather see. At least sophisticated spammers would have ensured that if they got a top ranking, I would have been delivered to something with far more useful content. That's just a prerequisite to ensure people don't end up reporting your content as spam.
Yeah, Your Search Spam Did Contaminate!
Another comment that caught my eye was this:
Spamming is unsolicited. All of these posts are on a sanctioned area of WordPress and don't exist anywhere else. It'd be different if these posts were dropped into blogs and wikis all over the place but they aren't. Linking them in off-screen content is a little bit of trickery but there isn't any leeching there.
It's similar to what Jonas Luster of WordPress argued here:
Let's get the first response over with - please, please, please stop calling it "Spamming". Regardless of how you stand towards the deeper issue at hand, diluting a word by mixing pretty much everything into the basket of spamming is not a good idea. Yes, the postings were made to improve the Google rank of someone else, yes there was a financial transaction involved, and yes, the postings were not topical to the wider sense of the site, but it's not spam. Spam involves other, involuntary, carriers. No comment boxes were contaminated, no mailboxes, no Usenet forums, and certainly no one spent a single byte of extra bandwidth (with the exception of the links from Wordpress.Org) on it. It's not spam.
Honestly, statements like that are simply frightening. Spam isn't only something that happens if you drop comments or trackbacks on blogs. Neither is it some new term we've suddenly co-opted for SEO. I've personally been using it since I started writing about search engines in 1996.
Push misleading or irrelevant content into a search engine overtly just to get traffic, and I call that spam. Break the rules a search engine sets out, and they call that spamming. The search industry has been using the terms "spam" and "spamming" for nearly a decade. Heck, even legal cases have cited spam in relation to search engines. Trying to redefine the term as it applies to search to put a better spin on the situation at WordPress isn't going to help things.
But it didn't really hurt anyone! That's sort of the tone of this unreasonable justification:
Matt could have put out announcements asking for donations. He could have plastered flashing advertisements all over the WordPress sites. He could have used every available opportunity to "pass the cup". Instead he chose an avenue which was out-of-sight. And instead of perceiving this as "polite", people have chosen to view it as "sneaky". "Et tu, Brutè?"
I see. It was polite to get nearly 200,000 low content pages into the search engines, where they consumed crawler time in being found and regularly revisited, time that might have been spent on other pages. It was polite that people hitting these unsolicited pages via search engines wasted time having to go back and seek again the solid information they really wanted. Thanks for that. Next time, just put the ads up on some real content. Or yes, do tell people you need money.
In the end, the big deal really isn't that WordPress was caught spamming. People get caught for spamming all the time. But we have never, ever had a situation I can recall where someone was caught spamming at the same time they were supposedly working with the search engines to prevent spam!
The creation and rallying of industry support around the nofollow attribute was unprecedented. We never before had any unified effort among search engines in that way to fight spam, much less having other parties like WordPress cooperate with that. Yes, nofollow was designed to combat link spam. What WordPress did was content spam, a different tactic. But the aim of both tactics is the same -- get more traffic from search engines by trying to aggressively manipulate them. WordPress ultimately did the very thing it was supposedly fighting against. That was a very big deal indeed.
For more on this, check out Wordpress gettin' Slammed for Spamming? at Threadwatch, for a lot of good comments especially from aggressive SEO types. Over at InsideGoogle, WordPress Caught Spamming has some nice links at the end to a collection of comments from various blogs on the situation. Want to comment or discuss what I've written or the situation in general? Then join the forum thread I've started, WordPress In Spamming Uproar.
Postscript: As mentioned, Mullenweg is currently on vacation and generally without internet access. He's posted a brief note here on having just now seen the concerns over the spamming, and further posts will likely eventually follow on the home page of his blog.
Postscript 2: In mid-April, I heard from Mullenweg on some questions I sent across to him. He responded that he hadn't realized what was presented to him as "advertising" was a form of "web spam," saying:
My mindset in terms of spam is very focused on the type I deal with and fight on a daily basis, I did not think of things in terms of what search engines such as Google deal with because I've never been in that position. I'm not going to argue semantics, but that sort of artificial content, hosted or otherwise, is not something I would ever participate in again.
Hidden links were also an issue. Mullenweg said:
They were wrong and shouldn't have been done.
He added the hidden links himself.
I never got an answer to the final follow-up question of why the links were hidden in the first place.
Posted by Danny Sullivan at 9:16 PM | Permalink
Update: Adversarial Information Retrieval Workshop
The First International Workshop on Adversarial Information Retrieval on the Web, to be held at the 14th International World Wide Web Conference (WWW2005), 10-14 May 2005, in Chiba, Japan, is shaping up to be a promising event. If you're thinking of attending, you might want to register today; early registration rates end March 31.
A number of prominent researchers have already submitted papers to the workshop, and it looks like it's going to be a great anti-black-hat event. Read on for a full list of accepted papers.
Full Papers
- Blocking Blog Spam with Language Model Disagreement (Gilad Mishne, David Carmel and Ronny Lempel)
- Cloaking and Redirection: A Preliminary Study (Baoning Wu and Brian D. Davison)
- Pagerank Increase under Different Collusion Topologies (Ricardo Baeza-Yates, Carlos Castillo and Vicente Lopez)
- SpamRank -- Fully Automatic Link Spam Detection (Andras A. Benczur, Karoly Csalogany, Tamas Sarlos and Mate Uher)
- Web Spam Taxonomy (Zoltan Gyongyi and Hector Garcia-Molina)
Synopses
- An Analysis of Factors Used in Search Engine Ranking (Albert Bifet, Carlos Castillo, Paul-Alexandru Chirita and Ingmar Weber)
- Bringing Down the House: An Analysis of Optimal Link Bombs (Sibel Adali, Tina Liu and Malik Magdon-Ismail)
- Web Spam, Propaganda and Trust (Panagiotis T. Metaxas and Joseph DeStefano)
Posted by Chris Sherman at 4:44 PM | Permalink
Search Marketing Techniques, Deceptive Advertising Laws & Other Laws from Alan Perkins at Search Engine Guide looks at how laws about deceptive advertising might be applied to search marketing. Alan's long argued that cloaking could be considered deceptive advertising, and he tries to build that case here -- the deception being that the search engine itself was being deceived about the real relevancy of a page.
He cites the FTC action over a pagejacking scam in 1999 as one extreme example of deception being found in a legal case. I agree with that (and my own write-up of that case is here, FTC Steps In To Stop Spamming). Alan does make clear that search spam itself is not necessarily the same as deception from a legal perspective. But he does conclude specifically that cloaking content with the intent of getting a better ranking is deceptive advertising:
So, those search engine spamming techniques that involve delivering the same content to searchers and search engines, such as hidden text or single pixel transparent links, do not constitute deceptive advertising. However, those techniques that involve delivering different content to searchers and search engines constitute deceptive advertising if the intent and result of the technique is a preferable placement.
I completely disagree. First, I don't know that getting organic listings in a search engine would be considered "advertising" under US laws, much less those of other countries. In addition, if what was promised in the search listing is generally the same as what someone gets when they arrive at the page, it's hard to argue consumer deception.
But the search engine itself was deceived! Maybe, but that doesn't mean laws about deceptive advertising were violated. And search engines get deceived about things all the time, including when they fail to index pages properly or to give them the ranking they deserve because the pages themselves aren't search engine friendly.
In fact, that's one reason that Google itself allows approved cloaking, as I've written before. Without allowing this, it can't properly index some content.
It's also why I find the entire argument over cloaking so tiresome, to the point that I may no longer even comment on articles about it in the future. Cloaking is not necessarily spam or misleading, as I wrote in great depth in my Ending The Debate Over Cloaking article of Feb. 2003.
If cloaking alone (independent of WHAT is being cloaked) were spam and misleading, then Google wouldn't allow it at all, in any circumstances, nor would Yahoo and others that accept XML feeds allow that form of cloaking. Cloaking is simply a method of feeding content to a search engine. How that content is described to a consumer, and what is ultimately delivered when they arrive at a page after reading a listing, is where you determine deception.
Did you promise "kids internet games," as with the 1999 pagejacking case, and instead deliver porn? That's deceptive, regardless of whether you cloaked, used a meta refresh or whatever. Did you promise games and actually deliver them? Then how you gained the listing isn't likely deceptive from a legal point of view. Deception in getting the ranking will remain the sole jurisdiction of the search engine itself (more about that in my past Spam Rules Require Effective Spam Police article).
Later, I'll be writing about new page-specific markup that Yahoo is proposing, which was raised at the Indexing Summit we held at SES New York (for some fast details, see our Indexing Summit - SES NYC 05 forum thread with live coverage of that). This markup would allow portions of a page seen by humans to be ignored by spiders -- effectively, a form of cloaking.
There are good reasons for doing it, but if the change comes, it's going to once again shift the definition of cloaking. More important, it further underscores the fact that search engines are no longer (and haven't been for some time) comparing only pages that have been spidered exactly as humans see them. They aren't, nor should they, and doing so wouldn't somehow restore some type of "level playing field" that never existed in the first place.
Want to discuss? Please join our forum thread, Deceptive Advertising in Search Results.
Posted by Danny Sullivan at 7:20 AM | Permalink
Adam Penenberg offers a profile of WebGuerrilla's Greg Boser in the Wired article: Search Rank Easy to Manipulate. While I'm sure many of you will have plenty to say about the article, allow me to share a few random thoughts that came to mind from my non-SEO, professional researcher (librarian) perspective.
+ This article reinforces my belief (one that I've had for many years) that verticals (what I call specialized search tools) are going to grow in importance and use by researchers. I'm not saying that general purpose and large web databases aren't important (THEY ARE), but from a searcher's perspective bigger doesn't always mean better. I also believe that, along with verticals, tools to federate results and help with database selection will be key.
+ Penenberg's article focuses only on manipulating Google results. But most other large web engines also use link analysis, along with many other criteria, in one form or another to determine relevance, so manipulation is not just an issue at Google.
+ Some type of human intervention might still be needed in compiling, maintaining, and editing large web databases. Google and other large web engines are often compared to libraries. In my opinion, this is an inaccurate comparison. Sure, they both provide access to information but all libraries have selection policies about what they do and don't provide access to. In other words, humans (not only librarians) make decisions based on a variety of criteria including currency and accuracy of the info and reputation of the author and publisher. Collection development is an essential part of librarianship. Many of the great libraries are also focused on one or more topics, disciplines, or types of information. Yes, library collections can be verticals too!
+ The need for better user training. This might not be something for everyone, but certain groups of web searchers (educators, students, etc.) could benefit from just a small amount of training to learn the many pros and cons of large web engines, what verticals/specialized tools can offer, and, when using any search tool (Google, Yahoo, Ask Jeeves, or various specialized databases), how to take full advantage of the technology. A little training can go a long way to help the searcher create more precise and focused result sets.
More about these topics in future posts.
Posted by Gary Price at 1:05 PM | Permalink
USA Today looks at how Google's AdSense program has grown to help publishers make money: Google's AdSense a bonanza for some Web sites. I get to be one of the voices not seeing things as so rosy.
For one, I note that AdSense is an area where Google goes off its core mission of organizing the world's information. In other words, AdSense doesn't help you in your search quest. That's nothing new to my readers. I said the same thing when the program launched in 2003.
Google's response in the story isn't convincing. If I do a search for the New York Times and see an ad offering a discount, that's not because of AdSense for Content -- that's AdSense for Search.
Huh? If you missed last year's whole AdWords-into-AdSense metamorphosis, my More On Mixing Contextual & Search Spending post explains it. When I say AdSense, I still mean the AdSense contextual ads -- as do most people I talk with, including Google's own advertisers and publishers.
Chris Pirillo's quote does a better job of explaining the traditional Google argument of how AdSense helps search. By funding publishers, Google helps them make better content. That increases the odds that when we look for things, we'll find what we need.
To me, it's a stretch. By that argument, Google ought to be giving away free web hosting, paying people directly to write content and other things. OK, so Google's Blogger is a form of free web hosting and Google Answers does pay people to write content. I remain steadfast that AdSense still isn't part of the core mission of organizing information. It's about extending the ad reach Google has, so it can earn a lot of money given that there's not enough search inventory to go around.
That's not a bad thing -- it's just the biggest thing you can point at if you want to say Google isn't all about search, for a company that, as I wrote back in 2003, painted itself into that particular corner.
Meanwhile, AdSense let Google usurp Amazon as the operator of the web's largest affiliate program. Before AdSense, blogs and other content sites like those mentioned in the story often would have depended on Amazon links for a paltry payout for the work they do. Now Google is the major moneybags -- which brings along another major problem, spam.
The irony is deep. Google, by paying publishers, fuels an incredible amount of search spam -- pages that are created with no purpose other than to get search traffic, show AdSense ads and make the site owners money. The story addresses this, and Google responds that it does try to stamp it out, but the problem remains.
Somewhere between existing sites and spam is something like Michael Buffington's asbestos blog experiment. He decided to build a blog around asbestos because of the large amount of money advertisers after mesothelioma victims are willing to pay.
I don't know the area, and maybe it will evolve into a useful new resource. But was it driven out of a need for searchers or out of a desire to make money? Buffington's honest:
The second part of this big experiment is to see if I can capture some of that click through revenue while still providing a somewhat valid service to people who might arrive by search results.
So thank you AdSense and Google. You've directly inspired the creation of content that maybe we didn't really need but which wants to earn off your search results. In some cases, that will be good. Believe me, as someone who started as a self-publisher, I love that AdSense is out there to help others. But in some other cases -- I suspect a lot of them -- the content isn't going to be that great.
Meanwhile, I've written earlier that Yahoo wants to compete with Google in the space more heavily. More fuel for those rumors (more like a confirmed open secret) comes from News.com: Yahoo seeks to expand in Google territory.
Posted by Danny Sullivan at 1:40 PM | Permalink
You know about PageRank, and about two weeks ago I mentioned a new paper from Stanford's Database Group discussing PeopleRank. Today, another paper was posted on the Stanford server. This one introduces TrustRank, which has been developed to help fight web spam. Here's the abstract:
Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
Another version of the paper was published in March 2004. The full text of the paper: Combating Web Spam with TrustRank is available as a 12 page PDF. It was co-authored by Zoltan Gyongyi (Stanford), Hector Garcia-Molina (Stanford) and Jan Pedersen (Yahoo!).
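The core idea in the abstract can be sketched as a biased PageRank: trust starts only at the expert-approved seeds and flows outward along links, split among each page's outlinks. This is a simplified Python illustration, not the authors' code; the function name, the damping factor of 0.85 and the 20 iterations are assumptions for the sketch, and the paper's seed-selection step isn't shown.

```python
def trustrank(links, good_seeds, alpha=0.85, iterations=20):
    """Propagate trust from expert-approved seed pages along links.

    links: dict mapping each page to the list of pages it links to.
    good_seeds: set of pages a human judged reputable (must be non-empty).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Static distribution: trust is injected only at the good seeds.
    seed_share = 1.0 / len(good_seeds)
    d = {p: seed_share if p in good_seeds else 0.0 for p in pages}

    t = dict(d)
    for _ in range(iterations):
        # Each round, seeds get back their share of the "reset" trust...
        new_t = {p: (1 - alpha) * d[p] for p in pages}
        # ...and every page passes trust, split evenly, to the pages it links to.
        for page, targets in links.items():
            if not targets:
                continue  # pages with no outlinks simply drop their trust in this sketch
            share = alpha * t[page] / len(targets)
            for target in targets:
                new_t[target] += share
        t = new_t
    return t


# Toy web: two spam pages link to each other and to a good page,
# but no reputable page links back to them.
web = {
    "seed": ["a", "b"],
    "a": ["b"],
    "b": ["seed"],
    "spam1": ["spam2"],
    "spam2": ["spam1", "b"],
}
scores = trustrank(web, good_seeds={"seed"})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:6s} {score:.3f}")
```

In the toy graph, the spam pair links out to a good page, but since nothing reachable from the seed links back to them, they never pick up any trust and score zero.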
Posted by Gary Price at 12:31 AM | Permalink
Here's a pair of Search Engine Watch Forum threads dealing with some tricky SEM business issues. In How To Start & When You Can Charge For SEO, members discuss how to get started with SEO/SEM services and what level you should be at before you can confidently charge clients. Meanwhile, Sued for Blackhat SEO? looks at whether "black hat" SEO, or really any SEO activity that backfires against a client's expectations, could get you into legal hot water.
Posted by Danny Sullivan at 12:03 PM | Permalink
Although most search engines post spam policies, there are many gray areas and ambiguities that can trip up inexperienced site owners making their first attempts at search engine optimization. What's more, even the search engines themselves acknowledge that certain categories, such as online pharmaceuticals, are dominated by cunning optimizers who can blow away their unsophisticated competitors, often without being detected.
How do you know if you're over-stepping the boundary into the shadows of spamming? It's not so much a matter of the techniques you use, as your intent, according to a recent Search Engine Strategies panel. In today's SearchDay article, What, Exactly, is Search Engine Spam?, guest writer Bill Hunt continues our coverage of sessions devoted to better understanding this most vexing issue for site owners and searchers alike.
Posted by Chris Sherman at 3:47 AM | Permalink
Creating a link farm, or a collection of web sites for the sole purpose of pointing to one another to boost search engine rankings, is a well-known (and widely discredited) search engine spammer trick. But some businesses have legitimate reasons for maintaining multiple sites and multiple domains. A brand-owner may want to have separate sites for individual products, for example. A merchant offering widely divergent products may want to operate different specialized storefronts. These are just a few examples of sites that can get inadvertently hit with spam penalties -- not for doing something wrong, but for straying into a gray area where search engines might look askance at their tactics.
In today's SearchDay article, Avoiding Search Engine Woes with Multiple Domains and Websites, guest writer Grant Crowell covers a recent Search Engine Strategies panel that explored these issues and offered solutions to potential problems. A longer version of this story for Search Engine Watch members goes into more detail about situations that can inadvertently trigger spam penalties, such as inappropriate cross linking between sites, using global templates, and geo-locating blunders, along with solutions to these and other problems. Click here to learn more about becoming a member.
Posted by Chris Sherman at 3:00 AM | Permalink
Earlier, I posted about the Link Condom site that went up, which pokes some fun at the new nofollow attribute. Six Apart's Anil Dash didn't find it funny, as he deconstructs in Anti-Nofollow FUD. Instead, he interpreted it as a blog spammer's attempt to discredit the attribute.
News flash for anyone who doesn't yet realize it. Nofollow will NOT stop blog spam. Want to understand more about why? See my previous article on it, Google, Yahoo, MSN Unite On Support For Nofollow Attribute For Links.
The site is more about the host of other, non-blogging issues the tag raises that people will want to be aware of. So it's not anti-nofollow. If anything, it deserves a little praise for helping people easily understand what nofollow does: it prevents links from actually touching another site for search ranking purposes. Link condom? Great name -- because that's what nofollow is.
I posted a long response to Anil's worries on his site, but I'll reproduce those comments below for my readers. In addition, I'd encourage everyone to look at some of the discussion within our The New Nofollow Link Attribute forum thread especially for a non-blogger view of the attribute.
Also look at Anil's The Social Impacts of Software Choices, which looks at how bloggers are now wondering if the impact of nofollow will hurt how they link between themselves for search purposes. I have a long response I posted to that, as well. The main bit that struck me was the comment that some bloggers worry nofollow will hurt their chances of ranking well when they comment on other blogs:
There's also some resistance from real bloggers, who are fretting now that their comments won't confer PageRank on their blogs.
To which I responded:
This sounds very much like bloggers with an SEO complex. I need links for ranking? How about you write good stuff, and people will comment on it within their own posts that will help -- not that you need to be able to comment behind a post and get respect that isn't necessarily earned.
Again, see that post for Anil's full look, my comments and responses for others. And below, my full comments on the non-blogging issues about nofollow that Link Condom highlights:
Anil, the site's a light-hearted joke. Believe me, Todd Friesen, who threw the site up, isn't trying to spread FUD about nofollow through that site. It's more an inside joke for those who know search engines and are talking about the issues of nofollow OUTSIDE blogging. Want a taste of that? Then check out this thread at our forums that goes into the non-blogging issues more.
No time for that? Then let's go back and look at the main points highlighted on that page about "uses" of nofollow:
Where's the mention of comment spam in those? The word "blog" isn't on the page once and "comment spam" is down in small text as a joking aside. If this were a rant against nofollow being useless at combating comment spam, why bury it like that?
Answer? Because it's not a rant on nofollow as it relates to blogs. It's a joke poking fun at the issues of nofollow that those OUTSIDE of blogging are contemplating in the wake of the tag. I'll take up some of the bigger points and explain them:
Hoarding: Some people want to get tricky and not let anything outside their own web site get link credit. It's not a blog thing -- it's a link thing. Personally, I think it's a waste of time. But for those who do worry about it, nofollow gives them a nice, new approved tool to hoard link credit.
Hiding: Some people want to link out so search engines feel they have a "natural" site but don't really want to show those links. Nofollow may -- or may not -- allow that. It's a new thing they'll try.
Screwing: Well, some people swap links for reasons good and bad, and have done so since before we had blogs and even before the search engines did much with links. That link swapping, again often completely outside of blogs, may now be impacted. If someone links to you, they might not really link in a way that gives you search credit. If search credit is what you wanted, you'd better know whether they've put a "condom" in the form of nofollow around that link.
Buying Links: People buy and sell links outside of blogs, oftentimes to get better rankings in search engines. Nofollow means that you can now sell links but say to the world, "Hey, I'm not doing this to mess with search rankings." That's nice if you're a big site that wants to ensure you aren't tainted as some type of search evil-doer. Then again, if you are someone buying links and doing it just for search reasons, you'd better make sure you don't buy them with nofollow on.
Bad Neighborhoods: Google and gang will tell you not to link to bad neighborhoods. Do you know what those are? I don't -- they didn't publish a list along with that advice. Maybe it's a porn site. Maybe it's a link farm. Maybe a porn site like Playboy is OK though. And maybe you are some newbie web author freaked out that anything you link to might get you into trouble.
I know those people because I have to deal with their questions and worries after the search engines have unleashed the fear factor. So the point is -- are you freaked out? Hey, use this new link condom and you can link safely. And by the way, it's another non-blog specific issue. It has an impact on all web authors. It's actually a great tool for anyone to use.
Easier to use: Yeah, it is easier to use. You and Todd seem to agree on this. Having easy options is good.
Now, I know you've generally got a bad view of SEO, that it seems populated with scumbags like the supposed scumbag behind this site -- and being on the sharp end of blog spam, it's understandable. But let's get personal for a moment about the scumbag in question.
Who published it? Someone who definitely does black hat SEO, yep. Someone who does white hat SEO, as well. And someone who knows a heck of a lot more about how search engines work -- and how this tag will and will not have an impact -- than the vast majority of people out there.
Scumbag? Then Yahoo -- who joined you and the other major search engines on nofollow -- is hanging out with scumbags, because Todd, I and several others all had a nice dinner last month with key people from Yahoo's search team.
Oh, and Todd's good friend Greg? He was one of the MSN Search Champs invited up a few months ago, along with key bloggers, that MSN itself mentioned in its post on adding nofollow. Why invited? Because despite being black hat at times, he also knows search intimately.
Let's not leave out Google. Todd and Greg have better contacts with Google's search engineering teams than the vast majority of people. Why? Because those scumbags know search so well they're respected for it. That's why I myself have them talk on search issues. There's a lot to learn from them regardless of what hat you wear.
Now for some of your points:
Nofollow Gives Choices: Yep! I love it for that. Bring on more choice over what search engines do and don't index, so people like Brad Choate don't have to cloak and violate search engine guidelines. Brad cloaking? Yeah, my How About An Indexing Summit! post explains how he ended up doing this without realizing it. And by the way, that also puts Brad on exactly the same page as people like Greg and Todd, who feel they should be able to control how their content is fed to search engines as much as Brad wants to.
Rankings From Blog Posts Won't Be Impacted: Oh yes they will. Hey, Jeremy says he's just put nofollow on the link to Todd's site in his post about it. Hey, that's a ranking impact. If someone links to you (on a blog, in blog comments, in a blog post, on a personal home page, whatever) and now uses nofollow, you aren't getting the credit for that link. That's their choice, and I'm glad they have it. They had it before, but not as easily as nofollow makes it now. But you'd better believe it will have an impact. Whether it's good, bad or very little remains to be seen. For most people with quality content, probably very little.
PageRank Is Not A Contest: Ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh!
Geez, I beg and plead with you, someone with such high standing in the blogging community, to stop making such bad mistakes and spreading misinformation about search. Perhaps you should be forgiven, given that the PR-meisters themselves at Google often make the same mistake. But to clarify:
PageRank is how Google calculates the popularity of a page, based on looking at all the links across the web. But you don't want PageRank alone. Go on, search for "cars." Did you see Amazon come up? No -- because despite having incredibly high PageRank, it's not got anchor text with the word "cars" pointing at it. So PageRank does not equal how Google ranks pages in search results. It is one key component, with the other two being the link text itself and the words on a page itself.
PageRank is also a Google-specific thing. Nofollow has an impact with ALL three major search engines participating, so talking about PageRank just reinforces the notion that it's an all-Google-or-nothing world, when it is not. In fact, Ask Jeeves is specifically not supporting nofollow at the moment because it uses a radically different ranking system that it feels might not be impacted by blog spam, link spam, link bombing and so on.
What you're really saying is that search rankings are not supposed to be a contest but instead be an objective decision of a mix of factors that the search algorithm uses to determine what's relevant. And it's a nice goal, but it's not true.
Even if we had no blogs -- no SEO -- no spammers, search algorithms wouldn't get it perfectly right. That's because people still make unintentional mistakes, create non-search engine friendly sites, rely on graphics or Flash rather than text, and have a host of other issues that ensure there's no such thing as a "level playing field" on the web. That's also, by the way, where plenty of SEO firms that even you'd approve of come into play. They can help clean up many of the mistakes that the search engines themselves suggest fixing.
As for being a contest, search rankings are indeed one. And PageRank specifically itself is definitely a contest. Remember, when Google talks about counting links as a key component of what it does, it talks about relying on the web's "uniquely democratic nature." Democracy -- that's a popularity contest. In fact, to quote Google:
Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
Not a contest? If it's not a contest, then what are all those votes being counted? Maybe nofollow will help ensure that we don't have a lot of chads polluting the election, but then again, maybe not.
What is clear is that nofollow will NOT stop blog comment spam. Not at all. Don't believe it? Then right now, all bloggers can stop making use of blacklists, registration schemes and other tactics used before nofollow emerged. Sit back and see if the spam goes away. It won't. Nofollow is a nice new tool that we can use, one that, as I've said many times before, is welcome for giving us choice and more options, but it's not a magic bullet. Well, it's a magic bullet for one thing. It now lets the search engines say to bloggers, we gave you what you wanted, stop blaming us for the problem!
Posted by Danny Sullivan at 6:22 PM
In the first cooperative move for nearly ten years, the major search engines have unveiled a new indexing command for web authors that they all recognize, one that they hope will help reduce the link and comment spam that plagues many web sites, especially those run by bloggers.
The new "nofollow" attribute that can be associated with links originated as an idea from Google several weeks ago and was pitched to MSN and Yahoo, as well as major blogging vendors, gaining support.
The Nofollow Attribute
The new attribute is called "nofollow" with rel="nofollow" being the format inserted within an anchor tag. When added to any link, it will serve as a flag that the link has not been explicitly approved by the site owner.
For example, this is how the HTML markup for an ordinary link might look:
<a href="http://www.site.com/page.html">Visit My Page</a>
This is how the link would look after the nofollow attribute has been added, with the attribute portion being rel="nofollow":
<a href="http://www.site.com/page.html" rel="nofollow">Visit My Page</a>
This would also be acceptable, as the order of attributes within the anchor tag makes no difference:
<a rel="nofollow" href="http://www.site.com/page.html" >Visit My Page</a>
Once added, the search engines supporting the attribute will understand that the link has not been vetted in some way by the site owner. Think of it as a way to flag to them, "I didn't post this link -- someone else did."
By the way, should you be one of the few using other rel values within your links (a way to show the relationship between your page and the page you're linking to), Google advises separating them with spaces.
For example, Google cited this page, which provides one example of multiple rel attributes in action, like this:
<a href="http://jane-blog.example.org/" rel="sweetheart date met">Jane</a>
If you wanted to add nofollow to that link, you'd just put a space between it and the existing values of sweetheart, date and met, like this:
<a href="http://jane-blog.example.org/" rel="sweetheart date met nofollow">Jane</a>
Google also said that either upper or lower case is fine when using the attribute, and that the new value is believed to meet W3C markup standards, since they allow anyone to define new link types for rel.
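To make those formatting rules concrete, here is a minimal Python sketch, using only the standard library's `html.parser`, of how a tool might read rel as a space-separated, case-insensitive list of values and add nofollow to an existing value without duplicating it. The names `LinkCollector`, `is_nofollow` and `add_nofollow` are hypothetical illustrations, not anything Google or the blogging vendors published.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, set of rel values) for every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, and attribute
        # order inside the tag stops mattering once we build a dict.
        if tag != "a":
            return
        attr_map = dict(attrs)
        rel = attr_map.get("rel") or ""
        # rel is a space-separated list; compare its values case-insensitively.
        tokens = {token.lower() for token in rel.split()}
        self.links.append((attr_map.get("href"), tokens))


def is_nofollow(rel_tokens):
    return "nofollow" in rel_tokens


def add_nofollow(rel_value):
    """Append nofollow to an existing rel value, separated by a space."""
    tokens = rel_value.split()
    if "nofollow" not in (t.lower() for t in tokens):
        tokens.append("nofollow")
    return " ".join(tokens)


parser = LinkCollector()
parser.feed('<a rel="NoFollow" href="http://www.site.com/page.html" >Visit My Page</a>')
parser.feed('<a href="http://jane-blog.example.org/" rel="sweetheart date met">Jane</a>')
parser.close()
for href, tokens in parser.links:
    print(href, is_nofollow(tokens))
print(add_nofollow("sweetheart date met"))  # sweetheart date met nofollow
```

Treating rel as a set of tokens is what makes "NoFollow", "nofollow" and "sweetheart date met nofollow" all register as the same flag.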
Causes Of Link Spam
Why would you want to use the attribute? Blog publishers, forum operators, sites with guest books and others who allow anyone to contribute in some way to their web sites have suffered when people have used these systems to spam them with links.
For search engine purposes, getting a link to your site from someone else's site can serve as a "vote" that your site is seen as good. In Googlespeak, getting a link increases the PageRank value of your page -- sometimes a tiny bit, sometimes much more.
In addition, getting a link may help better ensure that your page is indexed by the major search engines. Finally, getting a link with words you want to be found for embedded in the anchor text can help you not just be seen as popular but also help you rank better for particular words.
Here's an example of comment spam in action. I did a Google search for texas holdem comment to find some candidates and focused on this page as an illustration. From PoliPundit.com, it's a blog post from Nov. 2002 about a political development.
Below the post is the comment area. The area has been link spammed heavily -- 30 entries containing links to web sites promoting casinos, poker, dating and other topics, like this (I've removed the links):
http://www.-texas-holdem-poker.us holdem poker texas holdem poker Comment by texas holdem poker | Email | Homepage | 12/26/2004 - 12:31 pm
Your blogg is smashing! Payday Loans http://www.payday-express.com Comment by Payday Loans | Email | Homepage | 1/15/2005 - 4:04 am
Your blogg is full o information. HGH http://www.hgh-express.com Comment by HGH | Email | Homepage | 1/15/2005 - 12:40 pm
Great article and great website. I wish you could update if more frequently. You're also welcome to visit my websites: Checks, Cigarette, Dating, Honda, Insurance, Las Vegas, Lawyers, Lexus, Online Poker, PDA, Toyota.
It's not just a Google problem. Do a Yahoo search, an Ask Jeeves search, or a search at MSN Search. All bring up examples of pages that contain link spam, which have been indexed by these search engines. As a result, they also might find their ranking systems impacted by the activity.
Google, nevertheless, often gets the blame -- which is why it was under the most pressure for coming up with something for the problem. The hope is that by allowing web authors to flag links in this manner, it will make blogs, forums, guest books and other places accepting contributions less attractive to spamming.
What Nofollow Means
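Below I'll cover what Google says it does if it sees a link with the nofollow attribute associated with it. Yahoo and MSN are likely to react in a similar fashion, though I haven't yet spoken with them to get exact details, since news of their support only just emerged.
If Google sees nofollow as part of a link, it will:
1) Not follow the link to the page it points at.
2) Not count the link as a "vote" when calculating PageRank.
3) Not associate the link's anchor text with the page the link points at.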
That's the situation at the moment. Google is going to evaluate how the attribute works, and it could decide to make other changes down the line, it says.
Now let's look at the impact of each action:
1) Not following the link to the page it points at means that potentially, Google might not index the page at all. As said, the more links that point at a particular page, the more likely it is that Google (and generally the other major search engines) will include that page within its index.
The nofollow attribute DOES NOT let someone prevent a page they don't actually control from being indexed, however. If Google finds even one ordinary link pointing at a page, it may then index that page.
In addition, people can submit their pages directly to Google (and most major search engines). So it's crucial to understand that just because someone might place nofollow in a link pointing at your site, this WILL NOT prevent your page from getting indexed.
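A toy crawl helps show why a nofollow link can't keep a page out of an index. In this minimal Python sketch, an assumed link graph (all URLs hypothetical) maps each page to `(target, nofollow)` pairs. A breadth-first crawl skips nofollow links but still reaches a page through any ordinary link, or through a direct submission modeled as a seed. It illustrates the reasoning above and is not how any engine's crawler actually works.

```python
from collections import deque


def discover(link_graph, seeds):
    """Return every page reachable from the seeds via followed links.

    link_graph maps a page to a list of (target, nofollow) pairs.
    Seeds stand in for pages already known or submitted directly.
    """
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target, nofollow in link_graph.get(page, []):
            if nofollow or target in seen:
                continue  # nofollow links are simply not traversed
            seen.add(target)
            queue.append(target)
    return seen


# Hypothetical pages: a blog whose comment link carries nofollow,
# and a news site that links to the same target normally.
graph = {
    "blog.example/post": [("spam.example", True), ("blog.example/about", False)],
    "news.example": [("spam.example", False)],
}

print("spam.example" in discover(graph, ["blog.example/post"]))                  # False
print("spam.example" in discover(graph, ["blog.example/post", "news.example"]))  # True
print("spam.example" in discover(graph, ["blog.example/post", "spam.example"]))  # True
```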
2) As for PageRank calculations, it's important to remember that PageRank is a pure popularity score (other search engines have similar scoring mechanisms, just without catchy names, other than Yahoo's Web Rank). The nofollow attribute means that a link will not be counted as a "vote" in this popularity contest. Losing that vote can affect ranking, though how much depends on the other factors beyond pure popularity that also come into play.
Huh? Say there are two pages, one with a PR score of 6, the other a PR of 7. Even though the PR7 page is more popular from a link counting point of view, it could still get outranked by the PR6 page if other factors such as the words on the page, or the anchor text pointing at the PR6 page, make it more relevant for a particular search.
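The "vote" idea can be sketched with the textbook PageRank power iteration, under loudly stated assumptions: a damping factor of 0.85 (the value commonly cited from the original PageRank paper), dangling pages spreading their score evenly, and nofollow links simply removed from the graph before scoring. That last step is one reading of Google's description, not its actual implementation, and the page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over {page: [(target, nofollow), ...]}.

    Nofollow links are dropped first, so they cast no vote.
    """
    pages = set(links)
    for outs in links.values():
        pages.update(target for target, _ in outs)
    followed = {p: [t for t, nf in links.get(p, []) if not nf] for p in pages}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            outs = followed[p]
            if outs:
                # each followed link passes an equal share of p's score
                share = damping * rank[p] / len(outs)
                for t in outs:
                    new[t] += share
            else:
                # dangling page: spread its score over every page
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


open_links = {"home": [("blog", False)], "blog": [("home", False), ("spam", False)]}
nofollow_links = {"home": [("blog", False)], "blog": [("home", False), ("spam", True)]}

print(round(pagerank(open_links)["spam"], 3))
print(round(pagerank(nofollow_links)["spam"], 3))
```

The spam page keeps a small baseline score either way; what nofollow takes away is the vote from the blog, and, as the PR6 versus PR7 example shows, that score is still only one input to ranking.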
It's also important to note that nofollow DOES NOT mean you are flagging a link as being bad in some way. Google isn't going to say, "Aha -- nofollow is on this link -- that's a bad link." Or as Matt Cutts, a Google software engineer who helped develop the attribute, said:
"It doesn't mean that it is a bad link, or that you hate it, just that this link doesn't belong to me."
Instead, nofollow effectively will cause Google to ignore the link, to pretend it doesn't exist. This also means you shouldn't worry that people will link to you and use nofollow as a way to hurt you -- Google says that won't happen.
3) This leads to anchor text. Generally, getting the words you want to rank well for into the text of links pointing at you matters much more for ranking on a particular term. With nofollow added to a link, Google won't associate the anchor text in the link with the page the link points at. This, more than anything else, will sour things for link spammers.
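The anchor text point can be illustrated the same way. This minimal Python sketch, with hypothetical data, builds a word-to-page index from link text and skips any link flagged nofollow, so those words are never credited to the target page. Real engines weigh anchor text in far more elaborate ways; this only shows the "don't associate" step described above.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Map each anchor-text word to the set of pages linked with it.

    links is an iterable of (source, target, anchor_text, nofollow) tuples.
    """
    index = defaultdict(set)
    for source, target, anchor_text, nofollow in links:
        if nofollow:
            continue  # words in a nofollowed link are not credited to target
        for word in anchor_text.lower().split():
            index[word].add(target)
    return index


links = [
    ("blog.example/post", "poker.example", "texas holdem poker", True),
    ("news.example", "cards.example", "Poker rules", False),
]
index = build_anchor_index(links)
print(sorted(index["poker"]))  # ['cards.example']
print("holdem" in index)       # False
```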
Stop Spam? No. A Start, Yes!
The new attribute won't stop link spamming. Many people may still spam simply because they hope human beings will see the links, click through and perhaps convert. As with email spam, maybe only an incredibly tiny number will do so. But since there's no heavy cost to the spamming, that might still be enough.
In particular, much blog spamming is done through automation. So even with the new system in place, some of that automation will keep rolling along. It will no doubt even evolve to spot blogs and other areas that aren't making use of the nofollow attributes, just as smart spammers currently focus on blogs that have been abandoned, rather than irritating active bloggers.
This means other types of systems of blocking spam will likely still have to be used, such as forcing people to input characters from graphics (captchas), registration and so on (The Solution To Blog Spamming at ThreadWatch has a nice rundown on these, and also see Six Apart's Guide to Comment Spam).
While link or comment spamming isn't going away, it's still heartening that it will be less attractive. Site owners have been given an important new tool that lets them control indexing -- something they've not had offered for years. Perfect or not, I'm glad it's emerged.
Vendor Support
Google started developing the idea of a nofollow attribute several weeks ago and quietly shared it with a number of the major blogging vendors. Many of them have now signed on, pledging to support and implement the tag in the future, if they've not already done so now.
As a result, those using systems provided by one of the major vendors such as Blogger or Movable Type (see here for support news) should find that adding the attribute to links in comments is a matter of flipping a switch. OK, maybe clicking a radio button or drop-down box! Google provides a list of those supporting it here.
Google said it will soon begin talking with other companies, such as those that make forum software. But makers of these or any other packages can implement support whenever they are ready.
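For a sense of what that switch might do under the hood, here is a minimal Python sketch of a comment template helper that writes the commenter's homepage link with rel="nofollow" when a site-wide setting is on. The setting and function names are hypothetical; Blogger and Movable Type each have their own implementations. The helper also escapes the URL and name with the standard library's `html.escape` so a commenter can't inject markup.

```python
from html import escape

# Hypothetical site-wide setting: the "switch" a blog owner flips.
NOFOLLOW_COMMENT_LINKS = True


def render_commenter_link(url, name, nofollow=NOFOLLOW_COMMENT_LINKS):
    """Return the HTML link shown next to a comment author's name."""
    rel = ' rel="nofollow"' if nofollow else ""
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(name)}</a>'


print(render_commenter_link("http://www.site.com/page.html", "Visit My Page"))
# <a href="http://www.site.com/page.html" rel="nofollow">Visit My Page</a>
```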
Uses For The Attribute
The tag can be used by anyone anywhere, of course. It's not just for use with blog comment areas or forum posts. For example, Cutts said people might use it if they publish dynamically generated referrer stats and visitor information.
"Wherever it means that another person placed a link on your site, that would be appropriate," Cutts said.
Because of this, some page authoring tools will likely add support in the future, if it is widely adopted as will likely be the case. Some tools may allow adding it right now -- and those who know HTML code can do an easy insertion.
That might be handy if you need to link to a site but are worried that a search engine might consider it a "bad neighborhood," as they've often described them. In reality, the chances are very small that the typical person might link to a site that would actually hurt them with a search engine. But if in doubt, nofollow could offer peace of mind.
Of course, those who are swapping links with other sites now have a whole new thing to look out for. If someone offers to link to you, you'll want to make sure they don't make use of the nofollow tag -- at least if you were hoping for some search engine gain. Otherwise, the link's not going to count.
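Checking a link partner's page for this is easy to automate. This minimal Python sketch assumes you already have the partner page's HTML as a string (fetching it is left out), finds every link pointing at your domain, and reports whether each carries nofollow. The domain and URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinksToDomain(HTMLParser):
    """Record (href, is_nofollow) for each <a> pointing at a given domain."""

    def __init__(self, domain):
        super().__init__()
        self.domain = domain.lower()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        host = urlparse(href).hostname or ""  # hostname comes back lower-cased
        if host == self.domain or host.endswith("." + self.domain):
            rel_values = (attr_map.get("rel") or "").lower().split()
            self.found.append((href, "nofollow" in rel_values))


def check_partner_page(html, domain):
    parser = LinksToDomain(domain)
    parser.feed(html)
    parser.close()
    return parser.found


partner_html = """
<a href="http://www.site.com/a.html" rel="nofollow">Site A</a>
<a href="http://site.com/b.html">Site B</a>
<a href="http://other.example/">Someone else</a>
"""
for href, nofollow in check_partner_page(partner_html, "site.com"):
    print(href, "nofollow" if nofollow else "plain link")
```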
Don't forget -- there are other good reasons to still get links even beyond search engines, of course. My Golden Rules Of Link Building article covers this more.
You definitely DO NOT want to use the attribute on links to your own pages. Do that, and you'll deprive your own pages of the chance to influence how your other pages rank.
Having said this, I've no doubt some people will try playing with the new tag as a means to "hoard" PageRank that's passed on to only a few pages in your site. For example, your home page might link to 25 of your internal pages. Using the new attribute, you could exclude all but five of these pages. Do that, and you might possibly cause Google to give those five pages more credit (see the Link Building & Link Analysis article for Search Engine Watch members for more about this).
Maybe. Perhaps. And perhaps the search engines may make other changes down the line. Rather than get tricky with this tag, I'd recommend using it as intended for now -- as a means to flag that there are certain links on your web site that you didn't place there.
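Why might hoarding work at all? Under the simplified PageRank model, a page hands each followed link an equal slice of its own score, scaled by a damping factor (0.85 is the commonly cited value). The sketch below assumes nofollowed links drop out of that count entirely, which is exactly the open question above; the engines may well treat it differently. The page score of 1.0 is an arbitrary placeholder.

```python
def share_per_link(page_score, followed_links, damping=0.85):
    """Score passed along each followed link under the simple model."""
    return damping * page_score / followed_links


all_25 = share_per_link(1.0, 25)  # home page links normally to 25 internal pages
only_5 = share_per_link(1.0, 5)   # 20 of those links marked nofollow
print(round(all_25, 3), round(only_5, 3))  # 0.034 0.17
```

The total passed out doesn't change under this model; it's just concentrated on five pages instead of 25.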
Support From Other Search Engines
How about the other search engines? MSN and Yahoo are onboard. In fact, Yahoo beat Google out of the gate, blogging its support of the new tag first. A Defense Against Comment Spam offers a few details, an example and news that the change will be implemented in the coming weeks.
As for MSN, Working Together Against Blog Spam explains how the company made a snap decision today to support the tag, though the idea was something it had considered during its Search Champs meetings with bloggers and search marketers several months ago. It promises that its crawler will begin respecting the attribute in the coming weeks.
Google, of course, has been onboard from the start. It provides more details on its blog in Preventing comment spam.
So how about Ask Jeeves, the remaining major crawler? It's still looking at the new option and weighing it up.
"We'll consider it for the future, but because we use local [link] popularity and not global popularity, we are not going to rush into anything today. It has more impact for Google and Yahoo because of their similar methodologies. The upside for us is much more modest," said Lanzone of Ask Jeeves.
By local popularity, Ask Jeeves is referring to how its Teoma search engine will calculate the popularity of pages and do ranking only after culling a subset of pages deemed relevant, rather than looking at all links from across the entire web. My Make Room For Teoma article explains this more.
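The difference between global and local popularity can be shown with a deliberately crude sketch: count every inbound link across the whole web, versus count only links between pages already judged relevant to a query. This is a loose illustration of the idea Ask Jeeves describes, assuming a hypothetical link list and relevant set; it is not Teoma's actual algorithm.

```python
def global_popularity(edges):
    """Count inbound links to each page across the entire link graph."""
    counts = {}
    for src, dst in edges:
        counts[dst] = counts.get(dst, 0) + 1
    return counts


def local_popularity(edges, relevant):
    """Count only links where both ends are in the query's relevant subset."""
    counts = {page: 0 for page in relevant}
    for src, dst in edges:
        if src in relevant and dst in relevant:
            counts[dst] += 1
    return counts


# Hypothetical links: three off-topic blog comment spams plus on-topic links.
edges = [
    ("blog1", "casino.example"),
    ("blog2", "casino.example"),
    ("blog3", "casino.example"),
    ("poker-forum", "poker-guide.example"),
    ("poker-news", "poker-guide.example"),
    ("poker-news", "casino.example"),
]
relevant = {"poker-forum", "poker-news", "poker-guide.example", "casino.example"}

print(global_popularity(edges))
print(local_popularity(edges, relevant))
```

Globally, the spammed page looks twice as popular as the on-topic guide; counted only within the relevant subset, the off-topic comment links vanish and the guide comes out ahead.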
More Info
Google To Add "Nofollow" Tagging Of Links To Fight Spam? is where I explain more about how the news of the new attribute emerged, plus provides some background on the difference between it and the nofollow attribute of the meta robots tag.
Comment Spam? How About An Ignore Tag? How About An Indexing Summit! is my post wishing for an "ignore" tag similar to what's emerged here and how others have been wishing for this even longer.
It also looks at how it has been literally years since we've had an advancement in the type of indexing control given to site owners. This new attribute -- whether you love the idea or hate it -- is a welcome move for at least giving site owners themselves some choice in the matter.
The New Nofollow Link Attribute is a thread in our forums where you can discuss the new attribute.
Posted by Danny Sullivan at 8:47 PM
Yahoo, MSN Join Google In Supporting Nofollow
While we wait for Google to post official notice of its support for the new nofollow attribute, Yahoo's already chimed in on its blog that it will do so as well. And apparently, the Google announcement may come here, as Yahoo is already linking to it. MSN tells me directly it also will support the tag, and plans to post on its blog as well. As with Google, Yahoo's linking to where that will likely show up. Ask Jeeves tells me it is still considering the tag. More to come in a long story I'm about to post!
Postscript: Support has now been officially announced by everyone. See the Google, Yahoo, MSN Unite On Support For Nofollow Attribute For Links for further details.
Posted by Danny Sullivan at 7:03 PM
Confirmed: New Google Nofollow Link Attribute Is Coming
Robert Scoble has posted confirmation that Google will introduce a new link attribute. OK, then I'll confirm it as well -- I've been told the same by my contacts at Google. Since official confirmation has now been leaked out, I see no need to hold back.
As Robert notes, the information is supposed to come out later today on the Google Blog. What will be the new attribute? Well, I could say "wait and see," but Dave Winer already leaked that part out. He didn't say it came from Google (it did), but he provided enough clues and follow up confirmations for people to know this is the nofollow attribute that will be introduced.
Exactly how Google will interact with the nofollow attribute remains to be seen. I'll be posting a follow up with those details. For background on it, see my Google To Add "Nofollow" Tagging Of Links To Fight Spam? post.
Postscript: Support has now been officially announced. See the Google, Yahoo, MSN Unite On Support For Nofollow Attribute For Links post for more.
Posted by Danny Sullivan at 2:06 PM
Shame On You: Tsunami Search Spammers
From Silicon.com, Tsunami scammers manipulate Google rankings explains that an alleged phishing site is ranking higher on Google than the actual China Charity Federation web site, potentially causing people donating to tsunami relief to send their money to the wrong place.
The site in question, www.chinacharity.cn.net, is still ranking tops at Google despite the web site apparently having been closed down. The site is also ranked first and second at Yahoo, third at Ask Jeeves but not at all at the MSN Search beta.
Kudos to MSN? Well, the official web site of www.china.org.cn is second at Google, tenth at Yahoo and Ask Jeeves but not in the first page of results at all over at MSN. So MSN doesn't send you to the wrong place -- but neither do you get to the right one.
FYI, the story reports that the real site is at www.chinacharity.cn, but that domain isn't working for me. My assumption is that the correct address is the one shown above.
Postscript: A reader tells me chinacharity.cn is the correct site.
Posted by Danny Sullivan at 10:06 AM
NOTE: Support has now been officially announced. See the Google, Yahoo, MSN Unite On Support For Nofollow Attribute For Links post for more.
Dave Winer posted a cryptic Watch This Space post yesterday, pointing at a page that many have interpreted to mean that Google will be providing support for a "nofollow" attribute that can be added to links.
For example, the HTML code for an ordinary link might look like this:
<a href="http://www.site.com/page.html">Visit My Page</a>
HTML specs (3.2, 4.0, XHTML 2.0) allow for links to have additional information associated with them. The rel attribute is designed to allow authors to express particular relationships about the current document to the page it is linking to.
In Winer's post, he makes use of a nofollow rel attribute in links that appear in the comments of his post, like this:
<a href="http://www.site.com/page.html" rel="nofollow">Visit My Page</a>
The speculation of those who spotted this (see Robert Sayre with a supportive comment from Dave, Simon Willison, others) is that Google will be providing support of the nofollow attribute in some way to help combat comment spam on blogs (and by extension, anywhere people may find publicly-contributed links to cause problems).
Why Google? In the past, Dave has suggested that comment spam is a Google problem -- and earlier this week, he also posted a note saying he'd heard from the "only" one that could solve a "big" problem on the internet.
What might the nofollow attribute do? The closest thing we have to it at the moment is the nofollow attribute for the meta robots tag. That attribute is a way to tell search engines not to follow links from a page they may have found.
It's important to note that the attribute was intended for site owners who wanted to prevent search engines from indexing other pages they link to from within their own sites, not as a mechanism for preventing the indexing of pages on sites outside their control. Nor does it allow this. If there was another way to find a page (on the site owner's site or not) -- and if the page itself is not blocked somehow from being indexed -- then it would still get listed.
So a nofollow attribute associated with a link itself isn't likely to prevent the page the link points at from being indexed. After all, search engines will likely find those pages in other ways, and those pages probably won't have spider blocks placed on them.
Instead, a nofollow attribute is likely to be treated as an "ignore" or "don't count" flag. It's the way for a web author to say, "I don't care about these links -- nor should you."
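The difference between the existing page-level command and the proposed link-level one can be sketched in a few lines of Python. It assumes a crawler that honors both signals: nofollow (or none) in the meta robots tag's comma-separated content switches off every link on the page, while rel="nofollow" affects only the link carrying it. The function name is hypothetical.

```python
def should_follow(link_rel, meta_robots):
    """Decide whether a crawler honoring both signals would follow one link.

    link_rel: the link's rel attribute value (or None).
    meta_robots: the page's <meta name="robots"> content value (or None).
    """
    page_directives = {d.strip().lower() for d in (meta_robots or "").split(",")}
    if "nofollow" in page_directives or "none" in page_directives:
        return False  # page-level: no link on the page is followed
    link_tokens = {t.lower() for t in (link_rel or "").split()}
    return "nofollow" not in link_tokens  # link-level: only this link is affected


print(should_follow(None, "index, follow"))   # True
print(should_follow("nofollow", None))        # False
print(should_follow(None, "noindex, nofollow"))  # False
```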
How might Google react to it? That remains to be seen. It might decide not to index the link at all -- so it wouldn't record the text of the link, nor the fact that the link points at another page -- depriving that page of a possible PageRank rise. Or, it could decide to index the information but not weight it as heavily.
Whatever the case, it won't stop blog comment spam -- nor other types of link spamming across the web. But it's a start, and more important, it gives authors more control over their pages. I'm all for that.
My main disappointment, should the mechanism emerge, is that it would have come unilaterally from Google. Despite what Dave thinks, comment spam is not a Google problem. It's a search problem in general, and it would be nice to see the search engines work together to solve the wide range of issues that web authors (not just bloggers) have.
More on this in my past Comment Spam? How About An Ignore Tag? How About An Indexing Summit! post, where I talk about the idea of an "ignore" tag or more important, an indexing summit to discuss publisher needs and controls. We're doing that at our SES New York show, by the way. I hope to get some search engine reps to come hear and discuss what publishers of all types are looking for.
Also see Nick's Rumour - Google About To Kill Comment Spam post and comments at Threadwatch. I've chimed in along with others about what might happen, how it might fit in with things and what may or may not work. There are also some nice thoughts from Peter Van Dijck and a summary from Steve Rubel.
Postscript: We've also got a thread going on the topic now in our forums, where you can comment or discuss: Discussion on 'Google To Add "Nofollow" Tagging' blog
Posted by Danny Sullivan at 8:51 PM
Nathan Enns of FyberSearch dropped me an email to say he saw my proposal for search engines to consider an ignore tag and implemented it for his own FyberSearch search engine. More details and instructions in the press release at his site. OK, so FyberSearch is a tiny search engine, and the command is specific to it. This action isn't going to stop the problems bloggers and other publishers have. But it's a nice start!
For more background on the call for search engines to consider new tools for publishers, see my Comment Spam? How About An Ignore Tag? How About An Indexing Summit! post. Discussion is also on-going in this forum thread: Time For An Indexing Summit? I share within it that I'll likely set-up a summit-like panel for our next SES show in New York.
Posted by Danny Sullivan at 7:56 AM | Permalink
Getting Free Of Search Spam
Learning From SEM Blunders at ClickZ has PJ Fusco looking at extracting sites from actual or potential search engine spamming situations, based on her own experiences. For a similar article, see Bungled Search Engine Optimization - Cleaning Up the Mess from a panel at one of our SES conferences.
Search Engine Watch members should also see the SEO: Spamming category of Search Topics for an annotated guide to stories on spamming for SEW and around the web over the years.
Posted by Danny Sullivan at 7:35 AM | Permalink
For our Search Engine Watch members, I've posted a long Talking About Search Engine Spam article that draws on comments and observations from two sessions at our recent SES Chicago show.
Are white hatters naive by not being aggressive with SEO? Are black hatters unethical and subverting search quality by going too far with search engines? The answers are never as simple as they'd seem, as the article explores.
Overall, one key point that I felt came out of the sessions was that tactics have to be appropriate to the space you are in. Yes, you could use black hat tactics to get a top ranking for some relatively non-competitive queries. But do that, and you stand out like a sore thumb. As I note in the article, STO -- sore thumb optimization -- is to be avoided.
The article also looks at the myth of the top ten results being the most relevant results out there, as well as the sad state of how spam continues to be defined by particular tactics, rather than intent and end result.
Those seeking more background about spam should see the Search Engine Spamming article available to SEW members, as well as the SEO: Spamming category of Search Topics for an annotated guide to stories on spamming for SEW and around the web over the years.
Posted by Danny Sullivan at 12:12 PM | Permalink
Comment Spam? How About An Ignore Tag? How About An Indexing Summit!
Bloggers seem increasingly upset at the comment spam they have to deal with, something driven primarily by those who seek higher search rankings by posting links to their sites into comment areas.
To me, the solution seems simple. Why not give designers a tag telling search engines to ignore portions of a web page? Or better yet, how about a coordinated summit among search engines and webmasters to advance the state of site indexing overall?
The solution would help more than bloggers. That's good, because more than bloggers need it. The problem bloggers face has already been an issue for those who run forums, guest books or any other type of venue allowing public contributions. All are -- and have been -- targets of those who want to promote web sites.
For a non-blogger perspective on the problem, check out Mike Grehan's Google PageRank Lunacy article we ran last year in SearchDay. It discusses how guest book spam spoiled a memorial site for a good friend of his. Just like bloggers, people with guest books need help too.
I take my inspiration for an ignore tag primarily from Bruce Clay, who proposed a somewhat similar idea for <ad> tags to Google informally earlier last year. Bruce's concern was that if he or others want to purchase links, they don't want those links to harm them somehow in search engines.
Believe it or not, there are some people who buy links because of the traffic the links themselves may drive. Bruce's thought was that if publishers such as Search Engine Watch's own JupiterMedia could surround paid links they sell with an ad tag, then search engines could discount those links for ranking purposes.
Interesting idea. I also like the idea for another reason. Since we've operated our Search Engine Watch Forums, we've been liberal about allowing people to link out to resources as relevant. But this can and has been abused. Not much, fortunately, but we occasionally have to police out the irrelevant link or the link hidden in a period or comma.
One solution would be an <ignore> tag. Using this, we could surround any posted links with the tag to prevent them from being indexed. If that became commonplace on forums, it might reduce the attraction for link spam to them.
That leads to another inspiration. Six Apart/Movable Type's Brad Choate wished for some type of page-based ignore feature last July in his Restricting Google on my terms post (something he originally asked for back in Feb. 2002). His solution was to cloak his pages using user agent detection, though, as the comments on that post point out, he didn't realize that was cloaking when he did it.
Google, of course, doesn't like cloaking. But since Brad's intent isn't to deceive Google, chances are he's not going to get busted. Even more to the point, as he says, he wouldn't have to do such a thing if Google gave him some alternative.
More broadly, lots of people beyond bloggers in lots of situations wouldn't have to do such things if search engines gave us more options. It's not a Google thing. It's not a blogger thing. It's a search indexing thing.
I mentioned the ignore idea to Yahoo at our SES Chicago show and got some interest, so maybe there's hope. It poses problems, of course. An ignore tag could be abused. An ignore tag also means that some good content marked as "ignore" might not get indexed. But perhaps we might also have levels. How about a <content> tag authors can use to denote the key body content, a <nav> tag to highlight navigation that search engines might not want to index or weight as heavily, or a <public> tag to denote publicly contributed content that might deserve less weighting?
There are lots of possibilities. What I know is that the last time the search engines came together to help provide coordinated assistance to web site owners on indexing was May 1996, when we got agreement on the meta description and meta robots tags, along with some additional talk on new support for the robots.txt convention.
Since then, we've had unilateral advances that AltaVista (new image indexing tags), Google (robots.txt expansion, no-archiving tags) and others have added, but nothing coordinated that involves web site owners or the search industry as a whole. After nearly 10 years, surely the time is ripe for that type of cooperation now.
At the very least, it might help get some bloggers off Google's back who blame it for the problem. A sampling of blame and other looks at the problem and solutions:
So what do you think? Time for an indexing summit? Are there indexing changes you'd like to see? Comments of any type? Come discuss in our forum thread: Time For An Indexing Summit?
Postscript: Support has now been officially announced for an ignore-like nofollow attribute. See the Google, Yahoo, MSN Unite On Support For Nofollow Attribute For Links post for more.
Posted by Danny Sullivan at 5:58 AM | Permalink
Patricia Fusco, one of the new SEM columnists at ClickZ, comes out swinging with an argument that if the firm you outsource SEO work to uses "black hat" techniques, they should get axed: Search Engine Spam? You're Fired! The caveat, of course, is that this assumes you hired a firm and weren't aware they might use frowned-upon techniques. Firing a firm that went aggressive with the search engines after they told you they would, or because you told them to do so, doesn't make much sense. You shouldn't have hired them in the first place!
Posted by Danny Sullivan at 3:45 PM | Permalink
Over on the Yahoo Search Blog, An Interview with Tim Converse touches on some of the things Yahoo does to shape relevancy and fight spam. A key part is coming up with a way to automatically classify documents across the web, including those that would be considered spam. But don't get too excited -- no secrets on what exactly makes up spam are disclosed.
By the way, those who need an eye-opening look at what spam is should check out the What Is Spam? session at SES Chicago later this month. Tim Mayer from Yahoo is a veteran of that session from when we offered it several conferences ago and returns as part of the panel. It's designed for those who are worried they might "accidentally" spam a search engine.
Expect Tim and the other panelists to show you some egregious examples of spam, easing the minds of those who fear a simple mistake will get them banned.
Posted by Danny Sullivan at 2:22 PM | Permalink
Behind The Scenes Of Google's Tech
The magic that makes Google tick from ZDNet has a look at technical details behind delivering Google searches. But I've got a few quibbles:
OK, enough with the quibbles, which, in fairness, I could raise with Google's competitors as well. See the rest of the article for some technical details on Google data centers, the fact there's not been a complete system failure since February 2000 and more.
Posted by Danny Sullivan at 12:56 PM | Permalink
Pandia has a nice three part article on getting back into Google's good graces, if you've run into trouble: Help, my site has been banned by Google!
Posted by Danny Sullivan at 8:34 AM | Permalink
Several major French SEO firms have apparently had their sites and some of their customer sites banned by Google over spam accusations. All are members of the SEMA 7 search engine marketing group in France.
ZDNet provides a write-up in French here: French référenceurs "déréférencés" by Google (roughly, French SEO firms "delisted" by Google; bad English translation via Google here). Le Journal du Net also has coverage about large firm Netbooster being specifically targeted: Netbooster déréférencé par Google (Netbooster delisted by Google). The English translation provided by Google is pretty garbled, but from what I can read, Netbooster denies any charges of spamming.
French search engine expert Olivier Andrieu provides his take on the events here: "Il ne faut pas blâmer les référenceurs sanctionnés" (roughly, "The penalized SEO firms shouldn't be blamed"; English translation by Google).
Want to comment or read discussion of this topic? Visit a thread already going in our forums: Huge Google Delisting Of Major European SEO Companies.
Posted by Danny Sullivan at 5:35 AM | Permalink