The Google Spam-Jam

Now that people have pointed out that they're seeing more spam than usual in Google, we're all seeing it. Correct?

But is there actually more spam? Google doesn't seem to think so, according to a recent blog post by a Google engineer.

Fact is, this is a problem that Google has had from day one and it's not likely to go away anytime soon.

Google came into the search world with a "we can't be spammed" battle cry and introduced the search engine optimization (SEO) world to PageRank. The battle has been raging ever since.

The Problem With PageRank

PageRank (an eigenvector centrality measure, to be precise) gives web pages a high score if they receive links from many other pages, but does so in a way that the credit received for a link is higher if it comes from a page that is already highly ranked.
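To make that concrete, here is a minimal power-iteration sketch of the idea over a toy link graph. The graph, the damping factor of 0.85, and the page names are illustrative assumptions for this article, not Google's actual implementation.

```python
# Minimal PageRank sketch (power iteration) over a toy link graph.
# Damping factor and node names are illustrative, not Google's values.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform score

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            if not targets:
                # Dangling page: spread its rank evenly across all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # Each outgoing link passes on an equal share of this
                # page's current rank -- so a link from a high-ranked
                # page is worth more than one from an obscure page.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank

graph = {
    "hub":       ["a", "b", "authority"],
    "a":         ["authority"],
    "b":         ["authority"],
    "authority": ["hub"],
}
ranks = pagerank(graph)
```

In this toy graph, "authority" collects links from three pages, including the already well-ranked "hub", so it ends up with the highest score -- exactly the "credit from credited pages" effect described above.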

PageRank is "keyword independent," which means Google can calculate that part of the ranking score offline. It is computed ahead of time -- not at the moment of the query itself. Regardless of which keywords they contain, your web pages already have a PageRank score before any search is performed.

Because PageRank is so computationally intensive, it saves time to calculate it only once. The downside of PageRank not being "keyword dependent," however, is that people may link to a given web page for any number of different reasons.

And this is where the problem lies: many pages may have a high PageRank for a reason totally unrelated to the search query at hand. Pages making reference to more than one topic, for instance (and many pages do) may be an "authority" on one topic but essentially irrelevant to another -- and PageRank can't distinguish between the two.
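A toy sketch makes the problem visible: suppose a precomputed, keyword-independent PageRank score is blended at query time with a simple keyword-relevance score. All of the page names, scores, texts, and the blending weight below are hypothetical assumptions for illustration, not Google's actual formula.

```python
# Hypothetical blend of an offline PageRank score with a query-time
# keyword score. Pages, scores, and weights are invented for illustration.

PRECOMPUTED_PAGERANK = {          # computed offline, once
    "beatles-fan-list": 0.45,     # heavily linked list page
    "beatles-bio":      0.20,     # relevant but less linked
    "gardening-tips":   0.35,     # popular, but off-topic
}

PAGE_TEXT = {
    "beatles-fan-list": "links links links beatles",
    "beatles-bio":      "the beatles formed in liverpool in 1960",
    "gardening-tips":   "grow tomatoes in spring",
}

def keyword_score(query, text):
    """Toy query-time relevance: fraction of query terms on the page."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)

def rank(query, weight=0.5):
    """Return pages ordered by a weighted mix of PageRank and relevance."""
    scores = {}
    for page, pr in PRECOMPUTED_PAGERANK.items():
        scores[page] = weight * pr + (1 - weight) * keyword_score(query, PAGE_TEXT[page])
    return sorted(scores, key=scores.get, reverse=True)
```

With an even 0.5 weight, `rank("the beatles")` puts the relevant biography first. But lean the weight toward PageRank -- say `rank("the beatles", weight=0.9)` -- and the off-topic gardening page outranks the biography purely on the strength of its links. That is the failure mode the article is describing: PageRank can't tell which of a page's inbound links have anything to do with the query.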

As a result, since day one of Google on the web, it hasn't been unusual for end users (and more so SEOs) to find a highly ranked page in the search results, even when it's obviously irrelevant to the search topic.

Skewed Search Results

Even before the current murmurs, search results have always included a sizable fraction of bad results: pages that are important in some context, yet not in the context of the specific search query.

So, it's no wonder PageRank got a slight demotion and Hilltop crept in (around 2003).

Co-citation can skew results. For a query as specific as [the beatles], it isn't too difficult for a search engine to discover an authority and rank it at number one.

These bibliometric cues, which are especially loud inside lists, can alter results. It's beyond the scope of this article to explain why, but lists can seemingly force Google to serve a results page that mixes on-topic and off-topic pages.

Try a search for something less specific than [the beatles], like [newspapers] for instance. The result will differ depending on where and when you search.

[Screenshot: Google Newspapers SERP]

Look at the screenshot to see what I mean. The top-ranked results are actually lists that receive a lot of inbound links (authority pages on a non-specific subject). With such a query, you'd expect to at least see a result set of prominent newspapers.

The New York Times isn't even above the fold (and I'm based in New York). Relevant? Not really. But not totally irrelevant either.

Game Theoretic View

There have always been obvious weaknesses to be exploited at Google.

In this kind of environment, the perfect ranking function is always likely to be a moving target. Discovering, crawling, and indexing information over HTTP leaves the door wide open to spammers.

A couple of years ago at SES New York, Andrew Tomkins, then chief scientist at Yahoo, said something along the lines of: "As content becomes more diverse, more complex, bigger, and more fragmented... getting it through HTTP and HTML may not be the right model anymore." That wasn't specifically about web spam, but it does speak to the broader problem of a process that no longer seems effective or scalable as far as web search is concerned.

Of course, as Google has such a great understanding of user intent behind so many popular queries, they could simply filter out all commercial listings inside the organic results and leave them specifically for paid advertising. That would solve a huge chunk of the problem.

In fact, make all of the commercial listings inside the organic results video -- a format that's harder to spam.

Better still, don't have any organic listings at all, bar a link to Wikipedia (which is what the organic listings frequently feel like anyway!).

By the way, Tomkins is now engineering director at Google. Maybe the future holds a whole different way of doing things at Google than to keep trying to plug holes in the old way.


About the author

Mike Grehan is currently chief marketing officer and managing director at Acronym, where he is responsible for directing thought leadership programs and cross-platform marketing initiatives, as well as developing new, innovative content marketing campaigns.

Prior to joining Acronym, Grehan was group publishing director at Incisive Media, publisher of Search Engine Watch and ClickZ, and producer of the SES international conference series. Previously, he worked as a search marketing consultant with a number of international agencies handling global clients such as SAP and Motorola. Recognized as a leading search marketing expert, Grehan came online in 1995 and is the author of numerous books and white papers on the subject. He is currently writing his new book, From Search to Social: Marketing to the Connected Consumer, to be published by Wiley later in 2014.

In March 2010 he was elected to SEMPO's board of directors; after a year as vice president, he served two years as president and is now chairman.