Dissecting Google Penguin 2.1: What Factors Mattered? [Study]

Date published: 20 January 2014

Author: Jessica Lee

Dissecting Google’s Penguin algorithm has been a passion for many SEO pros since first learning about the update. Last year, U.K.-based MathSight used reverse engineering to identify which factors Penguin 2.0 was targeting on a website. More recently, MathSight crunched numbers to pull apart Penguin 2.1, and revealed additional clues about what this particular algorithm was after.

Prior to Penguin 2.1, Andreas Voniatis, managing director of MathSight, said it was important to think beyond the link when it came to Penguin by understanding the root cause.

“Many people forget that an inbound/outbound link profile originates from website’s pages,” he said. “So by analyzing the on-site SEO, we are effectively finding the stylistic properties of those external linking pages, which provides predictive value for Penguin 2.0.”

But that was 2.0, which MathSight said was all about targeting “low readability” levels of content on a website, specifically looking at body text, anchor text, hyperlinks, and meta information. So what about this time around with Penguin 2.1?

MathSight’s data showed websites that gained and lost traffic from Penguin 2.1 had links from web pages that contained:

A higher (good) or lower (bad) proportion of rare words in the body text.
A higher (good) or lower (bad) number of words per sentence in the body text.
A higher (good) or lower (bad) number of syllables per word in the body text.
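
As a rough illustration of the three body-text measurements listed above, here is a minimal Python sketch. It is not MathSight's method, which the article does not describe. The function names, the tiny COMMON_WORDS set (a stand-in for a real reference word list) and the vowel-run syllable heuristic are all assumptions made for this example.

```python
import re

# Hypothetical stand-in for a "common words" reference list. A real analysis
# would use a far larger list; this one exists only to make the example run.
COMMON_WORDS = {"the", "a", "is", "and", "of", "to", "in", "on", "cat", "sat", "mat"}


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace (a rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    """Return lowercased alphabetic tokens, allowing one internal apostrophe."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_syllables(word):
    """Approximate syllables as runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def body_text_features(text, common_words=COMMON_WORDS):
    """Compute the three body-text measurements for a block of text."""
    sentences = split_sentences(text)
    words = split_words(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    rare = [w for w in words if w not in common_words]
    return {
        "rare_word_proportion": len(rare) / len(words),
        "words_per_sentence": len(words) / len(sentences),
        "syllables_per_word": sum(count_syllables(w) for w in words) / len(words),
    }


features = body_text_features(
    "The cat sat on the mat. The ubiquitous feline is sleeping."
)
print(features)
```

The vowel-run heuristic miscounts some English words (silent "e" endings, for instance), so the output should be read as an approximation rather than an exact syllable count.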

MathSight’s data may support theories SEO pros have about linking to poor quality sites, and that the “quality” factor hinges on content, Voniatis said.

“The readability of content from a linking web page is highly influential to how Penguin views the destination site, that is, the site being linked to. Websites should eliminate links from sites that don’t meet the readability thresholds Penguin demands,” he said.

He added, “Readability is how Penguin cleans up the linking ecosystem on the premise that the more intellectual the text reads, the more authentic the content is likely to be.”

So how does 2.1 differ in the nitty-gritty metrics?

“When we compared Penguin 2.1 to 2.0, we found the algorithm had been refined so that the readability metrics were more heavily weighted towards Flesch-Kincaid than Dale-Chall readability,” said Voniatis. “So it looks like Google is trying to find the limits of web spam by tweaking its readability formulas.”

Voniatis said the formula used to determine readability on the Flesch-Kincaid scale was as follows:

RE = 206.835 – (1.015 x ASL) – (84.6 x ASW)

- RE = Readability Ease
- ASL = Average Sentence Length (the number of words divided by the number of sentences)
- ASW = Average number of syllables per word (the number of syllables divided by the number of words)
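
To make the arithmetic concrete, the formula above can be sketched in Python as follows. The function name and its count-based parameters are assumptions chosen for this example. Only the constants and the definitions of ASL and ASW come from the formula as quoted.

```python
def reading_ease(total_words, total_sentences, total_syllables):
    """Apply RE = 206.835 - (1.015 x ASL) - (84.6 x ASW) to raw counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    asl = total_words / total_sentences   # average sentence length
    asw = total_syllables / total_words   # average syllables per word
    return 206.835 - (1.015 * asl) - (84.6 * asw)


# Worked example: 100 words in 5 sentences with 150 syllables in total,
# so ASL = 20 and ASW = 1.5.
score = reading_ease(100, 5, 150)
print(round(score, 3))  # 59.635
```

Longer sentences and longer words both push the score down, so a page with 100 words in 2 sentences and 200 syllables scores lower than the worked example.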

“The lower the score, that is, the harder the text is to read, the more beneficial content is for Penguin algorithm updates,” said Voniatis. “ANOVA (analysis of variance) statistics showed that the certainty of Flesch-Kincaid causing a change in traffic due to Penguin was 99.999 percent.”

[Figure: mathsight-penguin-2-1]

The red bars in the graph above indicate the factors in the sites examined that triggered Penguin, according to MathSight data. The green bars indicate the factors present in websites that benefitted from Penguin 2.1.

So what does all this mean, and what’s an SEO to do with the data?

Voniatis said the statistics “tell us the secret ingredients but not the reason why Google is using readability. I suspect Google finds readability an easy way of discounting links from guest posts written by non-experts.”

He added that SEO professionals could manually check the content of each linking web page for Flesch-Kincaid and Dale-Chall readability by using free online tools. But he said MathSight’s API does this more efficiently by crawling links on- and off-site, evaluating readability and returning “a delta to the optimum readability threshold, so SEOs can disavow the links or recondition on-site landing page content.” And, he said, “The thresholds are updated with each algorithm update.”
