Penguin 2.0 hit in May of this year and was described by Google's Matt Cutts as more comprehensive and deeper than Penguin 1.0. What that meant at the time wasn't entirely clear. That wasn't good enough for UK-based MathSight, which attempted to deconstruct the algorithm update by reverse engineering it.
MathSight claimed to find that Penguin 2.0 was really about "low readability" levels of content on a site. The company stated that its research showed, with 95 percent statistical confidence, that Penguin targeted factors including:
- Main body text
- Anchor text
- Meta information
"The insights came from analyzing websites affected by Penguin 2.0, which examined all onsite SEO aspects that led to a step change – positive or negative – in SEO traffic from Google," said Andreas Voniatis, managing director at MathSight. "It's worth mentioning that many people forget that an inbound/outbound link profile originates from website's pages. So by analyzing the onsite SEO, we are effectively finding the stylistic properties of those external linking pages, which provides predictive value for Penguin 2.0."
MathSight's research analyzed sites within specific industries such as travel, gifts, mobile apps and jewelry, as well as corporate B2B companies including business awards, advertising and PR. Separately, Moz looked at industries that may have been affected most by Penguin 2.0 and showed some similar sectors.
"We didn't set out to examine any specific theories or aspects of Penguin, for example, hypothesizing based on conjecture," said Voniatis. "Instead, we examined a broad dataset and allowed the MathSight platform to uncover mathematical patterns common to the pages that won or lost traffic from Penguin 2.0."
Voniatis said this resulted in "empirical evidence of patterns revealed about Penguin without any preconceptions or bias."
In its research paper, MathSight presented three case studies that attempted to explain which aspects of three different sites were rewarded or punished by Penguin 2.0. The data showed that those punishments and rewards varied across sites.
The following is just one illustration of how MathSight organized the Penguin 2.0 analysis for a site:
"This would suggest that any advice given would be most effective if tailored to the individual domain, or type of domain if domains can be grouped or clustered into types," Voniatis said. "For example, with site B the presence of anchor text was seemingly punished as a feature, whereas for site C it was heavily rewarded."
However, MathSight did draw conclusions about how Penguin 2.0 worked overall from the data it found. First, MathSight said pages that used "rare words" in the body text performed better in terms of traffic. Voniatis said these are words not listed among the 5,000 most common words in the English language. MathSight stated that using such words in content would raise its readability level (Voniatis specifically referenced aiming for higher Dale-Chall readability scores).
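To make the "rare words" idea concrete, here is a minimal Python sketch of how a rare-word share and a Dale-Chall style score could be computed for a block of body text. It is not MathSight's method, which the article does not publish. The tiny `COMMON_WORDS` set is a hypothetical stand-in for a real common-word list, and the function names are illustrative. The scoring uses the published Dale-Chall formula, in which a higher score indicates text that demands a higher reading level.

```python
import re

# Hypothetical stand-in for a full common-word list (assumption: a real
# analysis would load several thousand words from a file).
COMMON_WORDS = {
    "the", "a", "of", "to", "and", "in", "is", "it", "for", "on",
    "with", "we", "our", "you", "buy", "best", "cheap", "now",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_sentences(text):
    """Count runs of sentence-ending punctuation, with a floor of one."""
    return max(1, len(re.findall(r"[.!?]+", text)))


def rare_word_share(text, common=COMMON_WORDS):
    """Fraction of words that do not appear in the common-word list."""
    words = tokenize(text)
    if not words:
        return 0.0
    rare = sum(1 for w in words if w not in common)
    return rare / len(words)


def dale_chall(text, common=COMMON_WORDS):
    """Dale-Chall score: 0.1579 * (% difficult words) + 0.0496 * (words per
    sentence), plus 3.6365 when difficult words exceed 5 percent."""
    words = tokenize(text)
    if not words:
        return 0.0
    pct_difficult = 100 * rare_word_share(text, common)
    words_per_sentence = len(words) / count_sentences(text)
    score = 0.1579 * pct_difficult + 0.0496 * words_per_sentence
    if pct_difficult > 5:
        score += 3.6365
    return score
```

Calling `dale_chall(page_text)` on each page before and after a rewrite would give a rough way to compare versions, although the result depends heavily on which word list is used.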
MathSight also reported that longer title tags using words with more syllables tended to fare better.
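As a rough illustration of what measuring title length and syllable counts might look like, the following Python sketch computes a few title tag features. The syllable counter is a simple vowel-group heuristic, not a dictionary-based count, and the function names are hypothetical. MathSight's actual measurements are not described in the article.

```python
import re


def estimate_syllables(word):
    """Rough heuristic: count vowel groups and discount a trailing silent 'e'
    (but keep the final syllable in words ending in 'le', such as 'table')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(1, count)


def title_features(title):
    """Return character length, word count and average syllables per word."""
    words = re.findall(r"[A-Za-z']+", title)
    syllables = [estimate_syllables(w) for w in words]
    return {
        "char_length": len(title),
        "word_count": len(words),
        "avg_syllables": sum(syllables) / len(words) if words else 0.0,
    }
```

Because the heuristic miscounts some English words, the averages are best used to compare titles against one another, not as exact values.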
From a content perspective, although we do have algorithms to deal with, becoming hyperfocused on details such as syllable count and the use of "rare" words could take marketers away from the big picture.
With that in mind, I asked Voniatis how SEO professionals should use the information in MathSight's report.
"The insights provided are 'technical' so they are not intended as a replacement for a highly skilled and experienced SEO. Whilst the MathSight platform can mathematically deconstruct search engine algorithms with supporting empirical data, it cannot provide the reason why – that's for Google to answer or for SEO professionals to theorize over," he said.
Additional conclusions from MathSight on the general behavior of Penguin 2.0 included:
- The use of headings is rewarded, and it's advantageous to use less commonplace words within them.
- The number of hyperlinks "appears" to be rewarded, meaning the more hyperlinks, the greater the increase in traffic, in some cases. MathSight warned that the data here could be too vague to take action upon, but did say that there wasn't any bias towards external or internal links.
- Depending on the type of site and based on MathSight's "limited survey," the presence and increased character length of meta descriptions and the increased quantity of words in anchor text are now slightly more rewarded than previously.
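The on-page factors in the list above (headings, hyperlinks, meta description length and anchor text word counts) can be gathered from a page's HTML with Python's standard library. The sketch below is one possible approach, and the class and function names are hypothetical. It only counts features and does not attempt to reproduce MathSight's scoring.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OnPageFeatures(HTMLParser):
    """Collects simple counts for the on-page factors named in the list."""

    def __init__(self):
        super().__init__()
        self.heading_count = 0
        self.link_count = 0
        self.meta_description_length = 0
        self.anchor_word_counts = []
        self._anchor_text = None  # text buffer while inside an <a href> tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADING_TAGS:
            self.heading_count += 1
        elif tag == "a" and "href" in attrs:
            self.link_count += 1
            self._anchor_text = []
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description_length = len(attrs.get("content") or "")

    def handle_data(self, data):
        if self._anchor_text is not None:
            self._anchor_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_text is not None:
            text = " ".join(self._anchor_text)
            self.anchor_word_counts.append(len(text.split()))
            self._anchor_text = None


def extract_features(html):
    """Parse an HTML string and return the collected counts as a dict."""
    parser = OnPageFeatures()
    parser.feed(html)
    return {
        "headings": parser.heading_count,
        "links": parser.link_count,
        "meta_description_length": parser.meta_description_length,
        "anchor_word_counts": parser.anchor_word_counts,
    }
```

Running `extract_features` over pages that gained and lost traffic would produce the kind of per-page table that a statistical comparison could start from. Given MathSight's own caveat about its "limited survey," any pattern found this way should be treated as tentative.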
"SEO professionals can use the Penguin 2.0 study to explain to their clients that the update was about penalizing 'low readability' levels of content," Voniatis said. "SEOs can also use the findings to offer Penguin 2.0 recovery plans by identifying and responding to on-site and off-site content with poor readability scores."
Voniatis said MathSight provides data via an API to help SEO professionals examine which onsite pages need their content rewritten as a response to Penguin 2.0. From an offsite point of view, the data also scores and identifies which links to remove or disavow.