New Google Study Looks at How Web Pages Are Authored

We don’t have time to offer an in-depth analysis at the moment (Danny or Chris might do so in the future), but web page authors, content developers, SEO personnel, and others who would like a better understanding of how web pages are built will want to take a look at a new study published by Google. The study was conducted in December 2005.

The complete report, titled “Web Authoring Statistics,” is now available on the Google Code site.

Google engineers analyzed a sample of slightly over a billion documents and extracted information about popular class names, elements, attributes, and related metadata.
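To give a feel for what this kind of tally involves, here is a minimal sketch in Python that counts element and attribute names in a single page. It is not Google's method (the report does not publish its tooling here); the names ElementCounter and count_markup and the sample page are my own assumptions for illustration, and the standard library's html.parser is simply a convenient stand-in for a real crawler-scale parser.

```python
from collections import Counter
from html.parser import HTMLParser


class ElementCounter(HTMLParser):
    """Tally element and attribute names seen in one HTML document.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.elements = Counter()
        self.attributes = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        self.elements[tag] += 1
        for name, _value in attrs:
            self.attributes[(tag, name)] += 1


def count_markup(html_text):
    """Return (element counts, (element, attribute) counts) for one page."""
    parser = ElementCounter()
    parser.feed(html_text)
    parser.close()
    return parser.elements, parser.attributes


# Made-up sample page, not data from the study.
sample = """<html lang="en"><head><title>Demo</title>
<link rel="stylesheet" href="site.css"></head>
<body><a href="/one">One</a> <a href="/two">Two</a></body></html>"""

elements, attributes = count_markup(sample)
print(len(elements), "distinct elements on this page")
print(elements.most_common(3))
```

Run over many pages and summed, counters like these would yield rankings of the sort the report presents, though a study of a billion documents would obviously need far sturdier parsing and error handling than this.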

Here are just a few things that I noticed during a quick read:

Web pages use an average of 19 different page elements.
The top 5 elements are:
1. head
2. html
3. title
4. body
5. a

At the bottom of the list are:
14. link
15. form
16. input

The HTML Element

  • The most-used attribute on html elements is xmlns, from misguided people using XHTML but sending it as text/html. They even (just) outnumber the people who specify the lang attribute!

The Meta Element

  • “[a] huge number of markup errors involving the meta element.”
  • The most common values of the “name” attribute are keywords, description, and robots.
    Quick comment: as Danny pointed out years ago, crafting and using the keywords meta tag is most often (when it comes to general web search engines) not a good use of one’s time.
  • “The Dublin Core people can take some comfort from the fact that although their keywords didn’t appear in the top ten chart above, they were quite well featured in the next few dozen. Here are the ten most used values, most popular first: dc.title, dc.language, dc.creator, dc.subject, dc.publisher, dc.description, dc.identifier,, dc.format, dc.rights. In fact the order maps relatively closely to the frequency of similar metadata in other constructs, like class names or rel values. Nice to know people are consistent!”

Link Relationships
“HTML has two link relationship attributes, rel and rev, which apply to the link and a elements. What values are most used?”

Top 5

  • link-rel=“stylesheet”
  • link-rel=“icon”
  • link-rel=“shortcut”
  • link-rel=“alternate”
  • link-rel=“next”

The a element’s rel=“nofollow” appears at number six.
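For readers who want to check their own pages, a tally of link relationships can be sketched in a few lines of Python. This is my own illustration, not the report's code: RelCounter and count_link_relationships are hypothetical names, and splitting each value on whitespace is an assumption on my part (rel takes a space-separated list of link types, which would explain why “shortcut” and “icon” show up as separate entries).

```python
from collections import Counter
from html.parser import HTMLParser


class RelCounter(HTMLParser):
    """Count rel and rev values on link and a elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        for name, value in attrs:
            # value is None for attributes written without a value.
            if name in ("rel", "rev") and value:
                # Assumption: treat the value as space-separated link types.
                for token in value.lower().split():
                    self.counts[f"{tag}-{name}={token}"] += 1


def count_link_relationships(html_text):
    """Return a Counter keyed like 'link-rel=stylesheet'."""
    parser = RelCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Made-up sample markup, not data from the study.
sample = """<head>
<link rel="stylesheet" href="site.css">
<link rel="shortcut icon" href="favicon.ico">
</head>
<body><a rel="nofollow" href="http://example.com/">Example</a></body>"""

counts = count_link_relationships(sample)
for key, total in sorted(counts.items()):
    print(key, total)
```

On the sample above, the single rel="shortcut icon" link is counted once under each of its two tokens, alongside one stylesheet link and one nofollow anchor.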

I’m just scratching the surface of the massive amount of stats and graphs this report provides. It’s a must-read for anyone interested in web page authorship.

A tip o’ the cap to Michel for the news tip.
