Move Over Google Knowledge Graph, Here Comes Knowledge Vault

Author

Jessica Lee

Date published August 27, 2014 Categories

The Knowledge Graph is well known throughout the SEO community as a knowledge base that collects data from a variety of sources across the Web to enhance its search results. But will its technology be “old news” soon? Enter the Knowledge Vault, the brainchild of Google and a sort-of “Knowledge Graph on steroids” that could soon be a reality for search.

Per a report in New Scientist, the Knowledge Vault is different from the Knowledge Graph in that it doesn’t rely on “crowdsourcing” for its information. From the report:

This existing base, called Knowledge Graph, relies on crowdsourcing to expand its information. But the firm noticed that growth was stalling; humans could only take it so far. So Google decided it needed to automate the process. It started building the Vault by using an algorithm to automatically pull in information from all over the web, using machine learning to turn the raw data into usable pieces of knowledge.

Google has been alluding to the Knowledge Vault for some time. Take this presentation from Google in 2013. In that presentation, Google points out that the Knowledge Graph is fueled by Freebase.com, a community-curated database that merges many data sources; and while “Freebase is large,” the presentation says it’s “still very incomplete,” and “we need automatic knowledge base construction methods.”

The presentation then goes on to explain how Google is building a more complete knowledge base, and that’s through the Knowledge Vault, which pulls together a borage of signals. From the presentation:

There are many academic groups (e.g., CMU, UW, MPI) that have developed methods to extract facts from large text corpora. At Google, we have developed a similar system, except it is 10x bigger. In addition, we use “prior knowledge” to help reduce the error rate.

knowledge-vault

This paper published by Google goes into more detail about the concepts behind the Knowledge Vault, citing three major components:

Extractors: These systems extract triples from a huge number of Web sources. Each extractor assigns a confidence score to an extracted triple, representing uncertainty about the identity of the relation and its corresponding arguments.

Graph-based priors: These systems learn the probability of each possible triple, based on triples scored in an existing KB (knowledge base).

Knowledge fusion: This system computes the probability of a triple being true, based on agreement between different extractors and priors.

AJ Kohn of SEO and marketing consultancy Blind Five Year Old commented on the details outlined in the Knowledge Vault paper, saying it shows “the various ways in which Google is extracting entities and how they validate them using prior knowledge. The real value of the process outlined is in the combination of different methods of extraction and validation to produce a far more robust and reliable database of entities.”

He adds, “In short, Google’s found a way to build a larger and more reliable Knowledge Graph. Google is harnessing all of the entity applications and essentially triangulating the ones that make sense.”

What this means for the future of search optimization, marketers will have to wait and see. Kohn recently wrote about Knowledge Graph optimization, and says that there are some takeaways in the details of the Knowledge Vault information.

“If you dig into the details, there are some interesting takeaways. First is that the markup that websites are actively using (i.e., schema.org) are a very small part of total extractions and produce far fewer reliable entity facts,” Kohn says. “The vast majority instead come from Google crawling the HTML tree of sites – the unstructured data that they then turn into something sensible.”

Does that mean we should stop using structured data?

Kohn says no: “I believe it’s critical to do so if you want to participate in the future of search. But it may show that the markup isn’t ideal, that sites may implement it in self-serving ways or may not implement it correctly. Or correctly enough for it to be used as an entity fact.”

Another interesting takeaway, Kohn says, is the “people” part of this Knowledge Vault process.

“This is specified in the paper, and lends a bit more fuel to the idea that Google wants to have some sort of author or expertise-based ranking, but wants to do it through entities and not explicit markup.”

Finally, Kohn says, there’s an element of link prediction and “how Google is able to better flesh out entities by understanding what types of facts they should expect to find. For each person they expect to find potential facts such as ‘spouse,’ ‘date of birth,’ ‘place of birth,’ ‘gender,’ ‘parents,’ ‘children,’ etc. Being able to generate these link predictions and seek them out automatically is extremely powerful.”

What does this mean for the average search marketer? Kohn says that first and foremost, “entities are the way in which Google is going to improve search results for the foreseeable future.”

Secondly, Kohn says, “Much of this information is finding its way into Knowledge Panels. So those sites that can connect their content to the Knowledge Vault may be surfaced in Knowledge Panels in the future. “

More about:

Resources

Analytics The 2023 B2B Superpowers Index

The Merkle B2B 2023 Superpowers Index outlines what drives competitive advantage within the business culture and subcultures that are critical to success. It is the indispensable guide for B2B marketers to deliver world-class experiences and keep pace with the dynamic environment. Download Now
Analytics Data Analytics in Marketing

The ClicData survey found that various challenges exist that prevent organizations from achieving such gains. These challenges included inaccessible data formats and limited flexibility in displaying data in dashboards. Download Now
Digital Marketing The Third-Party Data Deprecation Playbook

The need for fraud prevention in the digital world is critical now more than ever. Why? Thinking about your own behavior, consider how you complete transactions and how this has changed over the last 5 years. Download Now
Digital Marketing Utilizing Email To Stop Fraud-eCommerce Client Fraud Case Study

The need for fraud prevention in the digital world is critical now more than ever. Why? Thinking about your own behavior, consider how you complete transactions and how this has changed over the last 5 years. Download Now

Industry

SEO

PPC

Analytics

Social

Local

Mobile

Video

Content

Development

Information

Follow us

Move Over Google Knowledge Graph, Here Comes Knowledge Vault

Resources

Analytics The 2023 B2B Superpowers Index

Analytics Data Analytics in Marketing

Digital Marketing The Third-Party Data Deprecation Playbook

Digital Marketing Utilizing Email To Stop Fraud-eCommerce Client Fraud Case Study

Resources

The 2023 B2B Superpowers Index

Data Analytics in Marketing

The Third-Party Data Deprecation Playbook

Utilizing Email To Stop Fraud-eCommerce Client Fraud Case Study

Related Articles

Optimize Google’s new Interaction to Next Paint metric

Seven tips to optimize page speed in 2023

Three must-have GA4 SEO reports you can build in under 30 minutes

Is your SEO performance a dumpster fire? Here’s how to salvage it

The Search Engine Watch Top 5!

The ultimate 2022 Google updates round up

Google market pulse for search marketers

How much impact can visual identity have on your organic visibility?

Follow us

Move Over Google Knowledge Graph, Here Comes Knowledge Vault

Resources

Analytics The 2023 B2B Superpowers Index

Analytics Data Analytics in Marketing

Digital Marketing The Third-Party Data Deprecation Playbook

Digital Marketing Utilizing Email To Stop Fraud-eCommerce Client Fraud Case Study

Get the Latestdaily news and insights about search engine marketing, SEO and paid search.

Resources

Resources

The 2023 B2B Superpowers Index

Data Analytics in Marketing

The Third-Party Data Deprecation Playbook

Utilizing Email To Stop Fraud-eCommerce Client Fraud Case Study

Related Articles

Optimize Google’s new Interaction to Next Paint metric

Seven tips to optimize page speed in 2023

Three must-have GA4 SEO reports you can build in under 30 minutes

Is your SEO performance a dumpster fire? Here’s how to salvage it

The Search Engine Watch Top 5!

The ultimate 2022 Google updates round up

Google market pulse for search marketers

How much impact can visual identity have on your organic visibility?

Get the Latest
daily news and insights about search engine marketing, SEO and paid search.