Using Wikipedia to Improve Search Quality

Bill Slawski put up an interesting post over the weekend titled “Can Web Search Use Wikipedia to Understand References to Names?” In it, Bill references a paper by Microsoft researcher Silviu Cucerzan. The gist of the paper is that search engines can use Wikipedia as a cross-referencing source. When a search engine sees a name like “Bush” in a document, Wikipedia can help it understand which Bush is being referred to (George W. Bush, his father, Reggie Bush, or someone else).

In principle, the paper discusses how the context in which a particular name is used in a web document can be compared to the context in which that name is used on Wikipedia. Put simply, if the reference to “Bush” appears on a site about the New Orleans Saints, the likelihood that it’s about Reggie Bush is quite high. The search engine can use an external reference source, such as Wikipedia, as a method of validation, by trying the various Wikipedia pages for people with the last name Bush and noting which references each one has in common with the page being analyzed.

For example, the Wikipedia page and the web page being analyzed probably both use phrases like New Orleans Saints, football, and running back. By developing this sense of context, the search engine can classify the page more accurately, even if the page never uses the running back’s full name. So if a user searches on Reggie Bush, the search engine will know that the page can be considered relevant to the query.
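The context comparison described above can be illustrated with a very simple bag-of-words overlap. This is a minimal sketch, not the method from Cucerzan's paper: the context sets in `CANDIDATES` are hypothetical, hand-written stand-ins for what a real system would extract from Wikipedia articles, and `tokenize`, `disambiguate`, and `is_relevant` are illustrative names. Each candidate is scored by how many context terms it shares with the page, the highest scorer wins, and the page is then treated as relevant to a query for that full name.

```python
import re

# Hypothetical, hand-written context profiles standing in for Wikipedia pages.
# A real system would derive these from article text, links, and categories.
CANDIDATES: dict[str, set[str]] = {
    "George W. Bush": {"president", "texas", "governor", "republican", "white", "house"},
    "George H. W. Bush": {"president", "vice", "reagan", "cia", "republican", "gulf"},
    "Reggie Bush": {"new", "orleans", "saints", "football", "running", "back", "nfl"},
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(page_text: str, candidates: dict[str, set[str]]) -> tuple[str, int]:
    """Return the candidate sharing the most context terms with the page, and its score."""
    page_terms = tokenize(page_text)
    scores = {name: len(page_terms & terms) for name, terms in candidates.items()}
    best = max(scores, key=scores.get)  # ties go to the first candidate listed
    return best, scores[best]


def is_relevant(page_text: str, query: str, candidates: dict[str, set[str]]) -> bool:
    """Treat the page as relevant if its best-matching entity is the queried name."""
    entity, score = disambiguate(page_text, candidates)
    return score > 0 and entity.lower() == query.strip().lower()


# The page only ever says "Bush", yet its context points to the running back.
page = ("Bush ran for 120 yards as the New Orleans Saints beat Atlanta; "
        "the running back also scored twice.")
print(disambiguate(page, CANDIDATES))  # ('Reggie Bush', 5)
print(is_relevant(page, "Reggie Bush", CANDIDATES))  # True
```

Counting raw overlaps keeps the idea visible, but a practical system would presumably weight terms (common words should count for less than distinctive ones like "saints") and draw on richer signals than a fixed word list; both refinements are left out of this sketch.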

It makes for interesting reading, and provides some insight into the types of analysis that search engines perform. What makes this even more striking is that it is just one of thousands of such scenarios that search engines deal with. It’s a complicated process, indeed.
