Using Wikipedia to Improve Search Quality

Date published: 9 July 2007

Author: Eric Enge

Categories: Industry

Bill Slawski put up an interesting post over the weekend titled “Can Web Search Use Wikipedia to Understand References to Names?” Bill references a paper by Microsoft researcher Silviu Cucerzan. The gist of the paper is that search engines can use Wikipedia as a cross-referencing source: when a search engine sees a name like “Bush” in a document, Wikipedia can help it work out which Bush is being referred to (George W. Bush, his father, Reggie Bush, or someone else).

In principle, the paper discusses how the context in which a particular name is used in a web document can be compared to the context in which that name is used on Wikipedia. Simplistically put, if the reference to “Bush” appears on a site about the New Orleans Saints, the likelihood that it’s about Reggie Bush is quite high. The search engine can use an external reference source, such as Wikipedia, as a method of validation, by trying the various Wikipedia pages for people with the last name Bush and noting the references in common.

For example, the Wikipedia page and the web page being analyzed probably both use phrases like New Orleans Saints, football, running back, etc. By developing this sense of context, the search engine can classify the web page more accurately, even if the page never uses the running back’s full name. So if the user searches on Reggie Bush, the search engine will know that this particular web page is relevant to the query.
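To make the idea concrete, here is a minimal sketch of context matching in Python. It is not the method from Cucerzan's paper, just a simple bag-of-words comparison under some loud assumptions: the `CANDIDATES` dictionary holds hand-written keyword strings standing in for the text of each Wikipedia page (not real Wikipedia content), and the function names `tokenize`, `cosine`, and `disambiguate` are made up for illustration. A real search engine would work with far richer signals.

```python
import math
import re
from collections import Counter

# ASSUMPTION: hand-written keyword strings standing in for Wikipedia page text.
# These are illustrative only and are not copied from Wikipedia.
CANDIDATES = {
    "Reggie Bush": "american football running back new orleans saints nfl heisman usc",
    "George W. Bush": "president united states texas governor republican white house",
    "George H. W. Bush": "president united states vice president cia director republican",
}

# A tiny stopword list so that common words do not create false matches.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "for", "on", "is", "was", "his"}


def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(page_text, candidates):
    """Score each candidate by how much its context overlaps with the page."""
    page_vec = Counter(tokenize(page_text))
    scores = {
        name: cosine(page_vec, Counter(tokenize(context)))
        for name, context in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    page = (
        "Bush ran for two touchdowns as the Saints beat Atlanta in New Orleans; "
        "the rookie running back looked sharp."
    )
    best, scores = disambiguate(page, CANDIDATES)
    print(best)  # Reggie Bush
```

The sample page never says “Reggie”, yet it shares words like Saints, New Orleans, and running back with the football player's context and none with either president's, so the football player gets the highest score.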

It makes for interesting reading, and provides some insight into the types of analysis that search engines perform. What makes this even more daunting to think about is that this is just one example of thousands of such scenarios that search engines deal with. It’s a complicated process, indeed.
