A special report from the Search Engine Strategies 2002 Conference, August 12-14, San Jose, CA.
Link analysis is one of the most important techniques search engines use to determine relevance, and understanding how it works is essential to successful search engine optimization. Representatives from Google and Teoma explain how it's done.
If you have spent any time over the past few years studying search engine marketing, you are probably familiar with the linking craze going on in the industry. Everyone from experts to those new to the field tosses about terms like "link popularity" and "page rank," and it seems that all related discussion forums and web sites have entire sections devoted to linking. As the foundation of the web, links have always been important, but links themselves haven't changed much since the day they were created, so why all the renewed interest?
The reason is that the major search engines are utilizing links more and more to improve the relevance of their search results. However, the world of links and their use by search engines can get confusing quickly. To help sort through the more important elements of linking the session "Looking at Links" was held at the Search Engine Strategies conference in San Jose, California. The search engines that utilize links the most, Google and Teoma, both sent representatives to explain why links are important to their engines, and how to best utilize them on a web site.
Daniel Dulitz, Director of Technology for Google, started things off by stating one of the more important points of the session -- as search engine indexes grow larger it becomes almost impossible to determine a web page's relevancy based solely upon on-the-page factors (page text, metas, titles, etc.). It's this fact, combined with the reality that most on-the-page factors can't be trusted due to abuse, that prompted Google to begin looking at the link structure of the web to help determine a page's relevance to a query.
According to Dulitz, when determining the relevance of a web page to a search, Google uses its PageRank system to attempt to "model the behavior of web surfers" by analyzing the manner in which pages are linked to one another. He explained that Google views the interlinking of web pages as a way of "leveraging the democratic structure of the web," with links equating to votes.
Google essentially treats each link from one site to another as a vote for the site receiving the link (link popularity), but not all votes are created equal. Dulitz used a simple diagram to show that each page of a site has only one vote to give, so the more links to different sites a page contains, the smaller the share of that vote each link receives. He also stated that links from higher-quality sites carry more weight than those from lesser-quality sites (e.g. sites that use hidden links, participate in link farms, or have no incoming links). In addition, Google analyzes not only who is linking to whom but also the text in and around the links to help determine the relevance of the pages receiving the links.
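The vote-splitting idea Dulitz described can be illustrated with a small sketch. This is not Google's production system; it follows the publicly described PageRank formulation, and the function name, the damping value of 0.85, the iteration count, and the handling of pages with no outgoing links are all assumptions made for illustration. Each page divides its single vote evenly among the pages it links to, and a vote counts for more when the page casting it has a higher score itself.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank sketch.

    links maps each page name to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # One vote per page, split evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # vote evenly over all pages so no score is lost.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # "hub" splits its vote three ways; "solo" gives its whole vote to "w".
    example = {"hub": ["x", "y", "z"], "solo": ["w"]}
    for page, score in sorted(pagerank(example).items()):
        print(page, round(score, 4))
```

In the example graph, "hub" and "solo" start out equal, but "w" ends up with a higher score than "x", "y", or "z" because it receives an undivided vote.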
Paul Gardi, Vice President of Search for Ask Jeeves/Teoma, began with similar comments to those made by Dulitz. Gardi stated that "due to statistical convergence" and the ease with which they can be abused, neither page text analysis nor standard link popularity can be relied upon when determining the relevancy of a web page. Specifically, he mentioned that standard link popularity is ineffective because it does not help determine the subject or the context of the site, and larger more popular sites tend to overwhelm smaller sites that may actually be more relevant to a search.
To combat these issues Teoma views the web as a global entity that contains many subject based web site communities. They study these subject communities and the manner in which they are interlinked within themselves and with each other to determine not only their link popularity, but also the subject and context of the involved sites. According to Gardi, Teoma is able to do this by using their unique method of ranking sites. He explained that rather than relying on general link popularity to determine results, their engine attempts to employ a "subject specific popularity" to locate the most popular sites within a specific subject community. This is done by first analyzing the web as a whole to identify subject communities.
Teoma then employs link popularity within those communities to determine which sites are the "authorities" on the subject of the query and it's those sites that are returned as their results to a search. In addition, he mentioned that by analyzing the links of the authority sites their technology is also able to locate high-quality resource pages (links pages) that are related to the original query. Each of these components is then made available on their search results page as follows: "Results" are the authorities, "Refine" is the related subject communities, and "Resources" are the related links pages.
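The difference between general link popularity and the "subject specific popularity" Gardi described can be sketched roughly as follows. Teoma's actual method was not detailed in the session, so everything here is hypothetical: the function name, the `community_of` mapping (which simply assumes the subject communities have already been identified), and the rule of counting one vote per linking page are illustrative choices only. The point is that only links from within the same subject community count toward a page's authority.

```python
from collections import Counter


def subject_authorities(links, community_of, subject, top_n=3):
    """Rank pages in one subject community by in-community links.

    links maps each page to the list of pages it links to.
    community_of maps each page to a subject label (assumed given).
    """
    votes = Counter()
    for source, targets in links.items():
        # Ignore links that come from outside the subject community.
        if community_of.get(source) != subject:
            continue
        for target in set(targets):
            # Count only links that also land inside the community.
            if target != source and community_of.get(target) == subject:
                votes[target] += 1
    # Most in-community votes first; ties broken by name for stable output.
    return sorted(votes, key=lambda page: (-votes[page], page))[:top_n]


if __name__ == "__main__":
    links = {
        "g2": ["g1"],
        "g3": ["g1", "g2"],
        "g1": ["big"],
        "p1": ["big"],
        "p2": ["big"],
        "p3": ["big"],
    }
    community_of = {
        "g1": "gardening", "g2": "gardening", "g3": "gardening",
        "big": "general", "p1": "general", "p2": "general", "p3": "general",
    }
    print(subject_authorities(links, community_of, "gardening"))
```

In this toy data, "big" has the most links overall, yet it never appears among the gardening authorities because it sits outside that community, which mirrors Gardi's point about large popular sites overwhelming smaller, more relevant ones.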
Overall, the session was well received and very informative, especially for those new to the subject. Considering that most major search engines now utilize some method of link analysis, anyone who has a vested interest in being properly indexed by the search engines should consider attending in the future.
Craig Fifield is Product Designer for Microsoft bCentral's Small Business Web site analysis and submission service, Submit It!