After the loss of Go earlier this year and the expected departure of NBCi, you might have thought it was all over in the search engine game. However, just as consolidation seemed inevitable, new player Teoma has stepped up with an impressive debut of its new search service.
Opened to the public last month, Teoma leverages link structures from across the web not only to provide relevant results but also to present different views of information automatically.
Currently in beta, the site is primarily intended to demonstrate Teoma's technology to potential partners or buyers.
"We're in discussions with many of the major portals and also the major technology companies," said Paul Gardi, Teoma's president and chief operating officer.
The idea of running Teoma as a standalone search engine for the public hasn't been ruled out, and even though the current site is designed as a demonstration, it's already powerful enough that searchers may want to add it to their research arsenal.
Teoma is a crawler-based service and has a collection of about 100 million URLs. Of course, to be a serious contender in the search engine space, Teoma will need to grow, and it is planning to do so.
Database size is an important factor, but without good relevancy ranking, a large index isn't necessarily useful. Teoma hopes its own style of link analysis will give it the ability to take on the widely-acknowledged relevancy leader, Google.
To understand what Teoma is doing, it makes sense to summarize the Google system first. Google examines link structures all over the web. By doing so, it can give every page a popularity rating known as "PageRank" (named after Google cofounder Larry Page). When you do a search, URLs with high PageRanks are more likely to be listed first. However, this will only happen if the pages also match other criteria, such as containing your search terms or being identified as relevant to your search terms through analysis of link context.
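The "global" approach described above can be illustrated with a small sketch. This is not Google's actual implementation: it uses the simplified PageRank iteration from the published literature, and the damping value of 0.85, the function names and the toy page data are all assumptions made for illustration. Ranks are computed once over the whole link graph, and a search then filters by text match and sorts by those precomputed ranks.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to.
    The damping factor and iteration count are illustrative assumptions."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


def global_search(query, page_text, rank):
    """Keep pages whose text contains the query, ordered by global rank."""
    matches = [p for p, text in page_text.items() if query.lower() in text.lower()]
    return sorted(matches, key=lambda p: rank[p], reverse=True)


if __name__ == "__main__":
    # Hypothetical four-page web.
    links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    page_text = {"a": "ice hockey scores", "b": "cooking tips",
                 "c": "ice hockey news", "d": "weather"}
    print(global_search("hockey", page_text, pagerank(links)))
```

Note that links from the off-topic pages "b" and "d" still raise the rank of "c" here, which is the behavior Teoma's approach is meant to avoid.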
Teoma operates in the opposite fashion. When you do a search, Teoma looks across the entire web to find pages that contain your search terms or that are considered relevant to those terms based on link context. After finding a matching set of documents, which it calls a "community," Teoma then examines the links between just this set to determine which are the most popular.
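A minimal sketch of this "community first, links second" ordering follows. Teoma has not published its method, so the details here are assumptions: the community is built by plain text matching (link context is ignored), popularity is approximated by counting in-links from other community members, and the function name and sample pages are hypothetical.

```python
def community_rank(query, page_text, links):
    """Rank matching pages using only links that come from other matching pages.
    In-link counting is an assumed stand-in for Teoma's popularity measure."""
    # Step 1: find the "community" of pages that match the query.
    community = {p for p, text in page_text.items() if query.lower() in text.lower()}

    # Step 2: count links between community members only.
    scores = {p: 0 for p in community}
    for source in community:
        for target in set(links.get(source, [])):
            if target in community and target != source:
                scores[target] += 1

    # Most-linked pages first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    page_text = {
        "hockey1": "ice hockey scores",
        "hockey2": "ice hockey news",
        "hockey3": "youth hockey leagues",
        "portal": "news and weather",
    }
    links = {
        "portal": ["hockey3"],   # off-topic page, so this link is not counted
        "hockey1": ["hockey2"],
        "hockey3": ["hockey2"],
        "hockey2": ["hockey1"],
    }
    print(community_rank("hockey", page_text, links))
```

In this toy data, the link from "portal" does nothing for "hockey3" because "portal" is not part of the hockey community, whereas a global link count would have credited it.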
"At the end of the day, we are ranking sites based on other sites that are on the subject," Gardi said. "We don't only use all the sites that are pointing at a site, we also use that are on the subject."
The implication is that Teoma's "community" generated results will be more relevant than those from Google or others that use a "global" system which examines the entire web, because links from irrelevant pages are excluded. However, this understates what Google does.
Yes, PageRanks at Google are computed from examining the entire web, but link context and the content of web pages are also taken into account. This is supposed to reduce the impact of "irrelevant" pages in Google's system.
"Topic specific PageRank versus general PageRank, I'm not sure how much of a difference there is," said Urs Holzle, a Google Fellow and the company's former vice president of engineering. "Suppose you search for something about ice hockey. The sites that come up, where are they getting their PageRank from? Most likely, other ice hockey sites."
Teoma also uses its link analysis system to create its unique "Expert Links" and the autoclassification of pages into topics.
Let's take Expert Links first. When you search at Teoma, a list of "Expert Links" appears along the right side of the page. These listings are pages that provide links to a wide range of resources on a particular topic. In other words, these are "link lists" or "weblogs" for a particular subject.
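One plausible way to spot such pages, sketched below, is to look for pages that link out to many members of the on-topic community. The article does not say how Teoma actually identifies Expert Links, so the threshold, the function name and the sample data are all assumptions.

```python
def expert_links(community, links, min_links=3):
    """Return (page, on_topic_link_count) for pages that link to at least
    `min_links` distinct community pages. The threshold is an assumed value."""
    candidates = []
    for page, targets in links.items():
        # Count distinct on-topic pages this page points to, ignoring self-links.
        on_topic = set(targets) & (community - {page})
        if len(on_topic) >= min_links:
            candidates.append((page, len(on_topic)))
    # Pages linking to the most on-topic resources come first.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    community = {"h1", "h2", "h3", "h4"}      # hypothetical on-topic pages
    links = {
        "list": ["h1", "h2", "h3", "weather"],
        "biglist": ["h1", "h2", "h3", "h4"],
        "h1": ["h2"],
    }
    print(expert_links(community, links))
```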
Here's another way to think of it. If you go to Yahoo and search for something, you'll usually be led to a matching category that lists a variety of web sites on your search topic. Other people create these types of topic-specific lists, and Teoma's Expert Links area is designed to help you easily find such resources from across the web.
Teoma's other special feature is the autoclassification of web pages. At the top of Teoma's results page is a section called "Web Pages Grouped By Topic." Underneath, all the pages found that match your query have been grouped into broad categories. You can click on a category link to narrow your focus, and you can further drill down, as desired.
Fans of Northern Light will see similarities between this and Northern Light's "Custom Search Folders," which also group results into categories in real time. A key difference is presentation. Northern Light's folders, which have always been a useful alternative way to scan results, have always been tucked off to the side of its main results. Teoma's categories are front and center before users, which will likely increase use.
To perform the categorization, Teoma looks at the results set, then seeks out "clusters" or "communities" of pages that link to each other. When these clusters emerge, the link text is analyzed to find the most common words, which are then used to describe the category. This use of link analysis is also different from the pure text analysis that Northern Light does, Teoma says.
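The two steps described above, clustering by links and then labeling by link text, can be sketched as follows. This is only an illustration under stated assumptions: clusters are taken to be connected groups of result pages (Teoma's real clustering is not described), labels are the two most frequent non-stopword terms in the link text, and the stopword list, function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not Teoma's).
STOPWORDS = {"the", "a", "an", "of", "and", "for", "to", "in", "on"}


def categorize(results, anchors, top_words=2):
    """Group result pages that link to each other and label each group.
    `anchors` is a list of (source, target, link_text) tuples."""
    # Build an undirected link graph restricted to the result set.
    neighbors = defaultdict(set)
    for source, target, _ in anchors:
        if source in results and target in results:
            neighbors[source].add(target)
            neighbors[target].add(source)

    # Find clusters as connected components, using a simple stack walk.
    clusters, seen = [], set()
    for start in sorted(results):
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], set()
        while stack:
            page = stack.pop()
            cluster.add(page)
            for nxt in neighbors[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(cluster)

    # Label each cluster with the most common words in its internal link text.
    categories = []
    for cluster in clusters:
        words = Counter()
        for source, target, text in anchors:
            if source in cluster and target in cluster:
                words.update(w for w in text.lower().split() if w not in STOPWORDS)
        ranked = sorted(words.items(), key=lambda kv: (-kv[1], kv[0]))
        label = " ".join(w for w, _ in ranked[:top_words]) or "(unlabeled)"
        categories.append((label, sorted(cluster)))
    return categories


if __name__ == "__main__":
    results = {"nhl", "ahl", "rink", "skates"}
    anchors = [
        ("nhl", "ahl", "hockey leagues"),
        ("ahl", "nhl", "pro hockey leagues"),
        ("rink", "skates", "ice skates"),
        ("skates", "rink", "ice rink"),
        ("portal", "nhl", "sports"),   # source is outside the results, so ignored
    ]
    for label, pages in categorize(results, anchors):
        print(label, pages)
```

Because the labels come from the text of links rather than from the pages themselves, a page can end up in a category whose name never appears in its own content, which is the distinction from pure text analysis drawn above.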
How about Teoma's main results, the "Web Page" section -- what's there? These are the pages that are more likely to answer your questions, in contrast to the Expert Links pages that don't provide answers but may lead you to pages that do.
Teoma grew out of a federally funded project in 1998 at Rutgers University. The Teoma technology team is led by Professor Apostolos Gerasoulis, who now serves as Teoma's chief technology officer, and Professor Tao Yang, from the University of California, Santa Barbara, who is chief scientist and vice president of research and development. Now a private company with funding from Hawk Holdings, Teoma hopes to establish some portal partnerships within the coming months. If not, then the Teoma site itself is likely to be expanded beyond the current demo.
The company is also considering enterprise and site search services in the future, as well as licensing its categorization tools to those who want to create their own directories or vertical portals.