Inktomi has rolled out a new paid inclusion program aimed at large content providers. The company has also unveiled changes to how its service indexes and ranks web pages.
"Index Connect" is a program that offers cost per click pricing to those wishing to list 1,000 pages or more with Inktomi. In contrast, Inktomi's existing "Search/Submit" program, introduced in November, charges a per page fee. The new program is designed to be more economical for big publishers with lots of content.
"It's aimed at much larger sites than what we were doing with Search/Submit," said Troy Toman, Inktomi's general manager of search. "For them, per page pricing isn't good."
Among the initial partners are companies and sites such as eBay, Epinions, IDG and RollingStone.com. They can now ensure that selected content from their web sites is included in the Inktomi index and refreshed according to schedules that they determine. Without Index Connect, they would instead depend on Inktomi doing a generally random selection of documents from their sites and typically checking for changes only once per month.
Inktomi is also extending the program for free to charitable, educational and other not-for-profit organizations, allowing them greater control over their content. Examples included in the initial launch of Index Connect were KQED public broadcasting in San Francisco and the Hunger Project. Inktomi says that not-for-profits interested in participating in Index Connect should use the standard request form and indicate that they are non-profit. Arrangements will then be made for indexing.
The new program is another way for Inktomi to earn revenue from search and to share that revenue with the portal and search sites that use its services. However, the system also lets Inktomi better understand which content from various web sites it should be indexing.
"By having a relationship directly with those sites, we can make sure we are doing a better job," Toman said. "It helps us be more intelligent about what we are crawling."
In particular, despite making use of link analysis to locate what it feels are the best pages on the web, Inktomi says it still needs site owner input about what content they think is important and how often it should be revisited.
"Having our WebMap and understanding the link structure [of the web] has gone a long way, but what we've realized is that we are pulling all we can out of the WebMap," said Andrew Littlefield, Inktomi's chief strategist of search.
For instance, at KQED, there are some web pages that are inaccessible to standard crawlers. By working with the organization, Inktomi now has access to URLs that let its crawler index important content from the site, the company says.
What's missing from the paid inclusion programs Inktomi offers is a way for what I call "hobbyists" to participate. These are people who produce great content out of personal interest or as a hobby. They aren't non-profits, but their sites generally are not run to make much money. While Inktomi's self-serve Search/Submit program is designed for smaller site owners, it may still be too pricey for them. So, how does Inktomi interact with these people?
One idea might be a low cost registration fee: US $10, $25 or $50 per year, for example. That might let a site owner get their home page included on a guaranteed basis, plus allow them to indicate to Inktomi which other pages they think are a priority to include. There would be no guarantee that these additional pages would be included, but the extra information might be used to more intelligently guide the Inktomi crawler, rather than let it randomly wander through the site. In a way, such a program would be a reverse robots.txt file, with the emphasis on what should be included, rather than excluded.
Inktomi says it is aware of the hobbyist gap, and the company hopes to come up with some type of offering for this group in the near future.
In addition to the new paid inclusion program, Inktomi also has rolled out changes to its search engine that it hopes will improve the relevancy of its results.
Chief among these is human modeling. Inktomi has been using an internal editorial staff to run massive numbers of searches and then select documents that they consider relevant. The company has then been tweaking its various relevancy controls to try and automatically match the human selections. In this way, the company hopes to model the human qualities of what's relevant into its ranking software.
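Inktomi has not said how it adjusts its relevancy controls, so the following is only a rough sketch of the general idea: try different weightings of ranking signals and keep the one whose top result most often matches what the human editors picked. The two signals (a link score and a text score), the weight grid and all function names here are assumptions made for illustration, not anything Inktomi has described.

```python
from itertools import product

def top_document(candidates, weights):
    """Return the id of the candidate with the highest weighted feature sum.

    candidates maps a document id to a tuple of hypothetical signal scores,
    for example (link_score, text_score).
    """
    def score(features):
        return sum(w * f for w, f in zip(weights, features))
    return max(candidates, key=lambda doc_id: score(candidates[doc_id]))

def agreement(judged_queries, weights):
    """Count queries where the top-ranked document is the editors' pick."""
    return sum(
        1 for candidates, human_pick in judged_queries
        if top_document(candidates, weights) == human_pick
    )

def tune_weights(judged_queries, grid=(0.0, 0.5, 1.0), n_features=2):
    """Try every weight combination on the grid; keep the best-agreeing one."""
    return max(
        product(grid, repeat=n_features),
        key=lambda weights: agreement(judged_queries, weights),
    )

# Made-up judgments: in both queries the editors chose the page with the
# stronger text score, so the tuned weights should favor that signal.
judged = [
    ({"a": (0.9, 0.1), "b": (0.2, 0.8)}, "b"),
    ({"c": (0.7, 0.2), "d": (0.1, 0.9)}, "d"),
]
best_weights = tune_weights(judged)
```

A real system would use far more signals and judgments than an exhaustive grid could cover, but the loop is the same: change the controls, re-rank, and compare against the human selections.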
"We didn't make huge changes in the algorithm," said Paul Karr, Inktomi's director of web search. "Essentially, what we are able to do is take the modeling technology and apply it to the database. We can fine tune it, experiment, and try to look at what's best."
In addition to the human modeling, Inktomi says it has improved its use of link analysis and is now also doing automatic proximity searching. For example, if you were to search for "george bush," it would favor pages with those words appearing on them in that order.
"Most search engines do not do some type of proximity, because it is very expensive computationally, but you do tend to get this sort of organic proximity without specifying it due to words in the title or the anchor text," said Doug Cook, Inktomi's director of web search engineering. "It's sort of a poor man proximity search."
In other words, while it may have seemed like major search engines have always done automatic proximity searches, that's been more accidental than intentional, according to Inktomi. For instance, a search for "george bush" would bring back pages containing that phrase at the top of the results because both words were in the important ranking area of the title tag, rather than because they were in the order you specified. That's no longer the case with Inktomi. Proximity searching is now always being done.
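To make the idea concrete, here is a minimal sketch of an explicit proximity bonus: a page scores highest when the query words appear next to each other in the order typed, and lower the farther apart they are. The scoring formula and function names are assumptions for illustration only. Inktomi has not published its algorithm, and this naive version ignores punctuation and would be far too slow at web scale.

```python
def proximity_bonus(text, query):
    """Hypothetical bonus: 1.0 when the query terms are adjacent and in order,
    smaller as the in-order span widens, 0.0 if no in-order match exists."""
    tokens = text.lower().split()
    terms = query.lower().split()
    if not terms:
        return 0.0
    best_span = None
    for start, token in enumerate(tokens):
        if token != terms[0]:
            continue
        # From this occurrence of the first term, greedily find each
        # following term at the earliest later position.
        pos = start
        matched = True
        for term in terms[1:]:
            try:
                pos = tokens.index(term, pos + 1)
            except ValueError:
                matched = False
                break
        if matched:
            span = pos - start + 1
            if best_span is None or span < best_span:
                best_span = span
    if best_span is None:
        return 0.0
    return len(terms) / best_span

def rank_by_proximity(pages, query):
    """Order pages so that the tightest in-order matches come first."""
    return sorted(pages, key=lambda page: proximity_bonus(page, query), reverse=True)

pages = [
    "bush said george was late",
    "george w. bush spoke today",
    "president george bush spoke today",
]
ranked = rank_by_proximity(pages, "george bush")
```

With these made-up pages, the one containing the exact phrase ranks first, the one with a word in between ranks second, and the one with the words reversed gets no bonus at all.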
"This is the first time we've explicitly done proximity on all the queries. Essentially, we had to rewrite all of our core algorithms. Initially, there was a performance cost, but the team came up with some really clever new algorithms that were actually faster than our existing ones," Cook said.
Inktomi has also introduced index blending into its search results, which means that you may get information from the LookSmart directory, the paid inclusion index or the non-paid content that comes from crawling the web blended seamlessly in the same set of results. This also means that Inktomi could include news content, shopping content or other specialty search results into the results set.
Behind the scenes, query modeling is used to decide which databases to hit and when. For example, Inktomi has a collection of 500 million documents, but 110 million of these are kept in a "Best of the Web" index. A search for a popular subject, such as "Britney Spears," may be sent only to this smaller collection while a search for a more obscure topic might go against the entire index.
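The routing decision might look something like the sketch below. Inktomi has not said how it judges whether a subject is popular, so the query-count table, the threshold and the names used here are purely assumptions for illustration.

```python
BEST_OF_WEB = "best_of_web"  # the smaller index of about 110 million documents
FULL_INDEX = "full_index"    # the complete collection of about 500 million documents

def choose_index(query, query_counts, popular_threshold=1000):
    """Pick which index a query should be sent to.

    query_counts is a hypothetical table mapping a normalized query to how
    often it has been searched. Popular queries go to the smaller index;
    anything else goes against the entire collection.
    """
    normalized = " ".join(query.lower().split())
    if query_counts.get(normalized, 0) >= popular_threshold:
        return BEST_OF_WEB
    return FULL_INDEX

# Made-up counts for illustration.
counts = {"britney spears": 250000, "inktomi webmap": 12}
popular_target = choose_index("Britney  Spears", counts)
obscure_target = choose_index("inktomi webmap", counts)
unseen_target = choose_index("some never seen query", counts)
```

Sending a query that has never been seen to the full index is the cautious default in this sketch, since an obscure topic is the case where the smaller collection is most likely to miss something.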
Unfortunately, there's no way to force Inktomi to search against a particular set of documents, nor can you tell which partners may prevent searches from digging as deep as possible. Inktomi does say that most of its partners do hit its entire 500 million page index, and its index of paid inclusion content is always queried by its US and European partners.
Inktomi Index Connect
More details about the pay-per-click inclusion program.
Inktomi Search/Submit Partners
More details about Inktomi's self-serve paid inclusion program.
Pay For Placement?
Past articles from Search Engine Watch about Inktomi's paid inclusion programs and other paid inclusion programs can be found here.
How Inktomi Works
A fully-updated guide to getting listed with Inktomi will be posted by Wednesday of this week, at the latest.
Inktomi Gets Relevant
PC World, March 14, 2001
Another look at Inktomi's relevancy and indexing changes.