An Open Source Search Engine

Nutch could rewrite the rules of search development — especially with an impressive roster of Internet luminaries now lining up behind it.

Ask anyone in Silicon Valley what the hottest application on the Internet is today and you can bet their answer will be search. The dealmaking has been nothing short of torrid. Only a year ago there were at least half a dozen major players. Now there are just three: Yahoo, which last month bought search giant Overture in a $1.6 billion deal; Google, the undisputed king of search; and Microsoft, which is busy building a search platform of its own. They’re all fighting to dominate the huge and ballooning market, already worth $2 billion and expected to generate between $6 billion and $8 billion in revenues by 2007.

Search is a game of intellectual property, innovation, and market position. The three combatants all keep jealous watch over their patents (Yahoo, for one, has more than 60), engineering talent (hundreds of Ph.D. holders work at Google), and market advantages (Microsoft — need we say more?). Indeed, search is such a complicated and expensive undertaking that analysts have pegged the cost of market entry at well over $100 million.

All that could change this fall, when a new player strides onto the field.

Meet Nutch, the open-source search engine. Open-source applications are unusual in that the code upon which the software runs is not owned by a private, commercial company but rather bound by a simple license that allows anyone to use, modify, and even profit from it free of charge, as long as they pledge to contribute their own innovations back into the code base. Because of this, anyone will be able to access Nutch’s code and use it to their own ends, without paying licensing fees or hewing to a particular company’s set of rules.

Perhaps more important, Google takes a “trust us” approach to search; they say they don’t skew their PageRank formula to favor certain sites, but we have no way of knowing for sure. With Nutch, the indexing and page-ranking technologies are all open and visible; you can check them yourself if you have a problem with your page’s ranking. Just as Linux has taken on Windows, Nutch could pose a threat to Google and other search giants by revolutionizing the rules of search-engine development and distribution. Interestingly, early Nutch development was supported in part by Overture’s R&D division, and an Overture official sits on the Nutch board.

“Search is interesting again,” says Doug Cutting, a founder and core project manager at Nutch. Cutting, whose development chops were honed at Xerox (XRX) PARC, Excite and Apple (AAPL), is building Nutch (that’s his toddler’s all-purpose word for “meal”) with a small team of engineers based around the country. But Cutting says they hope that once Nutch is loosed on the world, tinkerers from Romania to China to Palo Alto will help build it into a robust platform, in the spirit of Linux or Apache (which has garnered more than 60 percent of the Web-server software market in just the last couple of years).

“Search is the first thing people use on the Web now, and there are fewer and fewer alternatives,” Cutting says. With Nutch, “researchers, university folks, and anyone else can have a test bed to make search better. There are a lot of smart people out there that Google can’t hire.”

Mitch Kapor, who helped found Lotus Development and the Electronic Frontier Foundation and is founder and president of the Open Source Applications Foundation, certainly agrees. He’s thrown his weight behind the project by joining Nutch’s nonprofit board, as has Tim O’Reilly, the CEO of O’Reilly & Associates. Brewster Kahle, the visionary behind the Internet Archive, has also lent his support. Nutch is moving its servers to Kahle’s high-bandwidth location this weekend, a crucial step toward readying the engine for its public debut.

“I love Google,” Kapor says, “but this will push search to places that are not immediately obvious. In terms of research and innovation, there is a clear need for an open platform for search.” Kapor and others imagine new kinds of applications springing from Nutch, ideas that commercially driven companies like Yahoo or Microsoft would never fund. “Search is close to a duopoly,” Kapor points out. “Historically we know there are risks when that happens. It’s too important an application to not be transparent.”

Cutting won’t commit to a specific launch date for the engine, but he says he expects it to go live sometime early this fall. Because of the move to Kahle’s facility and insufficient hardware (Cutting is looking for additional sponsors), Nutch’s demo, based on an initial crawl of more than 100 million webpages, is not yet open to the public. But Cutting, who together with his development partners has built an impressive resume in the search field, is confident his latest creation will be a contender once it launches. “It’s fun to go toe-to-toe with market leaders,” he says. “It’s always a challenge to build a better mousetrap.”

John Battelle is a visiting professor at the UC Berkeley Graduate School of Journalism, where he directs the business reporting program. He was the founder of the Industry Standard and a co-founding editor of Wired. A version of this article originally appeared in Business 2.0; reprinted with permission.
