How to Hack Your Own Search Engine

Got the itch to go head-to-head with Google, Yahoo and all of the other big search players on the web? A new book provides a detailed blueprint for using and customizing Lucene, open-source search engine software that’s freely available online.

Lucene in Action by Otis Gospodnetic and Erik Hatcher is a thorough introduction to the inner workings of what's arguably the most popular open-source search engine.

Lucene was created by Doug Cutting, formerly a senior architect at the long-lost Excite search engine. Cutting made the Lucene code freely available as a SourceForge project, and over time it has been put to use in all sorts of ways. Nutch, Cutting's own open-source web search project, which I've written about before, is built on Lucene.

The Apache Software Foundation, the open-source group behind the Apache web server, has adopted Lucene and maintains it as part of the Apache Jakarta Project. The Jakarta site is where you'll download the code if you want to play around with Lucene yourself.

Some familiar applications use Lucene. LookSmart's Furl is built on it (see SearchDay's earlier review of Furl). The search function on Bob Dylan's official web site is powered by Lucene, and even organizations such as the Finnish military use it for search.

Once you’ve downloaded the core Lucene engine, which is a single Java Archive file, you need to extend it for your own particular application. That’s what Lucene in Action is all about.

The book isn't for the faint of heart: it's loaded with code examples and emphasizes a hands-on approach to learning. Part one starts with the basics, with chapters on indexing, adding search to your own applications, analysis, and advanced search techniques.
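To give a feel for what those indexing, analysis and search chapters deal with, here is a minimal sketch of an inverted index, the data structure at the heart of engines like Lucene. It is written in Java, Lucene's own language, but it is a toy under stated assumptions: the class `ToyIndex`, its methods and its tiny stop-word list are hypothetical and are not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy inverted index illustrating analysis, indexing and search.
 * Hypothetical class: this is NOT Lucene's API.
 */
public class ToyIndex {
    // Hypothetical stop-word list; real analyzers use larger, configurable lists.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "in");

    // term -> sorted IDs of the documents containing that term (a "postings list")
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    /** Analysis: lowercase, split on anything that isn't a letter or digit, drop stop words. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Indexing: record the document ID under every term the analyzer produces. */
    public void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Searching: return IDs of documents containing ALL query terms (boolean AND). */
    public List<Integer> search(String query) {
        List<String> terms = analyze(query);
        if (terms.isEmpty()) {
            return List.of();
        }
        TreeSet<Integer> result = null;
        for (String term : terms) {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs); // intersect the postings lists
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(0, "Lucene is an open-source search engine library.");
        index.addDocument(1, "Nutch is a web search engine built on Lucene.");
        index.addDocument(2, "Furl saves and searches your bookmarks.");
        System.out.println(index.search("search engine")); // [0, 1]
        System.out.println(index.search("Lucene web"));    // [1]
        System.out.println(index.search("the"));           // []
    }
}
```

Note that this analyzer does no stemming, so a query for "search" won't match document 2's "searches". Real analyzers can add steps like that, and a real engine would also rank matches by relevance instead of returning a bare set of IDs.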

Part two focuses on applied examples of Lucene in action. Chapters in this section demonstrate how to get the search engine to handle different document formats and how to use the various tools and extensions that have been developed by the Lucene developer community.

There’s also an excellent chapter covering case studies, showing the wide range of applications possible with the software. And appendices provide detailed instructions for downloading and installing Lucene, including pointers on where to find additional resources.

Even if you don’t want to attempt to create your own search engine, Lucene in Action is an excellent, thorough introduction to the inner workings of search engines. And if you’re willing to slog through the highly technical examples, you’ll find a well-written guide to search engine mechanics that can help you become a better searcher.

Lucene in Action
by Otis Gospodnetic and Erik Hatcher
Manning, $44.95
ISBN: 1-932394-28-1

Search Headlines

NOTE: Article links often change. In case of a bad link, use the publication’s search facility, which most have, and search for the headline.

AOL unveils mobile services and deals
InfoWorld Mar 14 2005 10:42PM GMT
Enhancements to Local Search
Yahoo Web Developers Blog Mar 14 2005 10:36PM GMT
Google Guys Number 55 on Forbes Rich List
San Francisco Chronicle Mar 14 2005 10:25PM GMT
State of The Blogosphere, March 2005, Part 1: Growth of Blogs
Sifry’s Alerts Mar 14 2005 10:05PM GMT
Click Fraud In the Spotlight
ClickZ Today Mar 14 2005 10:03PM GMT
The Power of the Written Word
High Rankings Mar 14 2005 10:02PM GMT
Search-Specific Agencies Fight for Survival
Mediaweek Mar 14 2005 9:59PM GMT
Can Papers End the Free Ride Online?
New York Times Mar 14 2005 9:56PM GMT
Travel firms take Google action
BBC Mar 14 2005 9:52PM GMT
Big Brands Need To Do SEO, Too
ClickZ Today Mar 14 2005 9:22PM GMT
Click Fraud: If Everyone Benefits, Who Will Stop It?
Mar 14 2005 9:18PM GMT
Yahoo Now Searches Stop Words in Phrases
Search Engine Showdown Mar 14 2005 9:15PM GMT
New Government Info Search Resource From Clusty
Search Engine Watch Mar 14 2005 9:14PM GMT
Microsoft Plans Service to Sell Ads Next to MSN Search Results, People Say
Bloomberg Mar 14 2005 8:57PM GMT
Virgin Mobile Trials Zi’s Qix Search Engine
Dow Jones via iWon Mar 14 2005 3:25PM GMT
