Got the itch to go head-to-head with Google, Yahoo and all of the other big search players on the web? A new book provides a detailed blueprint for using and customizing Lucene, open-source search engine software that’s freely available online.
Lucene in Action by Otis Gospodnetic and Erik Hatcher is a thorough introduction to the inner workings of what’s arguably the most popular open source search engine.
Lucene was created by Doug Cutting, who was a senior architect of the long-lost Excite search engine. Cutting made the Lucene code freely available as a SourceForge project, and over time it has been implemented in all sorts of ways. Cutting’s own Nutch project, which I’ve written about, is based on Lucene.
Apache, the open source web server group, has adopted Lucene and now maintains it as part of the Apache Jakarta Project (the project’s site is where you’ll download the code if you want to actually play around with Lucene yourself).
Some familiar applications use Lucene. LookSmart’s Furl uses Lucene (see the SearchDay review of Furl here). Bob Dylan’s official web site search function is powered by Lucene. And even groups like the Finnish Military use Lucene for search.
Once you’ve downloaded the core Lucene engine, which is a single Java Archive file, you need to extend it for your own particular application. That’s what Lucene in Action is all about.
The book isn’t for the faint of heart: it’s loaded with code examples and emphasizes a hands-on approach to learning. It starts with the basics, and includes chapters on indexing, adding search to your own applications, analysis, and advanced search techniques.
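To give a feel for what indexing, analysis and search mean in practice, here is a minimal sketch of a toy inverted index in plain Java. It is not taken from the book and does not use Lucene's API; the class name ToyIndex and its methods are invented for illustration, and a real engine such as Lucene does far more (scoring, on-disk index files, richer query syntax).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index: an illustration of the idea only, NOT Lucene's API.
// All names here (ToyIndex, analyze, add, search) are hypothetical.
public class ToyIndex {
    // term -> sorted set of ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    // Analysis: lowercase the text and split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Indexing: store the document and record its id under every term it contains.
    public int add(String text) {
        int id = documents.size();
        documents.add(text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
        return id;
    }

    // Search: return the ids of documents containing every query term (AND semantics).
    public List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : analyze(query)) {
            TreeSet<Integer> hits = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);   // first term: start from its postings
            } else {
                result.retainAll(hits);         // later terms: intersect
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("Lucene in Action");              // id 0
        index.add("Search engines in action");      // id 1
        index.add("Open source search software");   // id 2
        System.out.println(index.search("action"));         // prints [0, 1]
        System.out.println(index.search("search action"));  // prints [1]
    }
}
```

The point of the sketch is the shape of the work: text is broken into terms once at indexing time, so a query only has to look up and intersect short lists of document ids rather than scan every document.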
Part two focuses on applied examples of Lucene in action. Chapters in this section demonstrate how to get the search engine to handle different document formats and how to use the various tools and extensions that have been developed by the Lucene developer community.
There’s also an excellent chapter covering case studies, showing the wide range of applications possible with the software. And appendices provide detailed instructions for downloading and installing Lucene, including pointers on where to find additional resources.
Even if you don’t want to attempt to create your own search engine, Lucene in Action is an excellent, thorough introduction to the inner workings of search engines. And if you’re willing to slog through the highly technical examples, you’ll find a well-written guide to search engine mechanics that can help you become a better searcher.
Lucene in Action
by Otis Gospodnetic and Erik Hatcher