Inside the Google Search Appliance
"What is a Google Search Appliance?", you ask.
It's a way for corporations and institutions to run their very own copy of the Google Search engine, even inside their firewall. While there are a bunch of search engines that can index and search a few hundred thousand documents, the Google high-end version can index millions of documents and respond to many simultaneous queries. Now that's power!
But Google isn't just selling software: the Search Appliance comes pre-installed on a special version of Linux on its own hardware. The low-end version is a 1U rack-mountable box, while the high end is a full server rack with multiple search engine servers, as well as power supplies, load balancing and failover features, like a mini Google server farm.
The Google Search Appliance provides many of the best features of the public search engine, including the robot spider, index structure, familiar interface, spell checker, PageRank weighting in relevance sorting of results, and cached copies of documents converted to HTML format.
The browser-based admin interface offers many options for specifying sites and subdirectories to crawl, passwords and file format recognition, along with extensive indexing reports, server-side XSLT formatting of search results and search log reporting. Overall, it's a very nice corporate search engine that should address many people's needs to locate information in their intranet.
It's a 1.0 product: what's missing? While Google has done a fine job with its initial release, it has some weaknesses. Unlike other search engines, it's a physical box, so it's noisy and takes up space. There is no programmatic interface to the search engine, and no direct access to file systems, content management systems or database repositories, so any access to that text must go through an inefficient HTTP interface.
Nor is there integration with corporate security and authentication systems. And unlike results from the public Google search engine, the appliance's search results are not significantly better than the competition's: Intranets aren't as richly linked as the whole Web with its billions of hypertext links, so the PageRank algorithm doesn't have as much to chew on.
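To make the point about linking concrete, here is a minimal sketch of a simplified PageRank calculation. It is a textbook-style power iteration, not Google's actual implementation, and the two tiny link graphs are made up for illustration. The function names (pagerank, spread) and the damping value are assumptions of this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration (illustrative only).

    links maps each page to the list of pages it links to.
    A page with no outgoing links spreads its score evenly over all pages.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's score along its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # No links: the score is shared out evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def spread(rank):
    """Ratio of highest to lowest score: how strongly links separate pages."""
    return max(rank.values()) / min(rank.values())


# Hypothetical toy graphs, not real data.
# "Web-like": many pages point at one popular page.
web_like = {
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub", "a"], "hub": ["a"],
}
# "Intranet-like": documents sit in folders and barely link to each other.
intranet_like = {
    "a": [], "b": [], "c": [], "d": ["a"], "hub": [],
}

web_spread = spread(pagerank(web_like))
intranet_spread = spread(pagerank(intranet_like))
print("web-like spread:", round(web_spread, 2))
print("intranet-like spread:", round(intranet_spread, 2))
```

In the sparsely linked graph the scores end up much closer together, so link analysis contributes little to the ordering of results and the engine has to lean on ordinary text relevance instead.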
The Google Search Appliance is new, and it has some rough edges, but if you have a bunch of documents on your Intranet, and can get to them via HTTP, it's fast and easy to install and does a good job.
Google Search Appliance
http://www.google.com/appliance/
Search Tools Analysis: Google Search Appliance, Version 1
http://searchtools.com/analysis/google-appliance-v1.html
A longer, more detailed review of the Google Search Appliance by SearchTools maven, Avi Rappoport.
Avi Rappoport, Principal Consultant for Search Tools Consulting, is the leading authority on site, Intranet and topical portal search engines.