In Praise of Fuzzy Searching

According to the Oxford American Dictionary, "fuzzy" means blurred or indistinct. Now, the last thing we searchers want is blurred or indistinct results, yet paradoxically, a liberal dose of "fuzzy logic" can actually improve the precision of search engine results.

Using fuzzy logic, a search engine can relax the boundaries between word meanings to a certain degree. Just as a camera lens set to a smaller aperture brings a greater range of the scene into focus, fuzzy logic in a search engine will expand the "depth of field" of potential search results.

This is crucial when you're searching for things that fall within a range of possibilities. The absolute rules of traditional logic exclude anything outside a range, even items within 0.0001% of being included. Fuzzy logic makes these harsh constraints more forgiving, allowing you to find things you might otherwise miss.
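The difference between a hard boundary and a forgiving one can be sketched in a few lines of Python. This is only an illustration: the function names, the linear falloff, and the `tolerance` parameter are assumptions made for the example, not a description of how any particular search engine scores results.

```python
def crisp_match(value, low, high):
    """Traditional logic: a value is either inside the range or it is not."""
    return low <= value <= high


def fuzzy_match(value, low, high, tolerance):
    """Graded membership: 1.0 inside the range, falling off linearly
    to 0.0 at `tolerance` units beyond either boundary (an assumed shape)."""
    if low <= value <= high:
        return 1.0
    distance = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - distance / tolerance)


# A value just outside the range 0..100
print(crisp_match(101, 0, 100))               # False: excluded outright
print(fuzzy_match(101, 0, 100, tolerance=4))  # 0.75: still a strong candidate
```

The crisp test throws the near miss away, while the fuzzy test keeps it with a slightly lower score, which is roughly the forgiveness described above.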

How do you use fuzzy logic in a search engine? You don't -- you simply enter your query terms and let the engine handle the process for you.

This notion will rankle those who believe that "advanced" searching means weaving unambiguous Boolean operators into a query and snugly wrapping the whole thing up in a conjunction of parentheses. After all, Boolean operators allow us to take control of our search, insisting on the presence of some words and absolutely rejecting others from making an appearance in our search results.

This is true. Boolean logic is a power tool. In the hands of a skilled searcher, Boolean queries can often yield marvelous results. But power tools in the hands of those untrained in their ways can lead to disastrous results.

Start with the fact that those simple words AND, OR and NOT don't mean exactly the same thing to each search engine. Where Excite excludes words with the NOT operator (in upper case only), AltaVista requires AND NOT. Where Google allows an uppercase OR operator, FAST does not support OR at all.

Then there's the little problem of operator precedence. NOT is the "strongest" operator, always evaluated before AND, just as AND takes precedence over OR. You can override precedence by enclosing the part of the query you want evaluated first in parentheses. You can make part of a query even more powerful by "nesting" parentheses (including the (really important part) in parentheses within parentheses).
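Python's own `not`, `and` and `or` happen to follow the same order, so a short sketch can show how much parentheses change the outcome. The sample document and the three query terms are made up for the example, and a real search engine's query parser may behave differently.

```python
# A hypothetical document, reduced to its set of words
doc = "used car prices and reviews"
words = set(doc.split())

jaguar = "jaguar" in words  # False
cat = "cat" in words        # False
car = "car" in words        # True

# AND binds tighter than OR, so this reads as (jaguar AND cat) OR car
without_parens = jaguar and cat or car
# Parentheses force the OR to be evaluated first
with_parens = jaguar and (cat or car)

print(without_parens)  # True: the document matches
print(with_parens)     # False: the document is rejected

# NOT binds tightest of all, so this reads as (NOT jaguar) OR car
print(not jaguar or car)    # True
print(not (jaguar or car))  # False
```

The same three terms accept or reject the same document depending only on where the parentheses fall.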

But being successful with this logical conjuring requires a fair amount of practice.

Boolean operators are great if you're searching for something that's unambiguous and easily differentiated from other things. But given the ambiguities inherent in language, it's not often that we have this luxury. This is why search engines don't rely on Boolean logic alone to calculate relevance. In fact, they apply a whole array of techniques to try to determine the best results.

Some of the methods (or algorithms) search engines use to determine relevance include:

  • Vector space
  • Term relationship
  • Proximity
  • Probabilistic algorithms
  • User popularity
  • Link popularity
  • Link analysis
  • Domain information, such as ownership or server location

And, of course, fuzzy logic. When you enter a query, each search engine processes it with its own "secret sauce" made up of these and many other approaches.
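To give a feel for just one item on that list, here is a minimal sketch of vector space ranking in Python. It treats the query and each document as word-count vectors and scores them by the cosine of the angle between them. The function name and the sample documents are invented for the example; real engines weight terms in more elaborate ways and blend this with the other signals, so this is not any engine's actual recipe.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Score two texts by the cosine of the angle between their word-count vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = "fuzzy search"
docs = [
    "camera lens aperture guide",
    "boolean search operators explained",
    "fuzzy search tips",
]

# Highest-scoring documents first
ranked = sorted(docs, key=lambda d: cosine_similarity(query, d), reverse=True)
for d in ranked:
    print(round(cosine_similarity(query, d), 3), d)
```

Notice that no document is simply in or out. Each one gets a score, and the document sharing only one query word still ranks above the one sharing none.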

But here's the problem: when you use Boolean operators, you override the search engine's own secret sauce, in essence telling it that you're smarter than it is. To gain the sense of control that Boolean seems to provide, you're actually sacrificing quite a lot of the power built into the search engine.

Sometimes, that works. But quite often you're better off letting the search engine do its own thing. How do you know when?

Ah, but that's a fuzzy question... To paraphrase the famous slogan, just try it.
