Seven Stupid Searching Mistakes, Part 1

In the lighthearted spirit of the popular books for "idiots" and "dummies," here's a look at seven common blunders that are virtually guaranteed to deliver useless, nonsensical, or completely worthless search results. These are taken from a presentation I've made at a number of conferences, including Search Engine Strategies.

Some of these gaffes might surprise you. But once you recognize them, it's easy to banish these little gremlins forever from your Web search tool kit.

* Sputtering on "Stop Words"

Some search engines simply ignore certain words. They are never used to find a matching document, despite what amounts to a direct command when you type them into a search form.

These are called "stop words" because the search engine doesn't "stop" when the words are found in the index (if they are even indexed at all). Why not? Because stop words are either too common to generate meaningful results, or are parts of speech like adverbs, conjunctions, prepositions, or forms of "be" that mean nothing unless they're part of a phrase with more "important" nouns and verbs.

If you use a stop word in a query you may get wildly irrelevant results. For example, the phrase "searching the web" contains two stop words: "the" and "web." Though "web" is not a particularly common word in everyday English, it is used so frequently on the Internet that it's virtually worthless as a finding aid.

Stripping out the stop words, "searching the web" becomes "searching," which will naturally lead to results describing everything from criminal manhunts to quests for enlightenment—and if you're lucky, maybe even something about searching the web.

How can you identify stop words? Google tells you when it's ignoring a stop word, at the very top of a results page. You can force Google to include a stop word in a query by putting a plus sign in front of it. AlltheWeb takes a different approach -- it often automatically rewrites your query to include a stop word as part of a quoted phrase with other query terms. Check out the link below to the 300 most common words in English, many of which are stop words.
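
To make the effect concrete, here is a minimal Python sketch of stop-word stripping and the plus-sign override. The stop-word list and the function name are made up for illustration; real search engines keep their own lists and apply their own rules.

```python
# Hypothetical stop-word list, for illustration only.
# Real search engines maintain their own (usually much longer) lists.
STOP_WORDS = {"the", "web", "a", "an", "and", "of", "to", "in", "is"}


def strip_stop_words(query: str) -> list[str]:
    """Return the terms an engine might actually search for.

    A leading '+' marks a term as required, so it is kept even when it
    is a stop word (an assumption modeled on the plus-sign convention).
    """
    kept = []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            kept.append(term[1:])      # forced term: keep it, drop the '+'
        elif term not in STOP_WORDS:
            kept.append(term)          # ordinary term that is not a stop word
    return kept


print(strip_stop_words("searching the web"))   # ['searching']
print(strip_stop_words("searching the +web"))  # ['searching', 'web']
```

The first query collapses to a single vague term, while the second keeps "web" because the plus sign asks for it explicitly.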

* Bungling with Boolean

Boolean operators, like "and," "or," and "not," can help narrow search results—when used properly. The problem is that Boolean operators, because of their apparent simplicity, appear to be easy to use. Maybe, and/or not really.

According to Ran Hock, author of The Extreme Searcher's Guide to Web Search Engines, search engines implement Boolean features in different ways. For example, while some accept a simple "not," others require "and not" for the same effect. Additionally, some engines require that Boolean operators be capitalized, while others do not (or and do not?).
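
The capitalization issue is easy to see in a toy model. The Python sketch below evaluates a flat, left-to-right query over a made-up four-document collection; the documents, the function names, and the require_caps switch are all assumptions for illustration, not a description of how any particular engine works.

```python
# Toy collection (document id -> text), invented for this example.
DOCS = {
    1: "james bond film reviews",
    2: "corporate bond yields and interest rates",
    3: "chemical bond energy tables",
    4: "interest rates forecast",
}
OPERATORS = ("AND", "OR", "NOT")


def matching(term: str) -> set[int]:
    """Ids of documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


def search(query: str, require_caps: bool = True) -> set[int]:
    """Evaluate a flat query such as 'bond AND rates', left to right.

    With require_caps=True only upper-case AND/OR/NOT count as operators;
    lower-case 'and', 'or', 'not' are treated as ordinary search terms.
    Terms with no operator between them are joined with an implicit AND.
    """
    result = None
    op = "AND"
    for tok in query.split():
        is_operator = tok in OPERATORS if require_caps else tok.upper() in OPERATORS
        if is_operator:
            op = tok.upper()
            continue
        hits = matching(tok.lower())
        if result is None:
            # First term; a leading NOT means "everything except these".
            result = set(DOCS) - hits if op == "NOT" else hits
        elif op == "AND":
            result &= hits
        elif op == "OR":
            result |= hits
        else:  # NOT
            result -= hits
        op = "AND"  # fall back to implicit AND until the next operator
    return result or set()


print(search("bond AND rates"))                    # {2}
print(search("bond NOT james"))                    # {2, 3}
print(search("bond or rates"))                     # set()
print(search("bond or rates", require_caps=False)) # {1, 2, 3, 4}
```

In the third query the lower-case "or" is treated as a word to look for, no document contains it, and the search comes back empty. The same query succeeds once the toy engine accepts lower-case operators.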

If you really want to use Boolean operators, learn how to use them. Two outstanding tutorials on using these features are listed below.

* Being Ever So Vulgar

Vulgar comes from the Latin vulgus, meaning common. Like some educated sophisticates, search engines have a problem with common words. It's not that they're being snotty or pretentious. It's that some words are so common that they appear in literally millions of documents, making them virtually useless as a finding aid.

Take weather, for example. There are thousands of sites providing weather information, from local forecasts to elaborate treatises on meteorology. Tighten your query by using focusing words to narrow the scope of your search. Rather than merely searching for "weather," construct a query like "Cicely Alaska annual snowfall," or something equally specific.
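
A short Python sketch shows how focusing words shrink the result set. The six "pages" and the count_matches function are invented for illustration, and the matching rule (every query word must appear on the page) is a simplifying assumption.

```python
# Six invented "pages", standing in for a search engine's index.
PAGES = [
    "weather forecast for seattle this weekend",
    "weather radar maps and storm tracking",
    "introduction to meteorology and weather systems",
    "cicely alaska annual snowfall and weather records",
    "alaska weather service bulletins",
    "alaska travel guide",
]


def count_matches(query: str) -> int:
    """Count pages that contain every word of the query."""
    terms = query.lower().split()
    return sum(all(t in page.split() for t in terms) for page in PAGES)


for q in ("weather", "alaska weather", "cicely alaska annual snowfall"):
    print(q, "->", count_matches(q))
# weather -> 5
# alaska weather -> 2
# cicely alaska annual snowfall -> 1
```

Each added word rules out pages that do not mention it, so the specific query lands on a single page while the bare word "weather" matches nearly everything.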

* Looking for a Rose, By Any Other Name

Be careful when a word has multiple meanings. Think of the word "bond" as an example. If you just use the single word "bond" as a query, the search engine has to figure out if you're looking for information about financial bonds, chemical bonds, or even James Bond.

Make it easier for the engine to help you. Ask yourself which meaning you intend before the search engine has to guess, and phrase your query accordingly.

Search engines are also easily confused by heteronyms, words that are spelled identically but have different meanings when pronounced differently. For example, "lead," pronounced LEED, means to guide. Pronounced as LED, though, the word refers to the metal element. When you can, use concrete synonyms instead of heteronyms.

Continued tomorrow: Committing capital offenses; close, but no cigar; and the number-one most common searching mistake -- looking for hits in all the wrong places.

More on Stop Words:

300 Most Common Words in English
Many of the 300 Most Common Words in English shown in this list are treated as stop words.

More on Boolean:

Search Engine Math
Most search engines implement a simple form of Boolean logic that's relatively easy to master. This tutorial by Search Engine Watch's Danny Sullivan shows how to use this search engine math.

Boolean Searching on the Internet
An outstanding, detailed, and comprehensive overview of Boolean searching on the internet, from the University at Albany Libraries.

More on Heteronyms:

The Heteronym Home Page
Using synonyms in search queries can be an effective way of narrowing the focus of your search -- unless they're heteronyms. For more examples, see this page.
