Google Goes for Stop Words

Most search engines ignore a number of "stop words" -- common words that typically modify other words but carry no inherent meaning themselves, such as adverbs, conjunctions, prepositions, or forms of "be." You can usually force a search engine to include a stop word in your query by using the inclusion operator (the plus sign), but you must put the operator in front of each stop word in your query.

Google now automatically includes stop words in quoted phrases. This means that the following two queries produce dramatically different results:

to be or not to be
"to be or not to be"

These examples are courtesy of search guru Greg Notess, who uses them to illustrate how stop words can be taken to an extreme. In the first example, most search engines treat the words "to" and "be" as stop words. The words "or" and "not" can be, but aren't always, treated as Boolean operators. So it's possible that the entire query might be ignored! Alas, poor Hamlet.

In Google's case, "not" is neither a stop word nor a Boolean operator, so it's the only word processed in the first example query. With so little to work with, Google's results have nothing to do with Shakespeare (except, interestingly, for the "category" link, which does point to the works of the Bard).

The second query, on the other hand, is treated as a phrase, and all words, including the stop words, are considered. Unfortunately, the results still aren't up to Google's usual standard, with only one result in the top thirty being relevant.
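The difference between the two queries can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Google's implementation: the stop-word list is hypothetical (chosen so that "to", "be" and "or" are stop words and "not" is not, matching the behavior described above), `process_query` is a made-up name, a leading plus sign acts as the inclusion operator, and every word inside quotes is kept. For simplicity it does not treat "or" or "not" as Boolean operators, and it ignores the exception for "the" that Greg Notess describes.

```python
import re

# Hypothetical stop-word list for illustration only; this is NOT Google's actual list.
# As described for Google, "not" is neither a stop word nor a Boolean operator here.
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "of", "or", "the", "to"}

def process_query(query):
    """Return (terms, phrases): loose terms left after stop-word removal, plus quoted phrases."""
    terms, phrases = [], []
    # Each match is either a quoted phrase (group 1) or a single bare word (group 2).
    for m in re.finditer(r'"([^"]*)"|(\S+)', query):
        if m.group(1) is not None:
            # Quoted phrase: every word is kept, stop words included.
            phrases.append(m.group(1).lower().split())
            continue
        word = m.group(2).lower()
        if word.startswith("+") and len(word) > 1:
            # Inclusion operator: "+to" forces the stop word "to" into the query.
            terms.append(word[1:])
        elif word not in STOP_WORDS:
            terms.append(word)
    return terms, phrases

print(process_query("to be or not to be"))      # (['not'], [])
print(process_query('"to be or not to be"'))    # ([], [['to', 'be', 'or', 'not', 'to', 'be']])
print(process_query("+to +be or not +to +be"))  # (['to', 'be', 'not', 'to', 'be'], [])
```

The unquoted query collapses to the single term "not", while the quoted version survives intact as one phrase. The last call shows the older workaround: each stop word needs its own plus sign, and the unmarked "or" is still dropped.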

This probably isn't the fairest example to use, since most other search engines don't do a very good job with this particular query either. But it illustrates an important point: the more stop words in your query, the less likely it is that your results will include what you're looking for. That's not necessarily the engine's fault -- it's the nature of language itself.

Although including stop words in phrases can be helpful, some experienced searchers regret the new feature. Tara Calishain notes in a ResearchBuzz article that Google's decision to include stop words in phrases kills her trick of using a stop word within a phrase as a wildcard. Because stop words used to be ignored, they could serve as placeholders for any word(s) in a phrase.

Greg Notess pointed out in a post to the Search Engine Showdown Discussion List that the stop word trick is still possible, since Google continues to treat "the" as a non-searchable stop word. Just use "the" in place of the unknown word in a phrase.
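The placeholder trick amounts to phrase matching in which one word of the phrase is allowed to match anything. A minimal Python sketch of that behavior follows; `phrase_matches` is a hypothetical helper, not Google's implementation, and it assumes the placeholder stands for exactly one word, whereas the original trick could stand in for "any word(s)."

```python
import re

def phrase_matches(phrase, text, placeholder="the"):
    """Return True if `phrase` occurs in `text` as consecutive words,
    treating each occurrence of `placeholder` in the phrase as a one-word wildcard."""
    # Lowercase the text and split it into words, discarding punctuation.
    words = re.findall(r"[\w']+", text.lower())
    pattern = phrase.lower().split()
    n = len(pattern)
    # Slide a window of n words across the text and compare position by position.
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        if all(p == placeholder or p == w for p, w in zip(pattern, window)):
            return True
    return False

print(phrase_matches("to be or the to be", "To be, or not to be: that is the question"))  # True
print(phrase_matches("to be or the to be", "To be or not to see"))                         # False
```

Because the sketch drops "the" rather than requiring it, a genuine "the" in the phrase also acts as a wildcard. That is the side effect the trick exploits, and the one Google's planned change would remove.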

But Greg's "the" trick will work only for a short time. According to Google spokesperson Nate Tyler, "Google has implemented the first phase of support for stop words and will continue that implementation to include 'the' in the near future."

The bottom line: Including stop words in quoted queries will likely return good results if you're looking for relatively well-known quotations or titles. Otherwise, you're probably better off constructing your queries as you always have and letting Google, or any other search engine, continue to ignore the stop words it has determined don't help, and may even hinder, finding relevant results.

Google FAQ: Automatic Exclusion of Common Words

The Search Engine Showdown Discussion List

The Search Engine Showdown email discussion list is an unmoderated email list for discussing Internet search engines from the user's perspective.

Google Hoses One of My Favorite Search Tricks

One of Tara Calishain's favorite Google tricks was using a stop word as a wildcard in a phrase.

Google Wants Your Opinion

Google has introduced a new feature that allows you to provide feedback on the results of any search. Going beyond the voting buttons now available on the Google toolbar, Google places the following line at the bottom of every search result page: "Unsatisfied with your results? Help us improve."

Clicking the "Help us improve" link brings up a new browser window that displays your query words along with a short feedback form. You can use tick boxes to indicate why the results were unhelpful, with selections including "off topic," "offensive," "described poorly," and "too similar to one another."

If you were looking for a particular page that you felt should have been returned in the results, you can also enter its URL. A text box also lets you describe, in your own words, what you were hoping to find or what question you wanted answered.

The next time you're unsatisfied with your Google results, click the feedback link at the bottom of the result page and let the folks at the Googleplex know why.

Search Headlines

NOTE: Article links often change. In case of a bad link, use the publication's search facility, which most have, and search for the headline.

What fate for Excite@Home subscribers?...
Business 2.0 Dec 3 2001 9:00AM GMT
At M.I.T., a Fake Web Site Pokes Fun at the Media Lab...
New York Times Dec 3 2001 8:12AM GMT
Microsoft Reorganizes Web Divisions in Battle Against AOL...
Dec 1 2001 10:23AM GMT
DomainPeople Launches Customized Domain Registration System...
Web Host Industry Review Dec 1 2001 10:12AM GMT
Playboy Claims Domain Registered By The Anti-Porn Flynt...
Dec 1 2001 7:28AM GMT
Terra Lycos Deploys Newsedge...
Content-Wire Dec 1 2001 7:10AM GMT
AT&T-Excite fight may leave 4.1M without Net...
USA Today Nov 30 2001 4:12PM GMT
Rivals circling over faltering Excite@Home...
Nov 30 2001 12:13PM GMT
Kevin Spacey wants his (domain) name back...
Nov 30 2001 7:16AM GMT
Study: Sites Trending Toward Aggressive Web Advertising...
Internet News Nov 30 2001 12:42AM GMT
KaZaA ordered to cease infringing copyright...
The Register Nov 29 2001 7:19PM GMT
ACLU Assails Internet Anti-Smut Law...
Washington Post Nov 29 2001 2:28PM GMT

About the author

Chris Sherman is a frequent contributor to several information industry journals. He's written several books, including The McGraw-Hill CD ROM Handbook and The Invisible Web: Uncovering Information Sources Search Engines Can't See, co-authored with Gary Price. Chris has written about search and search engines since 1994, when he developed online searching tutorials for several clients. From 1998 to 2001, he was's Web Search Guide.