Closer, Closer, Closer

A couple of weeks ago, we sang the praises of the seemingly simple but actually quite powerful Boolean AND operator. When AND is used in a query, only documents including all of the words in your query are considered as possibly relevant matches. This excludes millions of potentially irrelevant documents, which is great, but it's possible to be even more precise in your query.

AND simply requires that your words appear in the document. It doesn't matter whether the words are near one another or not -- as long as they all appear, the document passes the AND test, even if one word is in the first sentence and the others are in completely unrelated sentences elsewhere on the page.
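To make the AND test concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular search engine works; the function names and the sample document are our own inventions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_and(query, document):
    """Return True if every query word appears somewhere in the document."""
    doc_words = set(tokenize(document))
    return all(word in doc_words for word in tokenize(query))

# Hypothetical document: the two query words sit in unrelated sentences.
doc = "Our stock of tulips is low. Separately, the company picnic is on Friday."
result = matches_and("stock company", doc)
```

The sample document passes the test even though it has nothing to do with company stock, which is exactly the weakness described above.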

Most search engines do consider the proximity, or nearness of words to one another when calculating relevance, and give higher weighting to documents where your query words are located near each other. But if you want to take things into your own hands and really make the search engine pay attention to the proximity of your query words, you have two fairly powerful options.

The first is useful when your query is a natural language phrase, where the words must appear in the same order in a document as they do in your query. The standard way to signal that you want the search engine to perform a phrase match is to put your query in double quotes.

Many search engines automatically try to detect phrases even if you don't put your query terms in double quotes. To do this, they rely on dictionaries of common phrases. While this works well in many cases, the virtually infinite combinations of words that make up phrases make it impossible to create a comprehensive phrase dictionary. So most search engines settle for a few hundred thousand or so of the most common phrases to include in their dictionaries.

Even if you think your query is a common phrase, put it between double quotes. Otherwise, search engines that perform an automatic AND and don't recognize your query as a phrase may simply look for both words with little regard to proximity.

It's particularly important to use double quotes when you're searching for a person's name. Otherwise, you may get results where first and last names are reversed (Mark Anthony or Anthony Mark), or where punctuation gets in the way of your terms -- for example, when one word ends a sentence and another begins it ("...acting was on the mark. Anthony Hopkins, star of the...").
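A phrase match can be sketched the same way. The Python below is a simplified illustration under our own assumptions: it treats a phrase as matched only when the words appear consecutively, in order, inside a single sentence, and it splits sentences naively on periods, question marks and exclamation points (so abbreviations would trip it up). Real search engines are free to handle punctuation differently.

```python
import re

def split_sentences(text):
    """Naively split text into sentences on ., ! and ?."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_phrase(phrase, document):
    """Return True if the phrase's words occur consecutively, in order,
    within one sentence of the document."""
    target = words(phrase)
    if not target:
        return False
    n = len(target)
    for sentence in split_sentences(document):
        tokens = words(sentence)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                return True
    return False

# Hypothetical documents for illustration.
across_sentences = "The acting was on the mark. Anthony Hopkins starred in the film."
reversed_name = "Anthony Mark spoke first at the meeting."
exact_name = "Last night Mark Anthony gave a speech."
```

With these samples, only the last document matches the phrase "Mark Anthony". The first fails because a sentence boundary separates the two words, and the second fails because the names are reversed.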

The second option applies when you're not necessarily looking for a phrase, but the nearness of words is still important. Proximity in this case serves two purposes: it ensures that the words you want to appear near each other actually are close together in result documents and, perhaps equally important, it excludes documents where your query words appear only in unrelated contexts.

For example, the word stock near the word company would likely bring back information about equities. In most cases, you'd like to exclude documents that use stock in other senses, such as livestock, soup stock, flowers and so on. What you really want are documents with the word stock near the word company, say within a dozen words of one another.

The Boolean command that influences proximity, conveniently enough, is the NEAR operator. Think of NEAR as a stronger form of AND and a weaker form of NOT or AND NOT neatly combined into a single operator.

Unfortunately, the NEAR operator, as useful as it is, seems to be disappearing from the major search engines. While AOL Search, AltaVista, MSN Search and Lycos previously supported NEAR, AltaVista appears to be the only service that still does.

To use the NEAR operator with AltaVista, you must use its advanced search form. NEAR finds documents containing both specified words or phrases within 10 words of each other.
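For the curious, a NEAR test for two single words might look like the Python sketch below. This is not AltaVista's implementation, which we have not seen. The function name is hypothetical, and we assume "within 10 words" means the two word positions differ by no more than 10. Phrases as operands are left out to keep the example short.

```python
import re

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_near(term_a, term_b, document, max_distance=10):
    """Return True if some occurrence of term_a lies within max_distance
    word positions of some occurrence of term_b (assumed definition)."""
    tokens = words(document)
    a, b = term_a.lower(), term_b.lower()
    positions_a = [i for i, w in enumerate(tokens) if w == a]
    positions_b = [i for i, w in enumerate(tokens) if w == b]
    return any(abs(i - j) <= max_distance
               for i in positions_a
               for j in positions_b)

# Hypothetical documents for illustration.
close = "The company announced that its stock rose sharply."
far = "stock " + "filler " * 20 + "company"
other_sense = "The company raises livestock on a small farm."
```

Only the first sample satisfies stock NEAR company. The second has both words but too far apart, and the third never contains stock as a separate word, since whole tokens are compared and livestock does not count.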

Skilled searchers pack lots of simple techniques like these into their arsenal of tools for finding information. Both phrase searching and the NEAR operator offer simple but effective ways to take more control over your search. They only work in rather restricted circumstances. But they can often mean the difference between finding what you're looking for and wishing your results were closer, closer, closer.

Google Adds Usenet Posting Capability

As we mentioned in our earlier story about Google's new Usenet features, Google has introduced the ability to post and reply to messages. Google Groups is the world's largest Usenet archive, and contains more than 650 million searchable Usenet messages.

At the bottom of all Usenet messages (1 month old or less), Google Groups users will now see the following text: "Post a follow-up to this message." By clicking this link, users can reply to a message by first logging in, and then entering the text of their reply on the corresponding web page. The initial registration process is simple, and requires that users register a valid email address and password to protect their identity.

Google Groups users can also initiate new conversations. To start a new thread, click the link that reads: "Post new message to"

Users also have the option of preventing their messages from being archived by Google Groups using the "X-No-archive: yes" feature. Additionally, Google offers a number of services to better inform users about how to post, posting etiquette, and answers to frequently asked questions.
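For readers who compose posts with their own software, the no-archive setting is conventionally written as a header line in the message. The Python sketch below builds such a message with the standard library's email package. The helper name, newsgroup and address are made up for illustration, and the sketch only constructs the message text; it does not post anything. How a given archive treats the header is up to that archive.

```python
from email.message import EmailMessage

def build_post(newsgroup, subject, body, sender, archive=True):
    """Build a Usenet-style message; add X-No-Archive when archive is False."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = subject
    if not archive:
        # Conventional request that archives not store this message.
        msg["X-No-Archive"] = "yes"
    msg.set_content(body)
    return msg

# Hypothetical values for illustration only.
post = build_post("misc.test", "Hello", "Just testing.",
                  "user@example.com", archive=False)
text = post.as_string()
```

Header names are not case sensitive, so "X-No-archive" and "X-No-Archive" refer to the same header.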

Google Groups/Usenet Archives

Style Guide for Posting to Usenet

FAQ (including instructions on removing posts)

Fewer than 60 percent of the households in the United States have Internet access, but that hasn't stopped the CEO of Forrester Research from predicting the death of the World Wide Web and the dawn of a new, application-based Internet.

Some successful Web sites are nothing more than databases of links to other sites. Becoming a comprehensive resource for a particular subject can generate lots of traffic and maybe some advertising revenue.

Sooner or later, every e-business reaches the point at which some market research would prove really useful in helping to make those tough decisions.

