Robert Scoble Wants What We Had -- Better Query Refinement. So Do I!

Robert Scoble's just finished reading John Battelle's book The Search and ponders creating a new chapter to follow John's ending one on "The Perfect Search." In short, Robert wants better query refinement. Well, we've had it in the past and maybe will get it again in the future. Below, I'll walk you through exactly how what Robert wants came and, sadly, went away. Plus my own thoughts on The Perfect Search.

Robert writes:

Now, go to "hotel" and you'll see what I call an intermediary at the top ( You'll also see Hilton, Marriott, Best Western, among others.

But that's not what you wanted. Remember, you were going to New York. So, you realize your search isn't specific enough. So, you enter "new york hotel."

Now we're getting somewhere. Lots of hotels. But, the first one is a hotel in Las Vegas. That's what we call "noise." Google can't decide between hotels IN New York or hotels NAMED New York.

Ahh, now you are understanding the problem. Today's search engines don't understand the CONTEXT of your search.

Yep, that's the classic problem. Nor is it a new one that today's search engines are grappling with. They've understood it for over a decade.

This is what I quoted WebCrawler creator Brian Pinkerton saying when I wrote my How Search Engines Rank Web Pages page originally back in 1996:

As WebCrawler founder Brian Pinkerton puts it, "Imagine walking up to a librarian and saying, 'travel.' They're going to look at you with a blank face."

OK -- a librarian's not really going to stare at you with a vacant expression. Instead, they're going to ask you questions to better understand what you are looking for.

Unfortunately, search engines don't have the ability to ask a few questions to focus your search, as a librarian can. They also can't rely on judgment and past experience to rank web pages, in the way humans can.

Many search writers have quoted Brian on this, because he's explained it often and well. A search engine, unlike a librarian, can't interrogate you. It can't ask further questions to help you narrow in on what you are looking for.

That's why the regular trend of someone trotting out a super-magical "natural language" search engine is always laughable. The pitch generally goes something like, "This search engine is smart enough to know when you typed in a sentence about AIDS that you meant AIDS the disease rather than aids as in something that helps you."

That would be great if people entered sentences with a variety of terms helpful for analysis. They don't. They typically enter anywhere from one to three words. There's no context to analyze.

Instead, what you really want is someone to ask you further questions and give you options to explore. As Robert writes:

So, what COULD search engines do? Well, first, give me some choices at the top of the page. Why couldn't search engines ask you these questions:

1) "are you looking for hotels in New York or named New York?"
2) Are you looking for hotels with free Wifi?
3) Are you looking for hotels with great views?
4) Are you looking for hotels nearby major tourist destinations?
5) Are you looking for hotels with above average amenities like super large bathtubs, well stocked minibars, etc.?

Indeed, they could. Indeed, they have. That was the claim to fame for Ask Jeeves, when it launched in 1997. As I wrote then:

[Ask Jeeves] provides matching web pages, but results are usually prefaced by questions aimed at helping users find the information they want. For example, a search for "Bill Clinton" brings back a results pages topped by these questions:

+ Where can I find information about US President Bill Clinton
+ Who ran for U.S. President in 1996?

That's what Robert wants. And it worked. Because of this initial "question answering," Ask Jeeves had great relevancy, which helped it gain market share that it continues to hold on to today. But it wasn't natural language search technology that made it happen, as people often mistakenly assume about Ask Jeeves. It was a bunch of human editors who watched the queries that came in, then made questions to help refine your search further, then linked you to preselected pages that seemed to have the right answer.

My Ask Jeeves: Asking Questions To Give You Answers article from 1998 looks even further at this:

Enter Ask Jeeves. The service does an impressive job of getting people to what they want by asking questions.

For example, imagine you want information about cars. Enter "cars" into the Ask Jeeves search box, and the service comes back with questions like:

+ Where can I find product reviews for cars?
+ Which models of cars are most frequently stolen?
+ Where can I locate information on the history of automobiles?

In front of each question is a Go icon. Choosing it takes users to a web site that answers the question.

The secret to the accuracy of Ask Jeeves is human intervention. About 30 people work full time creating the knowledge base of questions, which currently numbers about 7 million. They come up with ideas on their own, especially for popular topics, and they also watch what people are actually searching for.
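To make the editorial approach concrete, here is a minimal sketch of how such a hand-built knowledge base might behave. It is not how Ask Jeeves was actually implemented; the names KNOWLEDGE_BASE, refine and unanswered are hypothetical, and the example.com URLs are placeholders. Only the three questions come from the example above.

```python
from collections import Counter

# Hypothetical editor-built knowledge base: query -> [(question, preselected page)].
# The questions are from the article; the URLs are invented placeholders.
KNOWLEDGE_BASE = {
    "cars": [
        ("Where can I find product reviews for cars?", "https://example.com/car-reviews"),
        ("Which models of cars are most frequently stolen?", "https://example.com/stolen-cars"),
        ("Where can I locate information on the history of automobiles?", "https://example.com/auto-history"),
    ],
}

# Queries with no editorial questions yet, counted so editors can see what to cover next.
unanswered = Counter()


def normalize(query):
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def refine(query):
    """Return the editor-written questions for a query, or an empty list if none exist."""
    key = normalize(query)
    questions = KNOWLEDGE_BASE.get(key, [])
    if not questions:
        unanswered[key] += 1
    return questions


if __name__ == "__main__":
    for question, url in refine("Cars"):
        print(f"[Go] {question} -> {url}")
    refine("new york hotel")
    print(unanswered.most_common(1))
```

Nothing here is clever: the relevancy lives in the table, not the code. That is the point, and it is also why the approach costs so much to keep current.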

So what happened? Why don't we STILL have the refinement Robert wants? Because humans are expensive, and analyzing links was cheaper. Google popularized that, and all the search engines went the crawler/algorithm/automation route. Ask dropped its editors.

So did MSN, by the way. MSN used to have editors that did this type of Ask Jeeves-style refinement. As I wrote back in 2000:

New "Popular Search Topics" links that now appear below the search box, after you perform a search. These are suggestions designed to help you easily narrow your request to a particular topic, if your original search was ambiguous. For example, in a search for "saturn," you'll see these options:

+ Saturn Corporation (auto manufacturer)
+ Saturn (planet)
+ Sega Saturn cheats (game hints)

Select a topic, and the search engine will rerun your request focused around that particular topic. However, the real beauty to these is that you're not simply giving the search engine new words to search for, such as "planet saturn," if you were to choose the planet-oriented topic. While those words will appear in the search box, behind the scenes they are mapped to other words that editors at MSN Search believe will bring up the best sites for that topic. Moreover, the editors may have preselected what they believe to be the best sites for that particular query.

That's just one example of the hard work going on at MSN Search to improve the quality of their results. A team of editors closely monitors search logs and provides human intervention where needed to improve the listings. Misspellings are a good example. Consider:

I always felt that editorial oversight was a key reason why MSN did, at one time, have very good relevancy. But in the quest to embrace links and crawling (and cost savings with that), it went away.
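The topic mapping in that 2000 write-up can be sketched roughly as follows. This is only an illustration under stated assumptions, not MSN Search's real system: the Topic class, the hidden terms, two of the three display queries and the example.org URLs are all invented. Only the three topic labels and the "planet saturn" display query come from the excerpt.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    label: str          # what the searcher sees under "Popular Search Topics"
    display_query: str  # words that appear in the search box after selection
    hidden_terms: list  # editor-chosen terms actually used behind the scenes (invented here)
    picks: list = field(default_factory=list)  # editor-preselected sites (placeholders)


# Hypothetical editorial table keyed by the ambiguous query.
TOPICS = {
    "saturn": [
        Topic("Saturn Corporation (auto manufacturer)", "saturn cars",
              ["saturn", "corporation", "automobile", "dealer"],
              ["https://example.org/saturn-cars"]),
        Topic("Saturn (planet)", "planet saturn",
              ["saturn", "planet", "rings", "solar system"],
              ["https://example.org/saturn-planet"]),
        Topic("Sega Saturn cheats (game hints)", "sega saturn cheats",
              ["sega", "saturn", "cheats", "codes"],
              ["https://example.org/sega-saturn"]),
    ],
}


def suggestions(query):
    """Return the topic labels to show below the search box for an ambiguous query."""
    return [topic.label for topic in TOPICS.get(query.strip().lower(), [])]


def rerun(query, label):
    """Return (display query, hidden terms, preselected sites) for the chosen topic."""
    for topic in TOPICS.get(query.strip().lower(), []):
        if topic.label == label:
            return topic.display_query, topic.hidden_terms, topic.picks
    return None


if __name__ == "__main__":
    print(suggestions("Saturn"))
    print(rerun("saturn", "Saturn (planet)"))
```

The split between display_query and hidden_terms is the part that matters: the searcher sees a simple rewritten query, while the editors' judgment rides along unseen.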

How about Yahoo? In my book, Yahoo's human-compiled directory structure helped make the service popular because the directory led to query refinement in a way the crawlers couldn't match. You can still get to that, as I explained in my recent Google Ranking Itself Tops For Britney Spears & The Need For Better Categorization post, but you've gotta work hard to get there:

The reality is every search on any search engine will have some irrelevant results. Ideally, what you'd want for a popular and broad query on Britney is to get a better classification of types of results you can see: official sites, fan sites, sites about her film career, Britney as a part of popular society and so on. Since everything has some relevancy, such groupings help ensure you get into a particular area related to Britney that you're interested in.

For example, consider if you searched on Yahoo Directory, where you could see all directory categories like this:

  • Rock and Pop > Britney Spears
  • Rock and Pop > Anti-Britney Spears
  • Britney Spears Concert Tickets
  • Britney Spears > Lyrics

See how the "topical relevancy" of all things Britney is divided into four major areas? How about the 208 topics that Clusty finds, which include:

    Sadly, the demise of human-powered directories on major search engines has all but killed such categorization from really being shown to searchers.

    I've seen various prototypes of query refinement tools from smaller players over the years, and query refinement at the major players isn't dead, of course. Ask Jeeves just improved its Zoom tool, for example. In addition, the continued growth of vertical search helps. It's easier to give you lots and lots of options of hotels to choose from when you're in a travel vertical search engine. That's because unlike in regular search, you're probably in a better frame of mind to make use of relevant drop-down boxes and checkboxes that would be ignored in web search.

    Overall, I share Robert's frustration times 100! One of the reasons I've found tagging a waste of time, as I ranted in my Another Poke At Tags As Search Savior piece, is that search engines have had tools to help us better refine our results and cluster pages into topical areas. They've just been ignored -- ignored to the point that we're making use of tools like tagging to make up for what we ought to get from the search engines directly.

    John's seen Robert's post and gives a few comments here. Tim Bray touches on Robert wanting the Semantic Web here. General comments from those reading Robert's post can be found on his blog here.

    FYI, John asked me for thoughts on The Perfect Search that didn't make the cut for the book. But if you're curious, here's what I emailed him back last September. I'll ask him if I can add in the email I was responding to, that puts my response in better context. If so, you'll see it added here later. My response:

    I can't imagine such a world. It makes a nice pitch for the search companies, but knowledge is a messy thing.

    If we're talking about indisputable facts, it's a bit easier. Thomas Jefferson was the third president of the United States. I know of no one who questions that.

    Who was the first person on the moon? Neil Armstrong -- unless you are of the contingent that believes moon landings never happened. OK, I think those folks are crackpots. But the perfect search that comes back with Armstrong isn't the perfect search for them.

    What's the answer to gay marriage? Who killed Kennedy? Was Bush right or wrong for going into Iraq? Is the MMR vaccine safe for children?

    None of these can be answered definitively. They're more than just questions with nuances. They're questions that have answers ultimately determined by the readers themselves, answers that may be different for each person, based on what they choose to believe after reviewing many opinions.

    I can envision a system that tries to collect for you a variety of references on topics. Maybe it even assembles them into an encyclopedia-like, wiki-like page. The assembly of this knowledge might be considered "answers" by some. To me, it still represents the start of a knowledge quest. It's akin to exactly how search works now -- a list of references, with the searcher still needing to explore.

    I'm sure we'll see search advance on simply pointing people to the easy stuff, the facts that can be produced, direct navigation to web sites and so on. I'm also sure we'll see search improve to better understand what we're interested in, based on past habit and visits. But all knowledge will never be accessible, unless they figure out a way to digitize the minds of everyone living and dead. Even when dealing with what knowledge we do have chronicled, distilling a perfect answer is impossible. God could provide a perfect search as you outline. Search engines aren't God today, and they'll never be.

    Having said this, I was aghast last year when some Wi-Fi exec likened Google to God in Friedman's column. While we may not have the perfect search, nor will we, some people may believe search engines (and the web by extension) already offer it.

    We've had articles about judges searching the web themselves to see if they can dig up evidence. Fox News lamely tries to defend calling the BBC anti-American by citing search counts. Students apparently are abandoning traditional research methods and assuming the magic little search box brings up the right answer. I've watched people spend tons of time searching for a company's phone number rather than just calling information. Two television shows I watched this week had characters talking about how they "Googled" something, with the assumption that whatever they retrieved must be correct. Some people already believe a perfect search tool exists, and the way it is shaping them is that they're relying on it too exclusively.

    So the threat is this. In a world where people believe a perfect search exists, that world may fail to seek out knowledge in other ways. Someone blogs something that's factually incorrect. Search picks this up. There are no other references out there. Search is perfect, ergo, what's wrong becomes right. No one bothers to actually follow up on the fact.

    I was fortunate enough in college to hear Loren Needles from Analytica talk about the need to fully question any facts. At the time, he talked about how a recent hurricane had been blamed for a dropoff in some economic indicators. In short order, he demonstrated how there was no way the hurricane could have caused a dropoff of such extent. Despite this, newspapers across the country accepted the explanation as fact.

    That's what a perfect search potentially does for us: it makes us less questioning because we think the answers are all in that little box. They aren't, nor will they ever be.