Robert Scoble Wants What We Had — Better Query Refinement. So Do I!

Robert Scoble’s just finished reading John Battelle’s book The Search and
ponders creating a new chapter to follow John’s ending one on "The Perfect
Search." In short, Robert wants better query refinement. Well, we’ve had it in the
past and maybe will get it again in the future. Below, I’ll walk you through how
exactly what Robert wants came and, sadly, went away. Plus my own thoughts on
The Perfect Search.

Robert writes:

Now, go to "hotel" and you’ll see what I call an intermediary at the top (hotels.com).
You’ll also see Hilton, Marriott, Best Western, among others.

But that’s not what you wanted. Remember, you were going to New York. So,
you realize your search isn’t specific enough. So, you enter "new york hotel."

Now we’re getting somewhere. Lots of hotels. But, the first one is a hotel
in Las Vegas. That’s what we call "noise." Google can’t decide between hotels
IN New York or hotels NAMED New York.

Ahh, now you are understanding the problem. Today’s search engines don’t
understand the CONTEXT of your search.

Yep, that’s the classic problem. Nor is it a new one that today’s search
engines are grappling with. They’ve understood it for over a decade.

This is what I quoted WebCrawler creator Brian Pinkerton saying when I wrote
my How Search Engines Rank Web Pages page originally back in 1996:

As WebCrawler founder Brian Pinkerton puts it, "Imagine walking up to a
librarian and saying, ‘travel.’ They’re going to look at you with a blank
face."

OK — a librarian’s not really going to stare at you with a vacant
expression. Instead, they’re going to ask you questions to better understand
what you are looking for.

Unfortunately, search engines don’t have the ability to ask a few questions
to focus your search, as a librarian can. They also can’t rely on judgment and
past experience to rank web pages, in the way humans can.

Many search writers have quoted Brian on this, because he’s explained it
often and well. A search engine, unlike a librarian, can’t interrogate you. It
can’t ask further questions to help you narrow in on what you are looking for.

That’s why the regular trend of someone trotting out a super-magical "natural
language" search engine is always laughable. The pitch generally goes something
like, "This search engine is smart enough to know when you typed in a sentence
about AIDS that you meant AIDS the disease rather than aids as in something that
helps you."

That would be great if people entered sentences with a variety of terms
helpful for analysis. They don’t. They enter anywhere from one to three
words, typically. There’s no context to analyze.

Instead, what you really want is someone to ask you further questions and
give you options to explore. As Robert writes:

So, what COULD search engines do? Well, first, give me some choices at the
top of the page. Why couldn’t search engines ask you these questions:

1) "are you looking for hotels in New York or named New York?"
2) Are you looking for hotels with free Wifi?
3) Are you looking for hotels with great views?
4) Are you looking for hotels nearby major tourist destinations?
5) Are you looking for hotels with above average amenities like super large
bathtubs, well stocked minibars, etc.?

Indeed, they could. Indeed, they have. That was the claim to fame for Ask
Jeeves, when it launched in 1997. As I wrote then:

[Ask Jeeves] provides matching web pages, but results are usually prefaced by
questions aimed at helping users find the information they want. For example,
a search for "Bill Clinton" brings back a results page topped by these
questions:

+ Where can I find information about US President Bill Clinton?
+ Who ran for U.S. President in 1996?

That’s what Robert wants. And it worked. This initial "question answering" gave
Ask Jeeves great relevancy, which helped it gain market share that it continues
to hold on to today. But it wasn’t natural language search technology
that made it happen, as people often mistakenly assume about Ask Jeeves. It was
a bunch of human editors who watched the queries that came in, then made
questions to help refine your search further, then linked you to preselected
pages that seemed to have the right answer.

My Ask Jeeves: Asking Questions To Give You Answers article from 1998 looks
even further at this:

Enter Ask Jeeves. The service does an impressive job of getting people to
what they want by asking questions.

For example, imagine you want information about cars. Enter "cars" into the
Ask Jeeves search box, and the service comes back with questions like:

+ Where can I find product reviews for cars?
+ Which models of cars are most frequently stolen?
+ Where can I locate information on the history of automobiles?

In front of each question is a Go icon. Choosing it takes users to a web
site that answers the question.

The secret to the accuracy of Ask Jeeves is human intervention. About 30
people work full time creating the knowledge base of questions, which
currently numbers about 7 million. They come up with ideas on their own,
especially for popular topics, and they also watch what people are actually
searching for.
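
To make the mechanics concrete, here’s a minimal sketch of that kind of editor-built knowledge base. It is not how Ask Jeeves actually stored anything. The `CuratedQuestion` class, the `suggest_questions` function and the example.com URLs are all made up for illustration; only the three car questions come from the article above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CuratedQuestion:
    text: str        # question shown to the searcher
    answer_url: str  # page an editor preselected as the answer (placeholder URLs)


# Hypothetical hand-written knowledge base, keyed by a trigger term.
# Editors would add entries after watching what people search for.
KNOWLEDGE_BASE: Dict[str, List[CuratedQuestion]] = {
    "cars": [
        CuratedQuestion("Where can I find product reviews for cars?",
                        "https://example.com/car-reviews"),
        CuratedQuestion("Which models of cars are most frequently stolen?",
                        "https://example.com/stolen-cars"),
        CuratedQuestion("Where can I locate information on the history of automobiles?",
                        "https://example.com/auto-history"),
    ],
}


def suggest_questions(query: str) -> List[CuratedQuestion]:
    """Return editor-written questions whose trigger term appears in the query."""
    words = query.lower().split()
    matches: List[CuratedQuestion] = []
    for term, questions in KNOWLEDGE_BASE.items():
        if term in words:
            matches.extend(questions)
    return matches


if __name__ == "__main__":
    for question in suggest_questions("cars"):
        print("Go:", question.text, "->", question.answer_url)
```

Notice there’s no language understanding anywhere in the sketch. All of the smarts are in the table that humans wrote, which is exactly why it was both accurate and expensive.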

So what happened? Why don’t we STILL have the refinement Robert wants?
Because humans are expensive, and analyzing links was cheaper. Google
popularized that, and all the search engines went the
crawler/algorithm/automation route. Ask dropped its editors.

So did MSN, by the way. MSN used to have editors that did this type of Ask
Jeeves-style refinement. As I wrote back in 2000:

New "Popular Search Topics" links that now appear below the search box,
after you perform a search. These are suggestions designed to help you easily
narrow your request to a particular topic, if your original search was
ambiguous. For example, in a search for "saturn," you’ll see these options:

+ Saturn Corporation (auto manufacturer)
+ Saturn (planet)
+ Sega Saturn cheats (game hints)

Select a topic, and the search engine will rerun your request focused
around that particular topic. However, the real beauty to these is that you’re
not simply giving the search engine new words to search for, such as "planet
saturn," if you were to choose the planet-oriented topic. While those words
will appear in the search box, behind the scenes they are mapped to other
words that editors at MSN Search believe will bring up the best sites for that
topic. Moreover, the editors may have preselected what they believe to be the
best sites for that particular query.

That’s just one example of the hard work going on at MSN Search to improve
the quality of their results. A team of editors closely monitors search logs
and provides human intervention where needed to improve the listings.
Misspellings are a good example. Consider:

I always felt that editorial oversight was a key reason why MSN did, at one
time, have very good relevancy. But in the quest to embrace links and crawling
(and cost savings with that), it went away.
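
For the curious, the MSN "Popular Search Topics" behavior quoted above can be sketched roughly like this. It’s a guess at the general shape, not MSN’s implementation. The `Topic` class, the `refine` function, the backend terms and the example.com URLs are all hypothetical; only the three topic labels and the "planet saturn" search box text come from the 2000 write-up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Topic:
    label: str                # link text shown below the search box
    visible_query: str        # words the searcher sees placed in the search box
    backend_terms: List[str]  # editor-chosen words actually sent to the engine
    editor_picks: List[str] = field(default_factory=list)  # preselected sites


# Hypothetical editor-maintained table for one ambiguous query.
TOPICS: Dict[str, List[Topic]] = {
    "saturn": [
        Topic("Saturn Corporation (auto manufacturer)", "saturn corporation",
              ["saturn", "cars", "dealer"], ["https://example.com/saturn-cars"]),
        Topic("Saturn (planet)", "planet saturn",
              ["saturn", "planet", "rings", "astronomy"],
              ["https://example.com/saturn-planet"]),
        Topic("Sega Saturn cheats (game hints)", "sega saturn cheats",
              ["sega", "saturn", "cheats"]),
    ],
}


def refine(query: str, choice: int) -> dict:
    """Rerun an ambiguous query around the topic the searcher picked."""
    topic = TOPICS[query.strip().lower()][choice]
    return {
        "search_box": topic.visible_query,                # what the user sees
        "backend_query": " ".join(topic.backend_terms),   # what is really searched
        "pinned_results": list(topic.editor_picks),       # editor-selected sites
    }


if __name__ == "__main__":
    print(refine("saturn", 1))
```

The point of the split between `search_box` and `backend_query` is the one made in the quote: picking a topic doesn’t just append a word to your search, it swaps in whatever the editors decided works best for that meaning.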

How about Yahoo? Yahoo’s human-compiled directory structure helped make the
service popular in my book because the directory led to query refinement in a
way the crawlers couldn’t match. You can still get to that, as I explained in my
recent Google Ranking Itself Tops For Britney Spears & The Need For Better
Categorization post, but you’ve gotta work hard to get there:

The reality is every search on any search engine will have some irrelevant
results. Ideally, what you’d want for a popular and broad query on Britney is
to get a better classification of types of results you can see: official
sites, fan sites, sites about her film career, Britney as a part of popular
society and so on. Since everything has some relevancy, such groupings help
ensure you get into a particular area related to Britney that you’re
interested in.

For example, consider if you searched on Yahoo Directory, where you could see
all directory categories like this:

  • Rock and Pop > Britney Spears
  • Rock and Pop > Anti-Britney Spears
  • Britney Spears Concert Tickets
  • Britney Spears > Lyrics

    See how the "topical relevancy" of all things Britney is divided into four
    major areas? How about the 208 topics that Clusty finds, which include:

    Sadly, the demise of human-powered directories on major search engines has
    all but stopped such categorization from really being shown to searchers.

    I’ve seen various prototypes of query refinement tools from smaller players
    over the years, and query refinement at the major players isn’t dead, of course.
    Ask Jeeves just improved its Zoom tool, for example. In addition, the
    continued growth of vertical search helps.
    It’s easier to give you lots and lots of options of hotels to choose from when
    you’re in a travel vertical search engine. That’s because unlike in regular
    search, you’re probably in a better frame of mind to make use of relevant
    drop-down boxes and checkboxes that would be ignored in web search.

    Overall, I share Robert’s frustration times 100! One of the reasons I’ve found
    tagging a waste of time, as I ranted in my Another Poke At Tags As Search
    Savior piece, is that search engines have had tools to help us better refine
    our results and cluster pages into topical areas. Those tools have just been
    ignored, ignored to the point that we’re making use of tools like tagging to
    make up for what we ought to get from the search engines directly.

    John’s seen Robert’s post and gives a few comments here. Tim Bray touches on
    Robert wanting the Semantic Web here. General comments from those reading
    Robert’s post can be found on his blog here.

    FYI, John asked me for thoughts on The Perfect Search that didn’t make the
    cut for the book. But if you’re curious, here’s what I emailed him back last
    September. I’ll ask him if I can add in the email I was responding to, that puts
    my response in better context. If so, you’ll see it added here later. My
    response:

    I can’t imagine such a world. It makes a nice pitch for the search
    companies, but knowledge is a messy thing.

    If we’re talking about indisputable facts, it’s a bit easier. Thomas
    Jefferson was the third president of the United States. I know of no one who
    questions that.

    Who was the first person on the moon? Neil Armstrong, unless you are of
    the contingent that believes moon landings never happened. OK, I think those
    folks are crackpots. But the perfect search that comes back with Armstrong
    isn’t the perfect search for them.

    What’s the answer to gay marriage? Who killed Kennedy? Was Bush right or
    wrong for going into Iraq? Is the MMR vaccine safe for children?

    None of these can be answered definitively. They’re more than just
    questions with nuances. They’re questions that have answers ultimately
    determined by the readers themselves, answers that may be different for each
    person, based on what they choose to believe after reviewing many opinions.

    I can envision a system that tries to collect for you a variety of
    references on topics. Maybe it even assembles them into an encyclopedia-like,
    wiki-like page. The assembly of this knowledge might be considered "answers" by
    some. To me, it still represents the start of a knowledge quest. It’s akin to
    exactly how search works now — a list of references, with the searcher still
    needing to explore.

    I’m sure we’ll see search advance on simply pointing people to the easy
    stuff, the facts that can be produced, direct navigation to web sites and so
    on. I’m also sure we’ll see search improve to better understand what we’re
    interested in, based on past habit and visits. But all knowledge will never be
    accessible, unless they figure out a way to digitize the minds of everyone
    living and dead. Even when dealing with what knowledge we do have chronicled,
    distilling a perfect answer is impossible. God could provide a perfect search
    as you outline. Search engines aren’t God today, and they’ll never be.

    Having said this, I was aghast last year when some Wi-Fi exec likened Google
    to God in Friedman’s column. While we may not have the perfect search, nor
    will we, some people may believe search engines (and the web by extension)
    already offer it.

    We’ve had articles about judges searching the web themselves to see if they
    can dig up evidence. Fox News lamely tries to defend calling the BBC
    anti-American by citing search counts. Students
    apparently are abandoning traditional research methods and assuming the magic
    little search box brings up the right answer. I’ve watched people spend tons
    of time searching for a company’s phone number rather than just calling
    information. Two television shows I watched this week had characters talking
    about how they "Googled" something, with the assumption that whatever they
    retrieved must be correct. Some people already believe a perfect search tool
    exists, and the way it is shaping them is that they’re relying on it too
    exclusively.

    So the threat is this. In a world where people believe a perfect search
    exists, that world may fail to seek out knowledge in other ways. Someone blogs
    something that’s factually incorrect. Search picks this up. There are no other
    references out there. Search is perfect, ergo, what’s wrong becomes right. No
    one bothers to actually follow up on the fact.

    I was fortunate enough in college to hear Loren Needles from Analytica talk
    about the need to fully question any facts. At the time, he talked about how a
    recent hurricane had been blamed for a dropoff in some economic indicators. In
    short order, he quickly demonstrated how there was no way the hurricane could
    have caused a dropoff of such extent. Despite this, newspapers across the
    country accepted the explanation as fact.

    That’s what a perfect search potentially does for us: it makes us less
    questioning because we think the answers are all in that little box. They
    aren’t, nor will they ever be.