MSN Search Gets Neural Net/RankNet Technology & (Potentially) Awesome New Search Commands

Local, Relevance, and Japan! from the MSN Search WebLog talks about MSN Search using a new relevancy ranking system based on "Neural Net" technology, along with new search commands -- such as anchor text searching -- now available.

The cynical part of me expects that we'll soon be hearing about "MSN Search With New Neural Net Technology" in the marketing. Google never ran a "Google With PageRank Technology" push, but it did often offer up PageRank as something that made it special. Meanwhile, ads for Ask Jeeves over here in the UK keep going on about "Ask Jeeves With New Teoma Technology," which makes me laugh, given that Ask has owned Teoma's technology for several years. When did it become new again?

Anyway, the post has a really cool picture illustrating how the "correct" page for a search on pbs evolution videos wasn't ranking well in early May but then, with the help of the new technology, moved up to the top position by June. Of course, that other neural net technology -- the human brain inside of an editor -- could have made the change in a few minutes.

OK, in fairness, you do want automated systems to learn how to do this stuff better. You can't have human editors constantly meddling in search results. But the occasional intervention would be nice.

Neural Net & RankNet

So what's this Neural Net technology all about? Kudos to Greg Linden: in his MSN Search and Learning to Rank post, he dug up a paper on the topic from Microsoft Research, Learning to Rank using Gradient Descent (PDF format).

The paper talks about RankNet, a much better-sounding name for the technology. I'd love to give you a one-sentence summary of what it does, but so far, that escapes me despite reading the paper several times. There's sure to be discussion and analysis, which I'll point to. I'll also be following up with MSN Search directly on this.

The impression I have at the moment is that the system is trained in some way (by algorithms, human choices, I don't know) to recognize what is good, then uses that training to refine results. It sounds similar to TrustRank, which we've touched on and which I'll be exploring more in the future. But I could also be misinterpreting the paper.
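For the curious, the core idea in the Learning to Rank using Gradient Descent paper can be sketched roughly as follows. This is a simplified illustration, not MSN's code: the ranker is shown pairs of documents plus a target probability that the first should rank above the second, its own probability is the logistic function of the difference between the two scores, and gradient descent on a cross-entropy cost nudges the scores until they agree. The paper uses a neural net as the scoring function; to keep this short, the sketch swaps in a plain linear score, and the two features (title match, anchor-text match), the three documents and the learning rate are all made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def score(weights, features):
    """Linear stand-in for RankNet's neural net: f(x) = w . x."""
    return sum(w * x for w, x in zip(weights, features))


def pair_cost(o_ij, target_p):
    """Pairwise cross-entropy cost, with o_ij = f(x_i) - f(x_j) and
    P_ij = sigmoid(o_ij). Simplifies to -target_p * o_ij + log(1 + e^o_ij)."""
    return -target_p * o_ij + softplus(o_ij)


def total_cost(weights, pairs):
    """Sum of pairwise costs over all training pairs."""
    return sum(pair_cost(score(weights, xi) - score(weights, xj), p)
               for xi, xj, p in pairs)


def train(pairs, dim, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient descent on the pairwise cost.

    pairs: list of (x_i, x_j, target_p), where target_p is the desired
    probability that document i ranks above document j.
    """
    rng = random.Random(seed)
    order = list(pairs)  # shuffle a copy, leave the caller's list alone
    weights = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(order)
        for x_i, x_j, target_p in order:
            o_ij = score(weights, x_i) - score(weights, x_j)
            # dC/do_ij = P_ij - target_p, and do_ij/dw = x_i - x_j
            grad_o = sigmoid(o_ij) - target_p
            for k in range(dim):
                weights[k] -= lr * grad_o * (x_i[k] - x_j[k])
    return weights


if __name__ == "__main__":
    # Hypothetical features: [title match, anchor-text match]
    docs = {"good": [1.0, 1.0], "ok": [1.0, 0.0], "poor": [0.0, 0.0]}
    pairs = [(docs["good"], docs["ok"], 1.0),
             (docs["ok"], docs["poor"], 1.0),
             (docs["good"], docs["poor"], 1.0)]
    w = train(pairs, dim=2)
    print(total_cost([0.0, 0.0], pairs), "->", total_cost(w, pairs))
    print(sorted(docs, key=lambda name: score(w, docs[name]), reverse=True))
    # ['good', 'ok', 'poor']
```

On this toy data the total cost drops as training runs, and the learned weights rank good above ok above poor. The point of the pairwise setup is that the system only ever needs judgments of the form "this page should beat that one," rather than an absolute score for every page.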

The Deneuralized Do OK

Yahoo, Ask Jeeves and Google have made no claims to having similar neural net technology, though as Greg noted, a coauthor of the RankNet paper works at Google. How do the others do for that PBS query?

So the others rank it either as well or practically as well -- and in Yahoo's case, the "practically" part could be argued as just as good, depending on your particular viewpoint.

New Commands

Aside from the new ranking technology, the blog post notes that MSN Search is now offering new search commands. These are:

  • inanchor: - The command I've been hoping for, pleading for, and that has been lost to the search world since AltaVista dropped it two or three years ago. It's supposed to let you search through anchor text -- in other words, the text of links. Why would you use it? Want to know all the pages that really are linking to the official George W. Bush biography with the words miserable failure in the links? This type of command should let you find them. However, I can't get it to work! inanchor:miserable failure link:http://www.whitehouse.gov/president/gwbbio.html doesn't work, nor does inanchor:miserable failure, or even just inanchor:miserable, or inanchor:http://www.whitehouse.gov/president/gwbbio.html miserable failure, or various other attempts I've tried. I'm following up. In the meantime, the link: command can still be used to sort of do this, though as I've written before, it's not perfect: Wishing For Better Anchor Text Searching.
     

  • filetype: - Lets you find pages of a particular filetype, such as HTML files or Word documents. Though said to be new, I'm fairly certain it's been around for at least a few weeks. It's listed on the advanced search operators page, while the other new commands are not. It wasn't offered when the beta service came out last November.
     

  • inurl: - Lets you find pages that have text within a URL, as opposed to the url: command, which lets you find specific URLs listed in the index.
     

  • intitle: - Lets you find pages that have text within the title tag of a document. For multiple words, it appears to work if you surround the words with quotes (example) or parentheses (example) but not with the words on their own (example and example). If you are after words appearing in the title in no particular order, it seems to work to use the command in front of each word (example).
     

  • linkdomain: - Lets you find all pages that link to anywhere within a particular domain, as opposed to the link: command, which lets you find all pages linking to a particular URL. For example, all links to the US White House (569,343) versus links just to the official George W. Bush biography page (26,414).
     

  • contains: - Supposed to let you find pages with links to documents of a particular filetype. For example, the MSN blog says contains:wma should bring up pages that have links to WMA files. But when I did that search, the pages that came up didn't necessarily seem to have such links, such as this example, which ranked second.
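To make the syntax differences above a little more concrete, here's a small Python sketch that just assembles query strings of the kind you'd type into the MSN Search box. It doesn't talk to MSN at all, and the helper names (op, intitle_any_order, build_query) are hypothetical. The quoting rule reflects only the intitle: behavior described above (quotes worked for multi-word searches, as did repeating the command before each word); whether quoting helps the other commands is untested. The whitehouse.gov domain and the doc filetype are illustrative values, not the exact queries behind the counts quoted above.

```python
def op(name: str, value: str) -> str:
    """Format one search command as 'name:value'.

    Multi-word values are wrapped in double quotes, the form found to work
    for intitle:. Assumption: other commands may or may not honor quotes.
    """
    value = value.strip()
    if " " in value:
        value = f'"{value}"'
    return f"{name}:{value}"


def intitle_any_order(*words: str) -> str:
    """Repeat intitle: before each word to match titles containing all of
    the words in no particular order (the workaround described above)."""
    return " ".join(op("intitle", w) for w in words)


def build_query(*parts: str) -> str:
    """Join plain terms and commands with spaces, skipping empty parts."""
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    bio = "http://www.whitehouse.gov/president/gwbbio.html"
    print(op("intitle", "miserable failure"))          # phrase in the title
    print(intitle_any_order("miserable", "failure"))   # words in any order
    print(op("link", bio))                    # pages linking to one URL
    print(op("linkdomain", "whitehouse.gov")) # pages linking anywhere on a domain
    print(build_query("pbs", "evolution", op("filetype", "doc")))
    print(op("contains", "wma"))              # pages said to link to WMA files
```

The link: versus linkdomain: pair is the one worth remembering: the first targets a single URL, the second an entire domain, which is why the White House counts differ so much.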

When Gary gets in, I'm going to ask him to bang away on the new commands, as he did at the beta launch. And if search commands seem cool (they are), see C:\> YubNub For "Command Line" Searching and Search Commands For the Majors, which guides you to the commands that the other major search engines offer.

Want to discuss? Join our forum thread, June 2005 MSN Search Update & Neural Net Tech.