NOTE: Since this page was posted, new information about the MSN crawler has been added to the Crawler Info section.
Microsoft released a public preview of its long-awaited web search technology today, over a year after first embarking on the project. The company also gave a facelift to its popular MSN Search site, which remains powered by Yahoo's search technology, and dropped paid inclusion listings there.
None of the moves are groundbreaking. The beta search technology shows glitches common to any new web search engine that only get worked out over time. Microsoft's search engine isn't a serious replacement for Google, Yahoo or Ask Jeeves yet. Meanwhile, changes at MSN Search merely bring the company in line with the look and feel of Google and the forthcoming new results look from Yahoo.
Microsoft itself describes its new search technology as "raw" and admits that for various reasons, it won't do well on some queries. Nevertheless, it is an important start. Microsoft says it sees search as a tough technology problem to take on and solve over the next five to ten years.
"We're humble about what needs to be done, but we're very excited about it," said Yusuf Mehdi, corporate vice president of MSN. "The first step is getting our own technology out there."
Search Technology Preview
The new Microsoft search engine is best reached via its MSN Sandbox page. There, you'll find links to the MSN Search Technology Preview designed to serve the US and the world in general. A UK-specific version is also offered, as are many others worldwide.
A few test queries show ranking differences among some of these sites. For example, compare a search for cars at the US site with the same search at the UK one. But the differences don't appear to be regional in nature: the UK results don't feel more oriented to the UK than the US results do.
Microsoft says the new search engine has about 1 billion pages indexed, with plans to increase this size over time. That puts it behind the size of other major search engines, though it's important to always remember that size is only one of many factors that influence how good a search engine is.
"We're smaller than the rest of the indexes for the moment, so I think you'll see that on some queries, we'll do a decent job. On some we won't, because we don't even have the documents," Mehdi said.
How does it measure up? Relevancy is extremely difficult to assess, as I've written about before. To do it properly, you should run a battery of tests. In addition, measuring how this preview service performs is largely a waste of time. It lacks some relevancy-affecting features that mature search engines offer, and these will almost certainly be added over the coming months.
Having said that, I did want to get some feel for it. I pulled a few queries, listed below, from our old 2002 Perfect Page Test to produce a very quick, very rough assessment.
I can't stress enough that these results don't indicate how good or bad the Microsoft search technology is in comparison to competitors overall. But they do give you a feel for some of the challenges and problems MSN will need to correct.
Cable Ship Restorer: Mixed review. A big thumbs up for listing the home page of a good site about this subject in the top results with a nice description. Yahoo also lists the home page, but ironically, gives it a bad description because of a coding error in that site's meta description tag. Google lists an inside page, and Ask Jeeves doesn't list it at all.
However, the new Microsoft search engine lacks clustering, the idea that you only show one or two top results from any single web site. Clustering helps ensure variety in results. With Microsoft, 9 out of the top 15 results -- 60 percent of the listings -- come from the same site. At Yahoo, only 10 percent come from the same site. At Google, it's 20 percent.
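Here is a minimal Python sketch of the clustering idea. It assumes that a "site" is simply the URL's hostname, although real engines likely apply more elaborate rules. `site_of`, `same_site_share` and `cluster` are hypothetical helpers and not anything Microsoft or its competitors have published. The two-per-site cap reflects the "one or two top results" described above.

```python
from collections import Counter
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Treat the hostname as the 'site' (a simplifying assumption)."""
    return (urlsplit(url).hostname or "").lower()


def same_site_share(urls: list[str]) -> float:
    """Fraction of listings that come from the single most-represented site."""
    if not urls:
        return 0.0
    counts = Counter(site_of(u) for u in urls)
    return counts.most_common(1)[0][1] / len(urls)


def cluster(urls: list[str], per_site: int = 2) -> list[str]:
    """Keep the original ranking order, but show at most `per_site` results per site."""
    seen = Counter()
    kept = []
    for url in urls:
        site = site_of(url)
        if seen[site] < per_site:
            kept.append(url)
            seen[site] += 1
    return kept


if __name__ == "__main__":
    # Hypothetical top 15: 9 pages from one site, 6 from distinct other sites.
    results = [f"http://big.example.com/page{i}" for i in range(9)]
    results += [f"http://site{i}.example.org/" for i in range(6)]
    print(same_site_share(results))  # 0.6, i.e. 60 percent from one site
    print(len(cluster(results)))     # 8 listings remain after clustering
```

Capping by count while preserving rank order is only one simple way to add variety. An engine could instead indent a site's extra pages beneath its top result or fold them behind a "more from this site" link.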
Wemyss Bay: Fairly good. A quick eyeball of the results showed them to be fairly decent compared with Yahoo and Google. All of them seemed related to this location in some way. Interestingly, Ask Jeeves actually fell down here, with two listings (Ace-Chase and AceInternet) that had nothing to do with Wemyss Bay in their descriptions and that led to expired pages. I had previously thought I saw a failure to detect duplicate pages in this query, but that was a mistake on my part. The two pages, it turns out, were very similar but definitely not identical.
Cop Jokes: Poor performance here, primarily because a lack of clustering means pages from only a few sites dominate the results. The results themselves don't seem that spectacular, but neither do many of the listings at Yahoo and Google based on a quick eyeball of descriptions. Ask Jeeves results did seem a bit better here.
Genealogy: Mixed review. A quick look shows sites that seem fairly decent, but looking further shows stumbles. For example, Microsoft lists a page about the genealogy of the surname "kerrigan." That's pretty specific, and Google, Yahoo and Ask Jeeves avoid going down to that level. Their range of name-brand sites is also better: places like Ancestry.com, the well-regarded Cyndi's List and other seemingly large, good sites are more in evidence there.
Car Prices: Mixed review. The preview gets good sites like the NADA guides and Edmunds that others also find, but the others do a better job of getting these higher in the results. The top result, autonetclassifieds, with its all-lowercase title and clear SEO-heavy intent, is off-putting. Other examples like this also appear.
Paper Toys: Mixed review. Again, clustering is needed to cut down the many pages coming from paper-toys.com. But there are clearly some good sites. Clustering is also an issue at Yahoo, where eBay is represented three times, twice through paid inclusion.
US Patents: Fairly good. The US Patent & Trademark Office comes up top, as it does at competitors. Here the lack of clustering isn't a problem, and the results show a good variety of what seem to be decent resources.
Travel: Ho hum. It's a mish-mash of sites I've mostly never heard of. Competitors give good starting places like Lonely Planet, Travelocity, Fodors, Frommers, Expedia, Priceline and Yahoo Travel.
Overall, I found the search engine a good first effort. Clustering is desperately needed. There's a sense that the ranking system doesn't do quite as good a job of getting solid authority sites to the top of the list, and that it may be more susceptible to search engine optimization tricks. Much of this is relatively easy to correct, and none of it is a surprise to see in the debut of a new service.
The new search engine also leaves me with a "more of the same" feeling. It doesn't take search results anywhere beyond what Yahoo, Google or Ask Jeeves already do, and given their maturity, they do it better. In fact, a fast run of the tested queries above through Gigablast, a one-man effort by Matt Wells, suggests MSN still needs to catch up to even that service.
The new Microsoft search engine is NOT -- NOT NOT NOT -- being used at MSN Search. It can be confusing, because along with the search technology announcement, Microsoft has also announced a new look and feel for its MSN Search site. Despite these cosmetic changes, under the hood, MSN Search itself still beats with a Yahoo heart.
Perhaps half a heart is a better description. MSN Search continues to show a significant difference in the number of results found in comparison to the same queries at Yahoo, as discussed recently in the SEW Forums. Is MSN hitting less than the full Yahoo database? The company wouldn't comment about this. But in all likelihood, this is what's happening.
For searchers, this means that for some relatively obscure queries, you might not find some pages that have answers to your question. But on many other queries, it may make no difference.
The cosmetic changes at MSN are in line with what the service already said it would be doing back in March. See my past article, New Look In July, New Search Engine Later, Says MSN, for a deeper look at some of these alterations.
In summary, sponsored listings now appear in boxes above and to the side of editorial results. The confusing "Featured Sites" area that often contained ads is also gone. The effect is to let more editorial results rise to the top of the page, which MSN says it has found improves perceived relevancy.
On the home page, the LookSmart-powered directory is now gone. That leaves the page nearly blank, making it much more Google-like in being clean. A drop down box to the right of the search box provides access to web, news, dictionary, encyclopedia, stock quote, movie and shopping search.
The return of a drop-down box on a major search service is nothing new and, if history is any guide, it is likely to be ignored just as tabs have been. Lycos had drop-down boxes in 1998 just like the one now at MSN, and other search engines tried them as well. They weren't used much.
Non-use of the drop-down box isn't a problem as long as MSN has other ways of revealing data within the results themselves, something I've labeled in the past as invisible tabs. MSN has done this before, and it continues now at least for news, encyclopedia and apparently some travel queries.
Do a search for iraq, for example, and you'll see Moreover-powered news results showing up above the web search results. Search for galaxy, and a Microsoft Encarta definition and article links appear before the web search material. By the way, MSN calls this type of insertion clips search results.
News search on the US and other sites should shift to MSN's own Newsbot service in the near future, MSN says. That's already the case in the UK and on some other non-US/English language sites, where Newsbot was released last year.
Underneath the hood, the most significant change is MSN's decision to drop paid inclusion listings. Search Engine Watch reported last March that this might happen, and now it has panned out.
The move follows the Ask Jeeves announcement last week that it was dropping paid inclusion listings entirely. As Google has never offered paid inclusion, this leaves Yahoo as the last major service still offering it.
Dropping paid inclusion helps MSN avoid all the bad publicity that Yahoo had to endure when it rolled out an updated paid inclusion program on the heels of releasing its own new search technology. It also avoids the mixed messages and possible consumer confusion that paid inclusion can generate.
"The biggest reason we removed it is the user perception that there's something bad," Mehdi said. "Yahoo has been a big fan of paid inclusion because they believe it helps relevancy, but it wasn't enough for us to do something different for now."
That leaves the door open for paid inclusion to return in the future. It should also be noted that MSN, Yahoo and Ask Jeeves all still have paid inclusion operating in other types of searches, in particular product and yellow page searches.
In a future part of my series on paid inclusion, I'll look more closely at this, and at how paid inclusion may be more acceptable to some when it comes to specialized search.
Some site owners have already been in limbo, wondering what to do about listings on MSN Search. Though the service broke with LookSmart earlier this year, it would still sometimes use those listings. And though the service uses Yahoo data, the results at MSN itself are ranked differently than at Yahoo.
My advice would be to sit tight. Trying to make specific improvements to please MSN Search will largely be a waste of time. Microsoft's own search technology should be in place on MSN by the end of the year. Thus, pleasing the particular algorithm used to sort Yahoo results currently shown at MSN will be short lived.
Instead, look to the new search technology preview as what's to come from Microsoft and what to plan for. However, I still wouldn't advise taking many specific actions. Some are already trying to dissect what seems to please this new Microsoft search engine, in hopes of gaining some advantages. However, they're aiming at a moving target. That service is going to continue to be radically shaped and tweaked over the coming months.
Overall, I would continue to focus on the key things that have traditionally helped with search engines in general: good content, good titles, relevant HTML copy and building decent links.
How about just getting into the new search engine? So far, no add URL page is available, so as with Ask Jeeves/Teoma, you have to rely on being crawled naturally. MSN says that an add URL page will be added later this year.
What's the crawl and update schedule like? MSN says it can't yet say how old a page might be in the worst case, how often pages can expect to be visited and so on. As the service matures, we'll probably see some definitive answers here.
At the existing MSN Search service, the Submit a Site link on the results pages, in the top right corner, is a waste of time -- merely a pitch to buy Overture. As for this submit URL that some may remember, it submits your page to the Yahoo crawler, which powers the current MSN Search, MSN says.
What gets indexed? MSN says all visible text, with the exception of ALT text. However, this could change in the future, they warn. There are no stop words, so every word on a page is supposed to be indexed and searchable.
Interestingly, MSN says it can follow frame links. I've yet to test this, but if so, I think it would make it the only major crawler that can spider frame content without the webmaster having to make special links to help.
Meta tags? Meta keywords and description aren't supported, yet.
Some limited webmaster-oriented information about the service is offered here. That page links to a guidelines page with one particular aspect that may sound worrisome: advice that an HTML page should be no larger than 150KB.
This suggests that pages larger than this may not be indexed. I suspect that what's likely the case is that only up to this amount may get indexed. Google has a similar limit of about 100 KB. Pages in excess of this do get indexed, but text that appears beyond that limit is not recorded.
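The behavior I suspect is that a page still gets indexed but only its first chunk of text is recorded. A few lines of Python can model it. This is a sketch under stated assumptions and does not describe how MSN or Google actually work. `indexed_terms` is a hypothetical helper, and its 150KB default simply echoes MSN's guideline. The page is treated as plain UTF-8 text with no HTML parsing.

```python
def indexed_terms(page: bytes, limit_bytes: int = 150 * 1024) -> set[str]:
    """Return the lowercase words found within the first `limit_bytes` bytes.

    Hypothetical model: anything past the limit is never seen by the indexer.
    """
    visible = page[:limit_bytes]
    # errors="ignore" drops a multi-byte character that the cut splits in two.
    text = visible.decode("utf-8", errors="ignore")
    # A word straddling the cut survives only as its leading fragment.
    return {word.lower() for word in text.split()}


if __name__ == "__main__":
    page = b"alpha " * 10 + b"omega"  # 60 bytes of "alpha", then "omega"
    print("omega" in indexed_terms(page, limit_bytes=60))   # False
    print("omega" in indexed_terms(page, limit_bytes=100))  # True
```

Under this model the page is still indexed and findable for its early text. Only words placed past the cutoff never become searchable.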
MSN didn't confirm whether text beyond 150KB would go unindexed. However, they said the size reference was only a "recommendation," suggesting you needn't worry if you have a few key pages larger than this. They also said the crawler can handle dynamic URLs.
Undocumented support for a crawl delay that can be set for the new MSN crawler has also been reported at WebmasterWorld.com in this thread, by what appears to be an MSN engineer posting there. Microsoft is also promoting discussion of the new technology within one of its newsgroups, though I've not seen much of anything search related there so far.
MSN has termed this the most significant search upgrade in its history. Having watched and written about MSN Search's upgrades over the years, I don't see it that way. The service has constantly undergone cosmetic changes like these in the past, including some that were said to speed up load time and increase relevancy. The promotion of Encarta data is not new to this release, either. Encarta data was added back in 2001.
In my view, far more significant upgrades have happened under the hood in the past. The service largely stopped using LookSmart data earlier this year. Last year it also, somewhat sadly, seems to have abandoned its method of producing quality human-edited results for key queries. That was a unique strength it had against its competition, and it's now lost.
In many ways, MSN Search is in a holding pattern until it gets a heart transplant of Microsoft's own search technology later this year, the timeframe Microsoft chairman Bill Gates has stated. At that time, there will almost certainly be other changes and new capabilities, whether on the site's advanced search page or in how it operates.
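I haven't confirmed the exact syntax the MSN crawler honors, so the following sketch makes two assumptions. It assumes the setting is the nonstandard `Crawl-delay` line in robots.txt, and it assumes the crawler identifies itself as `msnbot`. Python's standard `urllib.robotparser` (3.6 or later) can read that line. The robots.txt content and the `delay_for` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask "msnbot" to wait 10 seconds between requests,
# and keep all other crawlers out of /private/.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow: /private/
"""


def delay_for(robots_txt: str, agent: str, default: float = 0.0) -> float:
    """Return the Crawl-delay (in seconds) that applies to `agent`, or `default`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(agent)  # None when no matching group sets a delay
    return default if delay is None else float(delay)


if __name__ == "__main__":
    print(delay_for(ROBOTS_TXT, "msnbot"))        # 10.0
    print(delay_for(ROBOTS_TXT, "SomeOtherBot"))  # 0.0
```

A crawler that honored this setting would pause for the returned number of seconds between fetches from the same host.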
The new search technology itself, not yet part of MSN Search, is significant. It's Microsoft's first real weapon of its own in the ongoing search wars. But that technology is not superior to others nor an advance in the state of web search, not yet.
What about some recent statements by Gates about linguistic analysis as a way forward? They make a nice sound bite, which is why would-be search companies have said the same things in the past. But such efforts have gone nowhere. In my view, this has been primarily because linguistic analysis of pages or natural language processing isn't that important when dealing with the popular, short queries people conduct like "britney spears."
Instead, what's far more needed is a way to rate the authority or popularity of a document. Link analysis has been the leading method of choice for this, but its usefulness has been continually whittled down as site owners have become far more conscious of how they link.
Beyond that, personalization and the emergence of invisible tabs/specialty search are widely seen as the leading ways forward to a new generation of search. MSN Search isn't offering personalization now and offers little in the way of invisible tabs. Its competitors are further along that path, something Gates was either unaware of or chose to overlook in a recent assessment of search challenges.
That may change down the line, of course. Microsoft knows it still has a big challenge ahead of it. At least now, it's publicly in the game with a search product of its own.
"All of these things [advancing search], we think are tough software problems, and we're a software company, so that fits with our DNA," Mehdi said. "This provides the foundation that helps us get to the next generation of problems."
What do you think about the new search technology? Share your thoughts on our forums in this thread: MSN Backend Sneak Peak. How about the new look to MSN Search? Come discuss that in this thread: Behold the new face of MSN Search.