Spell checking your queries has been a part of the search experience for a while. Auto-correcting your spelling mistakes is almost expected nowadays. Recently, Bing gave a sneak peek at what goes into making a great spell checking engine and how it makes search feel like magic to its users.
In a recent post, Dr. Jim Kleban, Bing R&D Program Manager, explained that Bing’s Speller processes tens of millions of data points that are mined from search queries, clickstream data, user actions and indexed web pages when it tries to guess what you’re typing into Bing. Processing all those data points and correcting misspelled words or entire queries needs to happen within milliseconds so the corrected query can be passed to the actual search engine.
According to Bing’s data, their speller processes “tens of thousands” of queries every second and algorithmically processes and returns the corrections within “tens of milliseconds.”
When attempting to unravel the mystery of a misspelling, Bing’s speller works from context clues in the query. Using data models, an algorithm looks at all the words in a search query to figure out what the misspelled word should be.
Bing’s algorithm also takes into account edit distance, which Kleban describes as the number of individual letters that differ between two distinct search queries. Most commonly, these errors occur when searchers type a word phonetically, spelling it the way they would pronounce it.
For example: [nickteen withdraw]
Kleban explains that the misspelling of “nickteen” might actually be two words, “Nick” and “Teen”, possibly referring to the kids’ TV network and its mid-afternoon programming geared toward teenagers. However, the use of the word “withdraw” – with help from a quick peek at Bing query logs – provides better context that the misspelled word might actually be nicotine.
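To make the interplay between edit distance and context concrete, here is a minimal sketch in Python. It is not Bing's implementation: the co-occurrence counts, the `best_correction` helper and its scoring formula are all hypothetical, chosen only to illustrate how a context word can outweigh a closer spelling match. The edit distance used is the standard Levenshtein distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical counts of how often a candidate appears alongside a
# context word in query logs. These numbers are invented for illustration.
COOCCURRENCE = {
    ("nicotine", "withdraw"): 900,
    ("nick teen", "withdraw"): 3,
    ("nick teen", "schedule"): 400,
    ("nicotine", "schedule"): 5,
}


def best_correction(misspelled: str, context_word: str, candidates: list[str]) -> str:
    """Pick the candidate with the best (assumed) score: context support
    discounted by how far the candidate is from what was typed."""
    def score(candidate: str) -> float:
        support = COOCCURRENCE.get((candidate, context_word), 0)
        return support / (1 + edit_distance(misspelled, candidate))
    return max(candidates, key=score)


candidates = ["nicotine", "nick teen"]
print(edit_distance("nickteen", "nick teen"))                 # 1
print(edit_distance("nickteen", "nicotine"))                  # 4
print(best_correction("nickteen", "withdraw", candidates))    # nicotine
print(best_correction("nickteen", "schedule", candidates))    # nick teen
```

On spelling alone, "nick teen" is the closer match (one inserted space versus four edits for "nicotine"), so in this toy model it is the context word that tips the result toward nicotine.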
But spelling correction doesn’t stop with misspelled words. Often, Bing explains, each word in a query is spelled correctly, but the actual word’s context is incorrect.
For example: [how can you sea if money is reel]
In this example, Bing’s speller needs to understand the difference between “sea” and “see” and “reel” and “real.” As depicted in the screen shots, there is clearly a big difference in results if the misused words are not corrected.
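One common way to handle correctly spelled but misused words is to keep sets of easily confused words and let the neighboring words decide between them. The Python sketch below assumes that approach; the post does not say Bing works this way, and the `CONFUSABLE` table, the bigram counts and the `correct_query` helper are hypothetical.

```python
# Words that are valid on their own but easily confused (illustrative only).
CONFUSABLE = {
    "sea": ["see"], "see": ["sea"],
    "reel": ["real"], "real": ["reel"],
}

# Hypothetical counts of adjacent word pairs, as might be mined from query logs.
BIGRAMS = {
    ("you", "see"): 500, ("see", "if"): 300, ("you", "sea"): 2,
    ("is", "real"): 400, ("is", "reel"): 1,
    ("deep", "sea"): 200, ("sea", "fishing"): 150, ("fishing", "reel"): 300,
}


def correct_query(query: str) -> str:
    """Replace a word with a confusable alternative when the alternative
    fits its left and right neighbors better, judged by bigram counts."""
    words = query.lower().split()
    fixed = []
    for i, word in enumerate(words):
        prev_word = fixed[-1] if fixed else None
        next_word = words[i + 1] if i + 1 < len(words) else None

        def score(candidate: str) -> int:
            return (BIGRAMS.get((prev_word, candidate), 0)
                    + BIGRAMS.get((candidate, next_word), 0))

        # The original word is listed first, so it wins any tie.
        candidates = [word] + CONFUSABLE.get(word, [])
        fixed.append(max(candidates, key=score))
    return " ".join(fixed)


print(correct_query("how can you sea if money is reel"))
# how can you see if money is real
print(correct_query("deep sea fishing reel"))
# deep sea fishing reel
```

The second query shows why context matters: "sea" and "reel" are left alone when the surrounding words support them.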
Kleban explains that staying current with new words and trends is vital to Bing’s spell checking. Internet memes, names of people or companies popularized by news headlines, and even new slang words or other regional vernacular all have to be accounted for.
Frankly, unless you grew up just south of the Han River, who really knew how to spell Gangnam when the video first dropped? Yet searchers fully expect their engines to work magic and simply find what they want.