Researchers at Georgia Tech have developed a machine-learning technique that gathers and stores word associations to enhance the search process. From what I understand of it, the technique uses a form of latent semantic association to link related terms and long-tail searches, with the goal of returning better search results.
The technique is called diaTM (short for dialect topic modeling) and is initially being tested in the medical domain, where the same symptom can be searched for in many different ways.
"DiaTM figures out enough language relationships that over time it does quite well," said Steven Crain, Ph.D. student in computer science and lead author of the paper that describes diaTM. "Another benefit is we're not doing word-for-word equivalencies, so 'gunk' doesn't necessarily have to be connected to 'discharge,' as long as it's recognized that 'gunk' is related to infections.," the Georgia Tech newsroom reported.
"The system learns by comparing multiple medical documents written in different levels of technical language. By comparing enough of these documents, diaTM eventually learns which medical conditions, symptoms and procedures are associated with certain dialectal words or phrases, thus shrinking the "language gap" between consumers with health questions and the medical databases they turn to for answers."
The researchers said the technique can be applied to any domain, and that so far they have seen a 25% improvement in information-retrieval performance. They believe it could even be applied to text speak (the use of "b4" for "before" and the like).
Interestingly, Microsoft was one of the groups that provided funding for the research.