{"62806":{"#nid":"62806","#data":{"type":"news","title":"Georgia Tech Researchers Design Machine Learning Technique to Improve Consumer Medical Searches","body":[{"value":"\u003Cp\u003EMedical\nwebsites like WebMD provide consumers with more access than ever before to\ncomprehensive health and medical information, but the sites\u2019 utility becomes\nlimited when users describe their conditions in unclear or unorthodox language in a\nsite search. A group of Georgia Tech researchers has created a machine-learning\nmodel that enables the sites to \u201clearn\u201d dialect and other medical vernacular,\nthereby improving their performance for users who search in such language.\u003C\/p\u003E\n\n\n\n\u003Cp\u003ECalled\n\u201cdiaTM\u201d (short for \u201cdialect topic modeling\u201d), the system learns by comparing\nmultiple medical documents written at different levels of technical language.\nBy comparing enough of these documents, diaTM eventually learns which medical\nconditions, symptoms and procedures are associated with certain dialectal words\nor phrases, thus shrinking the \u201clanguage gap\u201d between consumers with health\nquestions and the medical databases they turn to for answers.\u003C\/p\u003E\n\n\n\n\u003Cp\u003E\u201cThe\nlanguage gap problem seems to be the most acute in the medical domain,\u201d said\nHongyuan Zha, professor in the School of Computational Science \u0026amp;\nEngineering and a paper co-author. \u201cProviding a solution for this domain will\nhave a high impact on maintaining and improving people\u2019s health.\u201d\u003C\/p\u003E\n\n\n\n\u003Cp\u003ETo\neducate diaTM in various modes of medical language, Steven Crain and his fellow\nresearchers pulled publicly available documents not only from WebMD but also from\nYahoo! Answers, PubMed Central, the Centers for Disease Control \u0026amp;\nPrevention website, and other sources. 
After processing enough documents, he\nsaid, diaTM can learn that the word \u201cgunk,\u201d for example, is often a vernacular\nterm for \u201cdischarge,\u201d and it can process user searches that incorporate the\nword \u201cgunk\u201d appropriately.\u003C\/p\u003E\n\n\n\n\u003Cp\u003EIn this\ninitial study using small-scale experiments, the researchers found that diaTM\ncan achieve a 25 percent improvement in nDCG (\u201cnormalized discounted cumulative\ngain\u201d), a standard metric of how well a search engine ranks the most relevant results\nnear the top of its result list. Zha, whose research focuses on Internet search engines and\ntheir related algorithms, said even a 5 percent improvement in nDCG is \u201cvery\nsignificant.\u201d\u003C\/p\u003E\n\n\n\n\u003Cp\u003E\u201cDiaTM\nfigures out enough language relationships that over time it does quite well,\u201d said\nSteven Crain, Ph.D. student in computer science and lead author of the paper\nthat describes diaTM. \u201cAnother benefit is we\u2019re not doing word-for-word equivalencies,\nso \u2018gunk\u2019 doesn\u2019t necessarily have to be connected to \u2018discharge,\u2019 as long as\nit\u2019s recognized that \u2018gunk\u2019 is related to infections.\u201d\u003C\/p\u003E\n\n\n\n\u003Cp\u003EAlso,\ndiaTM is not limited to medical search; it is a machine-learning technique that\nwould work equally well in any topic-related search. In addition to approaching\nwebsites about incorporating diaTM into their search engines, Crain said one\nnext step is to develop the model so that it can learn dialects by looking at\npatterns that do not make sense from a topical perspective. For example, using\na similar algorithm he was able to automatically discover dialects including\ntext-speak (e.g. 
\u201cb4\u201d as a substitute for \u201cbefore\u201d), but the dialects\nwere mixed in with topically related groups of words.\u003C\/p\u003E\n\n\n\n\u003Cp\u003E\u201cWe\u2019re trying\nto get to where you can isolate just the dialects,\u201d Crain said.\u003C\/p\u003E\n\n\n\n\u003Cp\u003E\u201cThis\nfeature will help common users of medical websites,\u201d Zha said. \u201cIt will help\nenable consumers with a relatively low level of health literacy to access the\ncritical medical information they need.\u201d\u003C\/p\u003E\n\n\n\n\u003Cp\u003EDiaTM is\ndescribed in the paper \u201cDialect Topic Modeling for Improved Consumer Medical\nSearch,\u201d to be presented by Crain at the American Medical Informatics\nAssociation Annual Symposium, Nov. 17 in Washington, D.C. Crain\u2019s coauthors\ninclude Hongyuan Zha, professor in the School of Computational Science \u0026amp;\nEngineering; Shuang-Hong Yang, a Ph.D. student in Computational Science and\nEngineering; and Yu Jiao, research scientist at Oak Ridge National Laboratory\n(ORNL). The research was conducted with partial funding from ORNL, Microsoft\nand Hewlett-Packard.\u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":[{"value":"\u2018DiaTM\u2019 can learn vernacular terms for health problems, symptoms"}],"field_summary":[{"value":"\u003Cp\u003EGeorgia Tech researchers have created a machine-learning model that enables\nmedical websites, like WebMD, to \u201clearn\u201d dialect and other medical vernacular, thereby improving\ntheir performance for users who search in such language. 
\u003Cem\u003ESource: Office of Communications\u003C\/em\u003E\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"\u2018DiaTM\u2019 can learn vernacular terms for health problems, symptoms."}],"uid":"27174","created_gmt":"2010-11-17 09:48:31","changed_gmt":"2016-10-08 03:07:46","author":"Mike Terrazas","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2010-11-17T00:00:00-05:00","iso_date":"2010-11-17T00:00:00-05:00","tz":"America\/New_York"},"extras":[],"hg_media":{"46038":{"id":"46038","type":"image","title":"Klaus building","body":null,"created":"1449174347","gmt_created":"2015-12-03 20:25:47","changed":"1475894409","gmt_changed":"2016-10-08 02:40:09","alt":"Klaus building","file":{"fid":"190089","name":"tuv62996.jpg","image_path":"\/sites\/default\/files\/images\/tuv62996_0.jpg","image_full_path":"http:\/\/tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/tuv62996_0.jpg","mime":"image\/jpeg","size":40752,"path_740":"http:\/\/tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/tuv62996_0.jpg?itok=UK4jSwpw"}}},"media_ids":["46038"],"groups":[{"id":"1183","name":"Home"}],"categories":[{"id":"153","name":"Computer Science\/Information Technology and Security"}],"keywords":[{"id":"11284","name":"diaTM"},{"id":"398","name":"health"},{"id":"9167","name":"machine learning"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003EMichael\nTerrazas\u003C\/p\u003E\n\n\u003Cp\u003EAssistant\nDirector of Communications\u003C\/p\u003E\n\n\u003Cp\u003ECollege\nof Computing at Georgia Tech\u003C\/p\u003E\n\n\n\n\u003Cp\u003E404-245-0707\u003C\/p\u003E","format":"limited_html"}],"email":["mterraza@cc.gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}