{"537711":{"#nid":"537711","#data":{"type":"news","title":"SEISE Tool Uses Semantic Gaps to Detect Website Promotional Attacks","body":[{"value":"\u003Cp\u003EBy detecting semantic inconsistencies in content, researchers have developed a new technique for identifying promotional infections of websites operated by government and educational organizations. Such attacks use code embedded in highly-ranked sites to drive traffic to sketchy websites selling fake drugs, counterfeit handbags and plagiarized term papers \u2013 or installing drive-by malware.\u003C\/p\u003E\u003Cp\u003EThe new technique, known as Semantic Inconsistency Search (SEISE), uses natural language processing to spot the differences between a compromised site\u2019s expected content and the malicious advertising and promotional code. Using SEISE, the researchers found 11,000 infected sites among non-commercial top-level sponsored .edu, .gov and .mil domains worldwide, and are working to extend the method to other domains.\u003C\/p\u003E\u003Cp\u003EThe research was supported by the U.S. National Science Foundation and Natural Science Foundation of China. It will be described in a presentation May 25, 2016 at the IEEE Symposium on Security and Privacy in San Jose. SEISE was developed by researchers from the Georgia Institute of Technology, Indiana University and Tsinghua University in China.\u003C\/p\u003E\u003Cp\u003E\u201cThe basic idea behind promotional infection is to attack websites that are highly-ranked and to leverage their importance to promote various things, most of them illegal,\u201d explained \u003Ca href=\u0022https:\/\/www.ece.gatech.edu\/faculty-staff-directory\/abdul-r-beyah\u0022\u003ERaheem Beyah\u003C\/a\u003E, who is the Motorola Foundation Professor in Georgia Tech\u2019s \u003Ca href=\u0022http:\/\/www.ece.gatech.edu\/\u0022\u003ESchool of Electrical and Computer Engineering\u003C\/a\u003E. 
\u201cThe bad content is nested into the prominent site to leverage the traffic of that domain. That gives the attackers a doorway to whatever they are promoting.\u201d\u003C\/p\u003E\u003Cp\u003EEssentially, said Beyah, the attackers are stealing the site\u2019s good name, even if they don\u2019t install malware or otherwise inflict harm on web visitors.\u003C\/p\u003E\u003Cp\u003E\u201cThe attackers essentially become part of the prominent website\u2019s brand and share in the ranking they have,\u201d he added. \u201cIt\u2019s like setting up operations inside a well-known coffee shop chain. The attacker leverages the brand by becoming co-located with it.\u201d\u003C\/p\u003E\u003Cp\u003EThe promotional attacks can be difficult to detect, especially if they don\u2019t contain malicious computer code. But the semantic differences between the host site and the attacker\u2019s code can tip off the SEISE algorithm. Once it has characterized the content expected on a website \u2013 educational information on an .edu page, for example \u2013 the pitches for gambling or inexpensive prescription drugs become obvious.\u003C\/p\u003E\u003Cp\u003E\u201cIf you are visiting the website for a prestigious university, you don\u2019t expect to see information promoting casino gambling,\u201d said Beyah. \u201cIf we expect one thing from the website and see something significantly different, there is a huge semantic gap that we can detect.\u201d\u003C\/p\u003E\u003Cp\u003ESEISE doesn\u2019t have to review an entire site to determine what should be there; it can sample the pages to learn context that makes attacker terms stand out. Because their domain purposes are clear and well established, the researchers began with education and government websites. 
They now hope to extend the automated approach to commercial and other domains whose intended purposes may be less consistent.\u003C\/p\u003E\u003Cp\u003E\u201cWe are trying to figure out how to get the context right for these domains so we can help companies detect these infections,\u201d Beyah said. \u201cThere\u2019s no reason to believe that the commercial domains are any less attractive to attackers than the non-commercial ones.\u201d\u003C\/p\u003E\u003Cp\u003EBeyah and Georgia Tech Ph.D. student Xiaojing Liao began the work by using Google searches to find sites with known \u201cbad words\u201d denoting illicit products. They then used natural language processing to find terms associated with these known bad words; those terms were used to train SEISE before it was sent out to analyze 100,000 domains for the presence of the illicit terms. The approach identified 11,000 infected sites with a false detection rate of just 1.5 percent and coverage of more than 90 percent.\u003C\/p\u003E\u003Cp\u003ESEISE found promotional infections on the websites of top U.S. universities and government agencies, though the problem was truly worldwide, with three percent of .edu and .gov sites infected. Of the infected websites noted, 15 percent were in China and six percent were in the United States.\u003C\/p\u003E\u003Cp\u003ESites are infected using proven attack techniques such as SQL injection, URL redirection and phishing to compromise the credentials of users, Beyah said. Though central websites of the organizations may be secure, pages of individual users and units may be more vulnerable \u2013 and still provide the prestige of the overall domain.\u003C\/p\u003E\u003Cp\u003EExisting techniques for detecting promotional infections rely on examining redirects and following links, or observing how sites change over time. 
But those techniques aren\u2019t scalable and can\u2019t be automated in the same way as the new semantic gap approach, Beyah said.\u003C\/p\u003E\u003Cp\u003EThe researchers want to share their technique with the larger security community, and are discussing how best to make the algorithm available. \u201cOur study shows that by effective detection of infected sponsored top-level domains (sTLDs), the bar to promotion infections can be substantially raised,\u201d the authors wrote in their paper.\u003C\/p\u003E\u003Cp\u003EAbout those 11,000 compromised webpages? The researchers are attempting to contact the operators of all 11,000 of them to share the bad news. \u201cWe have spent a lot of time contacting those folks and letting them know what we have found,\u201d Beyah said. \u201cWe\u2019re still in the process of doing that because there are so many.\u201d\u003C\/p\u003E\u003Cp\u003E\u003Cem\u003EThis work was supported by the National Science Foundation through Grants CNS-1223477, CNS-1223495 and CNS-1527141 and by the Natural Science Foundation of China through Grant 61472215. 
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.\u003C\/em\u003E\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EResearch News\u003C\/strong\u003E\u003Cbr \/\u003E\u003Cstrong\u003EGeorgia Institute of Technology\u003C\/strong\u003E\u003Cbr \/\u003E\u003Cstrong\u003E177 North Avenue\u003C\/strong\u003E\u003Cbr \/\u003E\u003Cstrong\u003EAtlanta, Georgia 30332-0181 USA\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EMedia Relations Contacts\u003C\/strong\u003E: John Toon (\u003Ca href=\u0022mailto:jtoon@gatech.edu\u0022\u003Ejtoon@gatech.edu\u003C\/a\u003E) (404-894-6986) or Ben Brumfield (\u003Ca href=\u0022mailto:ben.brumfield@comm.gatech.edu\u0022\u003Eben.brumfield@comm.gatech.edu\u003C\/a\u003E) (404-385-1933).\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EWriter\u003C\/strong\u003E: John Toon\u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EBy detecting semantic inconsistencies in content, researchers have developed a new technique for identifying promotional infections of websites operated by government and educational organizations. 
Such attacks use code embedded in highly-ranked sites to drive traffic to sketchy websites selling fake drugs, counterfeit handbags and plagiarized term papers \u2013 or installing drive-by malware.\u0026nbsp;\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Researchers have developed a new technique for identifying promotional infections of websites operated by government and educational organizations."}],"uid":"27303","created_gmt":"2016-05-19 13:19:51","changed_gmt":"2016-10-08 03:21:42","author":"John Toon","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2016-05-19T00:00:00-04:00","iso_date":"2016-05-19T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"537681":{"id":"537681","type":"image","title":"Researchers with code promoting essays","body":null,"created":"1464282000","gmt_created":"2016-05-26 17:00:00","changed":"1475895324","gmt_changed":"2016-10-08 02:55:24","alt":"Researchers with code promoting essays","file":{"fid":"88892","name":"promo-infection_3289.jpg","image_path":"\/sites\/default\/files\/images\/promo-infection_3289.jpg","image_full_path":"http:\/\/tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/promo-infection_3289.jpg","mime":"image\/jpeg","size":1397524,"path_740":"http:\/\/tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/promo-infection_3289.jpg?itok=dN2bR6Zg"}},"537661":{"id":"537661","type":"image","title":"Map of worldwide promotional infections","body":null,"created":"1464282000","gmt_created":"2016-05-26 17:00:00","changed":"1475895324","gmt_changed":"2016-10-08 02:55:24","alt":"Map of worldwide promotional 
infections","file":{"fid":"88890","name":"geolocation.jpg","image_path":"\/sites\/default\/files\/images\/geolocation.jpg","image_full_path":"http:\/\/tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/geolocation.jpg","mime":"image\/jpeg","size":394283,"path_740":"http:\/\/tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/geolocation.jpg?itok=MY6JaVCj"}}},"media_ids":["537681","537661"],"groups":[{"id":"1188","name":"Research Horizons"}],"categories":[{"id":"153","name":"Computer Science\/Information Technology and Security"},{"id":"135","name":"Research"}],"keywords":[{"id":"1404","name":"Cybersecurity"},{"id":"170290","name":"promotional attack"},{"id":"67741","name":"Raheem Beyah"},{"id":"172044","name":"SEISE"},{"id":"170291","name":"semantic gap"},{"id":"172045","name":"semantic inconsistency"},{"id":"110271","name":"website"}],"core_research_areas":[{"id":"145171","name":"Cybersecurity"}],"news_room_topics":[{"id":"71881","name":"Science and Technology"},{"id":"71901","name":"Society and Culture"}],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003EJohn Toon\u003C\/p\u003E\u003Cp\u003EResearch News\u003C\/p\u003E\u003Cp\u003E\u003Ca href=\u0022mailto:jtoon@gatech.edu\u0022\u003Ejtoon@gatech.edu\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003E(404) 894-6986\u003C\/p\u003E","format":"limited_html"}],"email":["jtoon@gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}