{"670854":{"#nid":"670854","#data":{"type":"event","title":"PhD Defense by Aran Komatsuzaki","body":[{"value":"\u003Cp\u003E\u003Cbr \/\u003E\r\n\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003ETitle: Improving Foundation Models\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/strong\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EDate\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/strong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E:\u0026nbsp;Tuesday, November 14th\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003ETime\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/strong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E: 6:30pm EST\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003ELocation\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/strong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E: Zoom:\u0026nbsp;\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Ca 
href=\u0022https:\/\/gatech.zoom.us\/j\/96067185652?pwd=MkptcWhRZm5KZ3dpZEQ4ZHpVVlg2dz09\u0022\u003Ehttps:\/\/gatech.zoom.us\/j\/96067185652?pwd=MkptcWhRZm5KZ3dpZEQ4ZHpVVlg2dz09\u003C\/a\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EAran Komatsuzaki\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/strong\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EMachine Learning Ph.D. Student\u003Cbr \/\u003E\r\nSchool of Mathematics\u003Cbr \/\u003E\r\nGeorgia Institute of Technology\u003Cbr \/\u003E\r\n\u003Cbr \/\u003E\r\n\u003Cstrong\u003ECommittee\u003C\/strong\u003E\u003Cbr \/\u003E\r\nDr. Heinrich Matzinger (Advisor) - School of Mathematics, Georgia Institute of Technology\u003Cbr \/\u003E\r\nDr. Wenjing Liao - School of Mathematics, Georgia Institute of Technology\u003Cbr \/\u003E\r\nDr. Hannah Choi - School of Mathematics, Georgia Institute of Technology\u003Cbr \/\u003E\r\nDr. Mayya Zhilova - School of Mathematics, Georgia Institute of Technology\u003Cbr \/\u003E\r\nDr. Alexander Lerch - School of Music, Georgia Institute of Technology\u003Cbr \/\u003E\r\n\u003Cbr \/\u003E\r\n\u003Cstrong\u003EAbstract\u003C\/strong\u003E\u003Cbr \/\u003E\r\nFoundation models are the family of models (e.g., GPT-4, CLIP) that are trained on a massive dataset and perform various downstream tasks, usually with either zero- or few-shot learning, optionally after fine-tuning. This dissertation presents a wide range of contributions we have made toward making foundation models more efficient, performant, and versatile. 
In particular, we focus on three axes of improvement: architecture, dataset, and training. We first present our findings on how to optimally scale language models, which lead to significant performance improvements. We then present GPT-J, one of the earliest open-source large language models. We then show that the performance of ViT and T5, both Transformer-based foundation models, can be greatly improved for a given compute budget using Sparse Upcycling, i.e., resuming training of a sparsely gated model initialized from pretrained dense models. We also briefly discuss the LAION datasets, massive open-source datasets of around one billion text-image pairs that are used to train various state-of-the-art multimodal models, and the ARB benchmark, a highly challenging benchmark for evaluating state-of-the-art LLMs such as GPT-4. On the theoretical side, we prove that the feedforward layers of a Transformer cannot be compressed without information loss, which may explain the power of sparsely gated models such as mixture-of-experts.\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003E\u003Cstrong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EImproving Foundation Models\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/strong\u003E\u003C\/p\u003E\r\n","format":"limited_html"}],"field_summary_sentence":[{"value":"Improving Foundation Models"}],"uid":"27707","created_gmt":"2023-11-02 15:46:24","changed_gmt":"2023-11-02 15:46:24","author":"Tatianna 
Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2023-11-14T18:30:00-05:00","event_time_end":"2023-11-14T20:00:00-05:00","event_time_end_last":"2023-11-14T20:00:00-05:00","gmt_time_start":"2023-11-14 23:30:00","gmt_time_end":"2023-11-15 01:00:00","gmt_time_end_last":"2023-11-15 01:00:00","rrule":null,"timezone":"America\/New_York"},"location":"Zoom: ","extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"100811","name":"Phd Defense"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}