<node id="670854">
  <nid>670854</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1698939984</created>
  <changed>1698939984</changed>
  <title><![CDATA[PhD Defense by Aran Komatsuzaki]]></title>
  <body><![CDATA[<p><strong>Title:</strong> Improving Foundation Models</p>

<p><strong>Date:</strong> Tuesday, November 14th</p>

<p><strong>Time:</strong> 6:30pm EST</p>

<p><strong>Location:</strong> Zoom: <a href="https://gatech.zoom.us/j/96067185652?pwd=MkptcWhRZm5KZ3dpZEQ4ZHpVVlg2dz09">https://gatech.zoom.us/j/96067185652?pwd=MkptcWhRZm5KZ3dpZEQ4ZHpVVlg2dz09</a></p>

<p><strong>Aran Komatsuzaki</strong></p>

<p>Machine Learning Ph.D. Student<br />
School of Mathematics<br />
Georgia Institute of Technology</p>

<p><strong>Committee</strong><br />
Dr. Heinrich Matzinger (Advisor) - School of Mathematics, Georgia Institute of Technology<br />
Dr. Wenjing Liao - School of Mathematics, Georgia Institute of Technology<br />
Dr. Hannah Choi - School of Mathematics, Georgia Institute of Technology<br />
Dr. Mayya Zhilova - School of Mathematics, Georgia Institute of Technology<br />
Dr. Alexander Lerch - School of Music, Georgia Institute of Technology</p>

<p><strong>Abstract</strong><br />
Foundation models (e.g., GPT-4, CLIP) are models trained on massive datasets that can perform a wide range of downstream tasks, usually with zero- or few-shot learning and optionally after fine-tuning. This dissertation presents a series of contributions that make foundation models more efficient, performant, and versatile, focusing on three axes of improvement: architecture, dataset, and training. We first present our findings on how to optimally scale language models, which lead to significant performance improvements. We then present GPT-J, one of the earliest open-source large language models. Next, we show that the performance of ViT and T5, both Transformer-based foundation models, can be greatly improved for a given compute budget using Sparse Upcycling: resuming training of a sparsely gated model initialized from pretrained dense models. We also briefly discuss the LAION datasets, massive open-source datasets of roughly one billion text-image pairs used to train various state-of-the-art multimodal models, and the ARB benchmark, a highly challenging benchmark for evaluating state-of-the-art LLMs such as GPT-4. On the theoretical side, we prove that the feedforward layers of a Transformer cannot be compressed without information loss, which may explain the power of sparsely gated models such as mixture-of-experts.</p>

]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Improving Foundation Models]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p><strong>Improving Foundation Models</strong></p>
]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2023-11-14T18:30:00-05:00]]></value>
      <value2><![CDATA[2023-11-14T20:00:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[Zoom: ]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[PhD Defense]]></value>
      </item>
      </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
