<node id="671015">
  <nid>671015</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1699650600</created>
  <changed>1699650600</changed>
  <title><![CDATA[PhD Proposal by Yang Chen]]></title>
  <body><![CDATA[<p><span><span><strong><span><span>Title: </span></span></strong><span><span>Benchmarking Multilingual and Multimodal Intelligent Systems</span></span></span></span></p>

<p><span><span><strong><span><span>Date/Time</span></span></strong><span><span>: Nov 17, 2023, 3:00 PM to 5:00 PM Eastern Time (US)</span></span></span></span></p>

<p><span><span><strong><span><span>Location</span></span></strong><span><span>: <a href="https://gatech.zoom.us/j/98760293352?pwd=bjB6ekNzWEI1bVNiWkxZb1hkckZsUT09" title="https://gatech.zoom.us/j/98760293352?pwd=bjB6ekNzWEI1bVNiWkxZb1hkckZsUT09">Zoom Link</a></span></span></span></span></p>

<p><span><span><span><span>Meeting ID: 987 6029 3352</span></span></span></span></p>

<p><span><span><span><span>Passcode: 443653</span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><strong><span><span>Yang Chen </span></span></strong><span><span>(<a href="https://edchengg.github.io/" title="https://edchengg.github.io/">Homepage</a>)</span></span></span></span></p>

<p><span><span><span><span>Ph.D. Candidate in Computer Science</span></span></span></span></p>

<p><span><span><span><span>School of Interactive Computing</span></span></span></span></p>

<p><span><span><span><span>Georgia Institute of Technology</span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><strong><span><span>Committee:</span></span></strong></span></span></p>

<p><span><span><span><span>Dr. Alan Ritter (advisor), School of Interactive Computing, Georgia Tech</span></span></span></span></p>

<p><span><span><span><span>Dr. Wei Xu (co-advisor), School of Interactive Computing, Georgia Tech</span></span></span></span></p>

<p><span><span><span><span>Dr. Kartik Goyal, School of Interactive Computing, Georgia Tech</span></span></span></span></p>

<p><span><span><span><span>Dr. Hexiang (Frank) Hu, Google DeepMind</span></span></span></span></p>

<p><span><span><span><span>Dr. Ming-Wei Chang, Google DeepMind</span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><strong><span><span>Abstract:</span></span></strong></span></span></p>

<p><span><span><span><span>Language serves as the cornerstone and medium for transferring human intellect across communities worldwide. Recent developments in large language models, which consume vast amounts of human knowledge from large-scale online text corpora, have revolutionized the field of natural language processing (NLP) and serve as building blocks for intelligent systems that benefit humanity.</span></span></span></span></p>

<p><span><span><span><span>However, two primary challenges remain: 1) the significant resource imbalance among languages, driven by disparities in the wealth of resources across countries, cultures, and geographic regions, diminishes the efficacy of language models in understanding and serving speakers of low-resource languages; 2) restricting models to the language-only modality limits the knowledge they can acquire and narrows the domain of applications, such as assisting people with visual impairments or helping people interact with the visual environment.</span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><span><span>This thesis proposal aims to address these two challenges through benchmarking, in pursuit of building reliable multilingual and multimodal intelligent systems that benefit humanity.</span></span></span></span></p>

<p><span><span><span><span>In the first part of the presentation, I present methods I developed to improve language model understanding of low-resource languages, including a synthetic data generation model and a novel algorithm that translates and fuses annotations from high-resource languages. In the second part of the presentation, I introduce the InfoSeek benchmark (1M+ questions), which assesses the capabilities of vision-language models to answer visual information-seeking questions about entities present in an image. By benchmarking multimodal large language models and retrieval-augmented generation models on InfoSeek, we present insights that benefit the future development of multimodal intelligent systems.</span></span></span></span></p>

<p>&nbsp;</p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Benchmarking Multilingual and Multimodal Intelligent Systems]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p><span><span><span>Benchmarking Multilingual and Multimodal Intelligent Systems</span></span></span></p>
]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2023-11-17T15:00:00-05:00]]></value>
      <value2><![CDATA[2023-11-17T17:00:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[ZOOM]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>102851</tid>
        <value><![CDATA[PhD proposal]]></value>
      </item>
      </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
