<node id="659850">
  <nid>659850</nid>
  <type>news</type>
  <uid>
    <user id="35403"><![CDATA[35403]]></user>
  </uid>
  <created>1659451086</created>
  <changed>1659451086</changed>
  <title><![CDATA[Latest NLP Research Derives Insight from Growing Volume of Digital Text]]></title>
  <body><![CDATA[<p>New NLP research from Georgia Tech is uncovering patterns in the growing volume of digital text and broadening the understanding of how to build better computer applications that derive value from written language.</p>

<p>Georgia Tech researchers are presenting their latest work at the annual conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022), taking place this week, July 10-15. NAACL provides a regional focus for members of the Association for Computational Linguistics (ACL) in North America as well as in Central and South America and promotes cooperation and information exchange among related scientific and professional societies.</p>

<p>&ldquo;Recent advances in natural language processing &ndash; especially around big models &ndash; have enabled successful applications,&rdquo; said&nbsp;<strong>Diyi Yang</strong>, assistant professor in the School of Interactive Computing and researcher in NLP. &ldquo;At the same time, we see a growing amount of evidence and concern toward the negative aspects of NLP systems, such as the bias and fragility exhibited by these models, as well as the lack of input from users.&rdquo;</p>

<p>Yang&rsquo;s work in computational social science and NLP focuses on how to understand human communication in social context&nbsp;and build&nbsp;socially aware&nbsp;language technologies to support human-to-human and human-computer interaction.</p>

<p>Her SALT Lab has accrued an impressive number of innovations in the field over the past eight months, starting with research presented at last November&rsquo;s EMNLP conference. SALTers, as they are called, led Georgia Tech to become the top global contributor in computational social science and cultural analytics at that venue. The 60th&nbsp;Meeting of the ACL in Dublin followed in May with multiple SALT studies, including a best paper. Yang&rsquo;s group has six papers at this week&rsquo;s NAACL.</p>

<p>&ldquo;We hope to build NLP systems that are more user centric, more robust, and more aware of human factors,&rdquo; said Yang. &ldquo;Our NAACL papers are in this direction, covering robustness, toxicity detection, and generalization to new settings.&rdquo;</p>

<p>Yang&rsquo;s aspirations for the field are shared by her Tech peers, who have work in the following tracks at NAACL:</p>

<ul>
	<li>Ethics, Bias, Fairness</li>
	<li>Information Extraction</li>
	<li>Information Retrieval</li>
	<li>Interpretability and Analysis of Models for NLP</li>
	<li>Machine Learning</li>
	<li>Machine Learning for NLP</li>
	<li>NLP Applications</li>
	<li>Semantics: Sentence-level Semantics and Textual Inference</li>
</ul>

<p>Georgia Tech&rsquo;s research paper acceptances in the main program at NAACL are below. To learn more about NLP and machine learning research at Georgia Tech visit&nbsp;<a href="https://ml.gatech.edu/">https://ml.gatech.edu</a>.</p>

<h4><strong>GEORGIA TECH RESEARCH AT NAACL 2022</strong>&nbsp;(main papers program)</h4>

<p><strong>Ethics, Bias, Fairness</strong></p>

<p>Explaining Toxic Text via Knowledge Enhanced Text Generation<br />
<em>Rohit Sridhar, Diyi Yang</em></p>

<p><strong>Information Extraction</strong></p>

<p>Self-Training with Differentiable Teacher<br />
<em>Simiao Zuo, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao, Hongyuan Zha</em></p>

<p><strong>Information Retrieval</strong></p>

<p>CERES: Pretraining of Graph-Conditioned Transformer for Semi-Structured Session Data<br />
<em>Rui Feng, Chen Luo, Qingyu Yin, Bing Yin, Tuo Zhao, Chao Zhang</em></p>

<p><strong>Interpretability and Analysis of Models for NLP</strong></p>

<p>Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models<br />
<em>Tianlu Wang, Rohit Sridhar, Diyi Yang, Xuezhi Wang</em></p>

<p>Measure and Improve Robustness in NLP Models: A Survey<br />
<em>Xuezhi Wang, Haohan Wang, Diyi Yang</em></p>

<p>Reframing Human-AI Collaboration for Generating Free-Text Explanations<br />
<em>Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark Riedl, Yejin Choi</em></p>

<p><strong>Machine Learning</strong></p>

<p>AcTune: Uncertainty-Aware Active Self-Training for Active Fine-Tuning of Pretrained Language Models<br />
<em>Yue Yu, Lingkai Kong, Jieyu Zhang, Rongzhi Zhang, Chao Zhang</em></p>

<p>MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation<br />
<em>Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen</em></p>

<p><strong>Machine Learning for NLP</strong></p>

<p>TreeMix: Compositional Constituency-based Data Augmentation for Natural Language Understanding<br />
<em>Le Zhang, Zichao Yang, Diyi Yang</em></p>

<p><strong>NLP Applications</strong></p>

<p>Cryptocoin Bubble Detection: A New Dataset, Task &amp; Hyperbolic Models<br />
<em>Ramit Sawhney, Shivam Agarwal, Vivek Mittal, Paolo Rosso, Vikram Nanda, Sudheer Chava</em></p>

<p><strong>Semantics: Sentence-level Semantics and Textual Inference</strong></p>

<p>SEQZERO: Few-shot Compositional Semantic Parsing with Sequential Prompts and Zero-shot Models<br />
<em>Jingfeng Yang, Haoming Jiang, Qingyu Yin, Danqing Zhang, Bing Yin, Diyi Yang</em></p>

<p>SUBS: Subtree Substitution for Compositional Semantic Parsing<br />
<em>Jingfeng Yang, Le Zhang, Diyi Yang</em></p>
]]></body>
  <field_subtitle>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_subtitle>
  <field_dateline>
    <item>
      <value>2022-07-12T00:00:00-04:00</value>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_dateline>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Natural language processing (NLP) is a growing cornerstone of artificial intelligence and allows people and machines to act based on insights gleaned from digital text.]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_media>
          <item>
        <nid>
          <node id="659849">
            <nid>659849</nid>
            <type>image</type>
            <title><![CDATA[Diyi Yang]]></title>
            <body><![CDATA[]]></body>
                          <field_image>
                <item>
                  <fid>250096</fid>
                  <filename><![CDATA[Diyi_Yang.jpeg]]></filename>
                  <filepath><![CDATA[/sites/default/files/images/Diyi_Yang.jpeg]]></filepath>
                  <file_full_path><![CDATA[http://tlwarc.hg.gatech.edu//sites/default/files/images/Diyi_Yang.jpeg]]></file_full_path>
                  <filemime>image/jpeg</filemime>
                  <image_740><![CDATA[]]></image_740>
                  <image_alt><![CDATA[]]></image_alt>
                </item>
              </field_image>
            
                      </node>
        </nid>
      </item>
      </field_media>
  <field_contact_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_contact_email>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_contact>
    <item>
      <value><![CDATA[<p><a href="https://www.cc.gatech.edu/author/joshua-preston">JOSHUA PRESTON</a></p>
]]></value>
    </item>
  </field_contact>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <!--  TO DO: correct to not conflate categories and news room topics  -->
  <field_categories>
      </field_categories>
  <core_research_areas>
          <term tid="39431"><![CDATA[Data Engineering and Science]]></term>
      </core_research_areas>
  <field_news_room_topics>
      </field_news_room_topics>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>545781</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Institute for Data Engineering and Science]]></item>
      </og_groups_both>
  <field_keywords>
          <item>
        <tid>187023</tid>
        <value><![CDATA[go-data]]></value>
      </item>
      </field_keywords>
  <field_userdata>
      <![CDATA[]]>
  </field_userdata>
</node>
