<node id="667567">
  <nid>667567</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1682712774</created>
  <changed>1682712774</changed>
  <title><![CDATA[PhD Proposal by Renzhi Wu]]></title>
  <body><![CDATA[<p><span><span><span><strong><span><span><span>Title</span></span></span></strong><span><span><span>: User-Centered Programmatic Data Labeling </span></span></span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><span><strong><span><span><span>Date</span></span></span></strong><span><span><span>: Tuesday, May 2, 2023</span></span></span></span></span></span></p>

<p><span><span><span><strong><span><span><span>Time</span></span></span></strong><span><span><span>: 14:30 – 16:30 EDT</span></span></span></span></span></span></p>

<p><span><span><span><strong><span><span><span>Location</span></span></span></strong><span><span><span>: </span></span></span><span><span><span><a href="https://teams.microsoft.com/l/meetup-join/19%3ameeting_ZWQyYWM1MDQtMmNjMC00YTY3LWJkZTAtZTRkYjMzNTI0Nzg2%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%221cf11de8-43de-4907-9170-51234b828efc%22%7d" title="https://teams.microsoft.com/l/meetup-join/19%3ameeting_ZWQyYWM1MDQtMmNjMC00YTY3LWJkZTAtZTRkYjMzNTI0Nzg2%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%221cf11de8-43de-4907-9170-51234b828efc%22%7d">Teams Link</a></span></span></span></span></span></span></p>

<p>&nbsp;</p>

<p>&nbsp;</p>

<p><span><span><span><strong><span><span><span>Renzhi Wu</span></span></span></strong></span></span></span></p>

<p><span><span><span><span><span>Ph.D. Student in Computer Science</span></span></span></span></span></p>

<p><span><span><span><span><span>School of Computer Science</span></span></span></span></span></p>

<p><span><span><span><span><span>College of Computing</span></span></span></span></span></p>

<p><span><span><span><span><span>Georgia Institute of Technology</span></span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><span><strong><span><span><span>Committee</span></span></span></strong><span><span><span>:&nbsp;</span></span></span></span></span></span></p>

<p><span><span><span><span><span><span>Dr. Xu Chu (Advisor) – </span></span></span><span><span><span>School of Computer Science</span></span></span><span><span><span>, Georgia Institute of Technology</span></span></span></span></span></span></p>

<p><span><span><span><span><span><span>Dr. Joy Arulraj – </span></span></span><span><span><span>School of Computer Science</span></span></span><span><span><span>, Georgia Institute of Technology</span></span></span></span></span></span></p>

<p><span><span><span><span><span><span>Dr. Kexin Rong – </span></span></span><span><span><span>School of Computer Science</span></span></span><span><span><span>, Georgia Institute of Technology</span></span></span></span></span></span></p>

<p><span><span><span><span><span><span>Dr. Shamkant Navathe – </span></span></span><span><span><span>School of Computer Science</span></span></span><span><span><span>, Georgia Institute of Technology</span></span></span></span></span></span></p>

<p><span><span><span><span><span><span>Dr. Chao Zhang – </span></span></span><span><span><span>School of Computational Science and Engineering</span></span></span><span><span><span>, Georgia Institute of Technology</span></span></span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><span><strong><span><span><span>Abstract</span></span></span></strong><span><span><span>: </span></span></span></span></span></span></p>

<p><span><span><span><span>The lack of labeled training data is a major challenge impeding the practical application of machine learning (ML) techniques. ML practitioners have therefore increasingly turned to programmatic supervision, in which a larger volume of programmatically generated, but often noisier, labeled examples is used in lieu of hand-labeled examples. In this paradigm, supervision sources are expressed as labeling functions (LFs), and a label model aggregates the outputs of multiple LFs to produce training labels. However, existing methods provide little support for writing LFs, which can be difficult for non-expert users, especially on non-textual data. In addition, existing label models require hyperparameters and dataset-specific training and can yield non-deterministic results, further complicating the process for non-expert users.</span></span></span></span></p>

<p>&nbsp;</p>

<p><span><span><span><span>This thesis aims to improve the usability of programmatic data labeling through a three-part research approach. First, I use entity matching as a case study to develop an integrated development environment (IDE) that helps users write, manage, and aggregate LFs. Building on this, I explore ways to tailor programmatic data labeling to the specific task for better performance. Second, to obviate user involvement in the label model, I present a hyper label model that requires neither hyperparameters nor dataset-specific training, while producing deterministic results with superior accuracy and efficiency. The proposed method also offers the first analytical optimal solution to the problem. Third, I extend the labeling function interface with a visual interface that lets users create LFs for video data intuitively, without any coding. Specifically, I propose a visual query language for retrieving video clips across datasets, enabling non-expert users to develop LFs easily by mouse drag-and-drop.</span></span></span></span></p>

]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[User-Centered Programmatic Data Labeling ]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p><span><span><span>User-Centered Programmatic Data Labeling </span></span></span></p>
]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2023-05-02T14:30:00-04:00]]></value>
      <value2><![CDATA[2023-05-02T16:30:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[TEAMS]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>102851</tid>
        <value><![CDATA[Phd proposal]]></value>
      </item>
      </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
