<node id="339611">
  <nid>339611</nid>
  <type>event</type>
  <uid>
    <user id="28077"><![CDATA[28077]]></user>
  </uid>
  <created>1415032176</created>
  <changed>1475892603</changed>
  <title><![CDATA[Ph.D. Defense by Arya Irani]]></title>
  <body><![CDATA[<p>Ph.D. Dissertation Defense Announcement<br /><br />Title: <strong>Utilizing Negative Policy Information to Accelerate Reinforcement Learning</strong><br /><br /><strong>Arya Irani</strong><br />School of Interactive Computing<br />College of Computing<br />Georgia Institute of Technology<br /><br />Date: Monday, November 10, 2014<br />Time: 12:30pm - 2:30pm EST<br />Location: CCB 345<br /><br />Committee:<br />Dr. Charles Isbell (Advisor; School of Interactive Computing, Georgia Institute of Technology)<br />Dr. Andrea Thomaz (School of Interactive Computing, Georgia Institute of Technology)<br />Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)<br />Dr. Karen Feigh (School of Aerospace Engineering, Georgia Institute of Technology)<br />Dr. Doina Precup (School of Computer Science, McGill University)<br /><br /><strong>Abstract:</strong><br /><br />A pilot study on Markov Decision Process (MDP) task decomposition by humans revealed that participants would break tasks down into both short-term subgoals (with a defined end condition) and long-term considerations and invariants (with no end condition). In the context of MDPs, behaviors with clear start and end conditions are well modeled by options (Precup, 2000), but no abstraction exists in the literature for continuous requirements imposed on the agent's behavior. Modeling such policy restrictions and incorporating them into an agent's exploration can accelerate learning. Two proposed representations for such continuous requirements are the state constraint (a set or predicate identifying states that the agent should avoid) and the state-action constraint (identifying state-action pairs that should not be taken).<br /><br />We will demonstrate that the composition of options with constraints forms a powerful combination: a naïve option designed to perform well in a best-case scenario may still provide benefit in domains where the best-case scenario is not guaranteed. This separation of concerns simplifies both design and learning. We present the results of a study focusing on two domains inspired by classic video games, in which participants with no AI experience construct and record examples of states to avoid; the examples are used to train predictors that implement a state constraint. We also demonstrate that constraints can in many cases be formulated by software engineers and supplied as modules to the RL system, eliminating one machine-learning layer. We will discuss schemes for overcoming imperfectly defined constraints that would otherwise prevent an optimal policy, considerations in designing domain-appropriate schemes, and several future directions.</p>]]></body>
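  <!-- Editorial illustration: the abstract describes incorporating a
       state-action constraint into an agent's exploration. The minimal
       Python sketch below is not taken from the dissertation; the grid
       domain, function names, and example constraint are assumptions.
       It shows one plausible way such a constraint could mask
       epsilon-greedy action selection in tabular Q-learning.

       import random
       from collections import defaultdict

       ACTIONS = ["up", "down", "left", "right"]

       def constraint(state, action):
           # Hypothetical state-action constraint: True means the pair
           # should not be taken. Here: never step left from column 0.
           x, y = state
           return action == "left" and x == 0

       def allowed(state):
           # Restrict exploration to unconstrained actions, falling back
           # to the full action set if the constraint blocks everything.
           acts = [a for a in ACTIONS if not constraint(state, a)]
           return acts or ACTIONS

       def choose_action(Q, state, epsilon=0.1):
           acts = allowed(state)  # the constraint biases exploration
           if random.random() < epsilon:
               return random.choice(acts)
           return max(acts, key=lambda a: Q[(state, a)])

       def q_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
           best_next = max(Q[(s2, a2)] for a2 in allowed(s2))
           Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

       Q = defaultdict(float)
  -->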
  <field_summary_sentence>
    <item>
      <value><![CDATA[Utilizing Negative Policy Information to Accelerate Reinforcement Learning]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2014-11-10T11:30:00-05:00]]></value>
      <value2><![CDATA[2014-11-10T13:30:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
  </field_extras>
  <field_audience>
    <item>
      <value><![CDATA[Public]]></value>
    </item>
  </field_audience>
  <field_media>
  </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
      <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
  </links_related>
  <files>
  </files>
  <og_groups>
    <item>221981</item>
  </og_groups>
  <og_groups_both>
    <item><![CDATA[Graduate Studies]]></item>
  </og_groups_both>
  <field_categories>
    <item>
      <tid>1788</tid>
      <value><![CDATA[Other/Miscellaneous]]></value>
    </item>
  </field_categories>
  <field_keywords>
    <item>
      <tid>106731</tid>
      <value><![CDATA[PhD Defense; graduate students]]></value>
    </item>
  </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
