{"339611":{"#nid":"339611","#data":{"type":"event","title":"Ph.D. Defense by Arya Irani","body":[{"value":"\u003Cp\u003EPh.D. Dissertation Defense Announcement\u003Cbr \/\u003E\u003Cbr \/\u003ETitle: \u003Cstrong\u003EUtilizing Negative Policy Information to Accelerate Reinforcement Learning\u003C\/strong\u003E\u003Cbr \/\u003E\u003Cbr \/\u003E\u003Cstrong\u003EArya Irani\u003C\/strong\u003E\u003Cbr \/\u003ESchool of Interactive Computing\u003Cbr \/\u003ECollege of Computing\u003Cbr \/\u003EGeorgia Institute of Technology\u003Cbr \/\u003E\u003Cbr \/\u003EDate: Monday,\u0026nbsp;November 10, 2014\u003Cbr \/\u003ETime: 12:30pm - 2:30pm EST\u003Cbr \/\u003ELocation: CCB 345\u003Cbr \/\u003E\u003Cbr \/\u003ECommittee:\u003Cbr \/\u003EDr. Charles Isbell (Advisor; School of Interactive Computing, Georgia Institute of Technology)\u003Cbr \/\u003EDr. Andrea Thomaz (School of Interactive Computing, Georgia Institute of Technology)\u003Cbr \/\u003EDr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)\u003Cbr \/\u003EDr. Karen Feigh (School of Aerospace Engineering, Georgia Institute of Technology)\u003Cbr \/\u003EDr. Doina Precup (School of Computer Science, McGill University)\u003Cbr \/\u003E\u003Cbr \/\u003E\u003Cstrong\u003EAbstract:\u003C\/strong\u003E\u003Cbr \/\u003E\u003Cbr \/\u003EA pilot study on Markov Decision Problem (MDP) task decomposition by humans revealed that participants would break down tasks into both short-term subgoals (with a defined end-condition), and long-term considerations and invariants (no end-condition). \u0026nbsp;In the context of MDPs, behaviors having clear start and end conditions are well-modeled by options (Precup, 2000), but no abstraction exists in the literature for continuous requirements imposed on the agent\u0027s behavior. \u0026nbsp;By modeling such policy restrictions and incorporating this information into an agent\u2019s exploration, learning speedup can be achieved. \u0026nbsp;Two proposed representations for such continuous requirements are the state constraint (a set or predicate identifying states that the agent should avoid), and the state-action constraint (identifying state-action pairs that should not be taken).\u003Cbr \/\u003E\u003Cbr \/\u003EWe will demonstrate that the composition of options with constraints forms a powerful combination \u2014 a na\u00efve option designed to perform well in a best-case scenario may still be used to benefit in domains where the best-case scenario is not guaranteed. \u0026nbsp;This separation of concerns simplifies design and learning. \u0026nbsp;We present the results of a study focusing on two classic video game inspired domains, in which participants with no AI experience construct and record examples of states to avoid; the examples are used to train predictors which implement a state constraint. \u0026nbsp;We also demonstrate that constraints can in many cases be formulated by software engineers and given as modules to the RL system, eliminating one machine learning layer. 
\u0026nbsp;We will discuss schemes for overcoming imperfectly defined constraints that would prevent an optimal policy, considerations in creating domain-appropriate schemes, as well as several future directions.\u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Utilizing Negative Policy Information to Accelerate Reinforcement Learning"}],"uid":"28077","created_gmt":"2014-11-03 16:29:36","changed_gmt":"2016-10-08 02:10:03","author":"Danielle Ramirez","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2014-11-10T11:30:00-05:00","event_time_end":"2014-11-10T13:30:00-05:00","event_time_end_last":"2014-11-10T13:30:00-05:00","gmt_time_start":"2014-11-10 16:30:00","gmt_time_end":"2014-11-10 18:30:00","gmt_time_end_last":"2014-11-10 18:30:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"106731","name":"PhD Defense; graduate students"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}
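The state-action constraint described in the abstract lends itself to a compact illustration. The sketch below is not taken from the dissertation; it shows one plausible way such a constraint could prune epsilon-greedy exploration in tabular Q-learning. The gridworld, the `forbidden` predicate, and all hyperparameters are invented for the example.

```python
# Minimal sketch (not the dissertation's code): a state-action constraint
# used to mask exploration in tabular Q-learning on a toy gridworld.
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
WIDTH, HEIGHT, GOAL = 5, 5, (4, 4)

def forbidden(state, action):
    """Hypothetical state-action constraint: never step off the left edge."""
    return action == "left" and state[0] == 0

def allowed(state):
    acts = [a for a in ACTIONS if not forbidden(state, a)]
    return acts or ACTIONS  # fall back if the constraint blocks everything

def step(state, action):
    dx, dy = MOVES[action]
    nxt = (min(max(state[0] + dx, 0), WIDTH - 1),
           min(max(state[1] + dy, 0), HEIGHT - 1))
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

def choose(Q, state, eps=0.1):
    acts = allowed(state)  # the constraint prunes exploration here
    if random.random() < eps:
        return random.choice(acts)
    return max(acts, key=lambda a: Q[(state, a)])

Q = defaultdict(float)
alpha, gamma = 0.5, 0.95
for _ in range(500):            # episodes
    s, done = (0, 0), False
    for _ in range(200):        # step cap so every episode terminates
        a = choose(Q, s)
        s2, r, done = step(s, a)
        best = max(Q[(s2, b)] for b in allowed(s2))
        Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
        s = s2
        if done:
            break
```

Because the constraint is consulted only at action-selection time, the learner never spends samples on state-action pairs the constraint rules out, which is the source of the speedup the abstract describes.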
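The composition of options with constraints can be sketched in a similar spirit. Again, this is an assumed design rather than the dissertation's implementation: a naïvely designed option is wrapped so that it terminates early whenever a state constraint flags the current state, returning control to the agent's overall policy. All names here are hypothetical.

```python
# Illustrative sketch only: composing a best-case option with a state
# constraint, so hazard handling stays outside the option's own policy.
class ConstrainedOption:
    def __init__(self, policy, terminates, avoid):
        self.policy = policy          # state -> action, best-case design
        self.terminates = terminates  # option's own end condition: state -> bool
        self.avoid = avoid            # state constraint: state -> bool

    def should_terminate(self, state):
        # End either on the option's own goal or when the constraint fires;
        # the option itself never needs to know about the hazard.
        return self.terminates(state) or self.avoid(state)

    def act(self, state):
        return self.policy(state)

# Example: an option that always moves right, wrapped to stop at hazards.
run_right = ConstrainedOption(
    policy=lambda s: "right",
    terminates=lambda s: s[0] >= 4,         # reached the right edge
    avoid=lambda s: s in {(2, 0), (3, 3)},  # hypothetical states to avoid
)
```

The wrapper makes the separation of concerns concrete: the option's designer reasons only about the best case, while the constraint's designer reasons only about what must be avoided.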