{"666962":{"#nid":"666962","#data":{"type":"event","title":"CSIP Seminar: Two-Agent Competitive Reinforcement Learning","body":[{"value":"\u003Ch3\u003E\u003Cstrong\u003ECenter for Signals and Information Processing (CSIP)\u0026nbsp;Seminar\u003C\/strong\u003E\u003C\/h3\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EDate:\u003C\/strong\u003E\u0026nbsp;Friday, April 7,\u0026nbsp;2023\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ETime:\u003C\/strong\u003E\u0026nbsp;3:00 p.m. - 4:00 p.m. EDT\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ELocation:\u0026nbsp;\u003C\/strong\u003ECentergy Building 5126.\u0026nbsp;The associated Zoom link is:\u0026nbsp;\u003Ca href=\u0022https:\/\/gatech.zoom.us\/j\/98339968295\u0022 target=\u0022_blank\u0022 title=\u0022https:\/\/gatech.zoom.us\/j\/98339968295\u0022\u003Ehttps:\/\/gatech.zoom.us\/j\/98339968295\u003C\/a\u003E.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ESpeaker:\u0026nbsp;\u003C\/strong\u003ESihan Zeng\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ESeminar Title:\u003C\/strong\u003E\u0026nbsp;Two-Agent Competitive Reinforcement Learning\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ESpeaker\u0027s Title:\u0026nbsp;\u003C\/strong\u003EPh.D. student at Georgia Tech working with Dr. Justin Romberg\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EAbstract:\u0026nbsp;\u003C\/strong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EMulti-agent reinforcement learning studies sequential decision-making problems in which multiple agents coexist in the same environment and jointly determine the environment transition and\/or reward function. 
In this talk, we consider two specific multi-agent settings and discuss the structure of the underlying optimization problems.\u0026nbsp;\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EThe first setting is the two-player zero-sum Markov game, in which one agent maximizes the same cumulative reward that the other agent seeks to minimize. Usually formulated as a nonconvex-nonconcave minimax optimization program, this problem is notoriously hard to solve with direct policy optimization algorithms. Our approach to this challenge is to introduce strong structure into the Markov game through entropy regularization. We apply direct gradient descent ascent to the regularized objective and propose schemes for adjusting the regularization weight so that the algorithm converges to a global solution of the original unregularized problem. 
The convergence rate of the proposed algorithm improves substantially on existing convergence bounds for gradient descent ascent algorithms.\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EIn the second part of the talk, we start by presenting an equiconnectedness property of the objective function in the single-agent policy optimization problem, both in the tabular setting and under policies parameterized by a sufficiently large neural network. As a consequence of this property, we derive a minimax theorem for a robust reinforcement learning problem, in which the learning agent defends against an adversary that attacks its reward function. This is the first time such a result has been established in the literature. We conclude by pointing out a few ways to extend our work in both directions.\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ESpeaker Bio:\u003C\/strong\u003E\u0026nbsp;\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003ESihan Zeng is a final-year PhD student at Georgia Tech, working with Dr. Justin Romberg. His research interests lie in reinforcement learning, optimization, and applied probability. He received a B.S. in Electrical Engineering and a B.A. 
in Statistics in 2017 from Rice University in Houston, Texas.\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003ESihan Zeng,\u0026nbsp;a final-year PhD student at Georgia Tech working with Dr. Justin Romberg,\u0026nbsp;will present the April 7 CSIP Seminar, \u0022Two-Agent Competitive Reinforcement Learning.\u0022\u003C\/p\u003E\r\n","format":"limited_html"}],"field_summary_sentence":[{"value":"Featuring Sihan Zeng, Ph.D. candidate at Georgia Tech"}],"uid":"36172","created_gmt":"2023-03-31 12:36:33","changed_gmt":"2023-03-31 12:42:48","author":"dwatson71","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2023-04-07T15:00:00-04:00","event_time_end":"2023-04-07T16:00:00-04:00","event_time_end_last":"2023-04-07T16:00:00-04:00","gmt_time_start":"2023-04-07 19:00:00","gmt_time_end":"2023-04-07 20:00:00","gmt_time_end_last":"2023-04-07 20:00:00","rrule":null,"timezone":"America\/New_York"},"location":"Centergy Building 5126","extras":[],"groups":[{"id":"1255","name":"School of Electrical and Computer Engineering"}],"categories":[],"keywords":[{"id":"192224","name":"CSIP Seminar"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1795","name":"Seminar\/Lecture\/Colloquium"}],"invited_audience":[{"id":"78761","name":"Faculty\/Staff"},{"id":"78771","name":"Public"},{"id":"78751","name":"Undergraduate students"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003EKiran Kokilepersaud\u003Cbr \/\u003E\r\n\u003Ca href=\u0022mailto:kpk6@gatech.edu\u0022\u003Ekpk6@gatech.edu\u003C\/a\u003E\u003Cbr 
\/\u003E\r\n\u0026nbsp;\u003C\/p\u003E\r\n","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}