<node id="667274">
  <nid>667274</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1681312606</created>
  <changed>1681312606</changed>
  <title><![CDATA[PhD Defense by Sihan Zeng]]></title>
  <body><![CDATA[<p><strong>Title:</strong> Designing Policy Optimization Algorithms For Multi-Agent Reinforcement Learning</p>

<p>&nbsp;</p>

<p><strong>Date:</strong> Monday, April 24, 2023</p>

<p><strong>Time:</strong> 10:00am - 11:00am Eastern Time</p>

<p><strong>Teams link:</strong> <a href="https://teams.microsoft.com/l/meetup-join/19%3ameeting_MzQ5NDE2ZWItMDIxZi00Yzk0LWJiODEtMmM1MzY4ZThmMTky%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%220e0acf27-edc3-43d4-8ef1-6945305e20e6%22%7d" target="_blank">https://teams.microsoft.com/l/meetup-join/19%3ameeting_MzQ5NDE2ZWItMDIxZi00Yzk0LWJiODEtMmM1MzY4ZThmMTky%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%220e0acf27-edc3-43d4-8ef1-6945305e20e6%22%7d</a></p>

<p>&nbsp;</p>

<p><strong>Sihan Zeng</strong></p>

<p>Machine Learning PhD Student</p>

<p>School of Electrical and Computer Engineering</p>

<p>Georgia Institute of Technology</p>

<p>&nbsp;</p>

<p><strong>Committee</strong></p>

<p>1. Dr. Justin Romberg (Advisor)</p>

<p>2. Dr. Siva Theja Maguluri</p>

<p>3. Dr. Guanghui Lan</p>

<p>4. Dr. Thinh T. Doan</p>

<p>5. Dr. Daniel Molzahn</p>

<p>&nbsp;</p>

<p><strong>Abstract</strong></p>

<p>The overall objective of the thesis is to enhance the understanding of structure in multi-agent reinforcement learning (RL) and to build reliable and efficient algorithms that exploit and/or respect that structure. First, we present a unified two-time-scale stochastic optimization framework under a special type of gradient oracle that abstracts a range of data-driven algorithms in RL. Targeting single-agent RL problems, this framework builds the mathematical foundation for designing and analyzing data-driven multi-agent RL algorithms. Second, we discuss the challenges and structure of multi-agent RL in multi-task cooperative and two-player competitive settings, and leverage that structure to design provably convergent and efficient algorithms. Finally, we apply multi-agent RL to solve power system optimization problems. Specifically, we develop an RL-based penalty parameter selection method for the alternating current optimal power flow (ACOPF) problem solved via ADMM, with the goal of minimizing the number of iterations until convergence. Our method significantly accelerates ADMM convergence compared to state-of-the-art hand-designed parameter selection schemes and exhibits superior generalizability.</p>

<p>&nbsp;</p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Designing Policy Optimization Algorithms For Multi-Agent Reinforcement Learning]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>See below</p>
]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2023-04-24T10:00:38-04:00]]></value>
      <value2><![CDATA[2023-04-24T23:00:38-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[TEAMS]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[PhD Defense]]></value>
      </item>
      </field_keywords>
  <userdata><![CDATA[]]></userdata>
</node>
