{"670919":{"#nid":"670919","#data":{"type":"event","title":"PhD Defense by Sajad Khodadadian","body":[{"value":"\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003EThesis Title: \u003C\/span\u003E\u003C\/strong\u003E\u003Cspan\u003ESample Complexity of Reinforcement Learning Algorithms with a Focus on Policy Space Methods\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003EThesis Committee:\u003C\/span\u003E\u003C\/strong\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EDr. Siva Theja Maguluri (Advisor), \u003Cspan\u003EIndustrial and Systems Engineering, Georgia Institute of Technology\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EDr. Guanghui (George)\u0026nbsp;Lan,\u0026nbsp;Industrial and Systems Engineering, Georgia Institute of Technology\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EDr. Ashwin Pananjady, Industrial and Systems Engineering, Georgia Institute of Technology\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EDr. 
Justin Romberg, Electrical and Computer Engineering, Georgia Institute of Technology\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EDr. Daniel Russo, Columbia Business School, Columbia University\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003EDate and Time: \u003C\/span\u003E\u003C\/strong\u003E\u003Cspan\u003EMonday, November 20th, 2023, 2 pm (EST)\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u0026nbsp;\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003EIn-Person Location\u003C\/span\u003E\u003C\/strong\u003E\u003Cstrong\u003E\u003Cspan\u003E\u003Cspan\u003E: \u003C\/span\u003E\u003C\/span\u003E\u003C\/strong\u003E\u003Cspan\u003EISyE Executive Boardroom (Groseclose 402)\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003EMeeting Link: \u003C\/span\u003E\u003C\/strong\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Ca 
href=\u0022https:\/\/gatech.zoom.us\/j\/93338819620\u0022\u003Ehttps:\/\/gatech.zoom.us\/j\/93338819620\u003C\/a\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cstrong\u003E\u003Cspan\u003EAbstract:\u003C\/span\u003E\u003C\/strong\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EIn this thesis, we develop fast reinforcement learning algorithms with finite sample complexity guarantees. The work is divided into two main parts. In the first, we investigate stochastic approximation in several settings to establish finite sample complexity bounds. We study two settings: federated stochastic approximation and two-time-scale linear stochastic approximation with Markovian noise. In the former, we develop FedSAM, an algorithm in which multiple agents are used to solve a fixed-point equation via stochastic approximation with Markovian noise. Moreover, we show that FedSAM achieves a linear speedup with respect to the number of agents while incurring only a constant communication cost. In the latter, we establish tight finite-time bounds for two-time-scale linear stochastic approximation with Markovian noise.\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003EThe second part delves into finite-time bounds for reinforcement learning algorithms, with an emphasis on policy space methods. First, we consider the two-time-scale natural actor-critic algorithm with on-policy data. For this algorithm, we establish an $\\epsilon^{-6}$ sample complexity for convergence to the global optimum. 
Next, we study the two-loop natural actor-critic algorithm, for which we establish an $\\epsilon^{-3}$ sample complexity, improving upon the two-time-scale counterpart. In this case, we consider an off-policy sampling strategy.\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003ETo further improve the sample complexity of the natural actor-critic, we separate the algorithm into \u0027Actor\u0027 and \u0027Critic\u0027 components. For the Critic, we consider federated TD-learning and TD-learning with Polyak averaging. For the former, we show a linear speedup, and for the latter we establish a tight finite-time bound. Furthermore, we establish a tight finite-time convergence bound for the TDC algorithm. For the Actor, we demonstrate linear and superlinear convergence rates for the natural policy gradient.\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003E\u003Cspan\u003E\u003Cspan\u003E\u003Cspan\u003ESample Complexity of Reinforcement Learning Algorithms with a Focus on Policy Space Methods\u003C\/span\u003E\u003C\/span\u003E\u003C\/span\u003E\u003C\/p\u003E\r\n","format":"limited_html"}],"field_summary_sentence":[{"value":"Sample Complexity of Reinforcement Learning Algorithms with a Focus on Policy Space Methods"}],"uid":"27707","created_gmt":"2023-11-07 14:01:04","changed_gmt":"2023-11-07 14:01:04","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2023-11-20T02:00:00-05:00","event_time_end":"2023-11-20T04:00:00-05:00","event_time_end_last":"2023-11-20T04:00:00-05:00","gmt_time_start":"2023-11-20 07:00:00","gmt_time_end":"2023-11-20 09:00:00","gmt_time_end_last":"2023-11-20 
09:00:00","rrule":null,"timezone":"America\/New_York"},"location":"ISyE Executive Boardroom (Groseclose 402)","extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"100811","name":"Phd Defense"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}