Talk by Dr. Luiz Chamon

June 20, 2023

Title: (Reinforcement) Learning under Requirements


Dr. Luiz Chamon
ELLIS-SimTech Independent Research Group Leader
University of Stuttgart
Stuttgart, Germany

Time: Tuesday, June 20, 2023, 4:00 p.m.
Location: IST Seminar Room 2.255, Pfaffenwaldring 9, Campus Stuttgart-Vaihingen

Abstract

The transformative power of learning lies in automating the design of complex systems. Today, however, learning does not incorporate requirements organically, leading to data-driven solutions that are prone to tampering and unsafe behavior. In this talk, I will show when and how it is possible to learn under requirements by developing the theoretical underpinnings of constrained learning.

For concreteness, I will start with the problem of learning safe policies in the reinforcement learning (RL) setting, where we aim to control a Markov Decision Process (MDP) whose transition probabilities are unknown but from which we can sample trajectories. By safety, I mean that the agent must remain in a safe subset of the state space with high probability during operation. We begin by transforming this problem into a constrained MDP, which we show has a small duality gap for rich policy parametrizations despite its non-convexity. This result leads to a practical primal-dual algorithm that leverages traditional RL methods, whose performance I illustrate in a navigation problem.

Despite its effectiveness, however, I will show that there are problems whose optimal policy cannot be obtained by optimizing any linear combination of rewards. Hence, not all constrained RL problems can be solved using regularized or primal-dual methods. This shortcoming can nevertheless be addressed by augmenting the state with the Lagrange multipliers and reinterpreting the dual updates as the dynamics that drive the multipliers' evolution. This approach yields a systematic state augmentation procedure that is guaranteed to solve RL problems with constraints: while primal-dual methods can fail to find optimal policies, this algorithm provably samples actions from the optimal policy.
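
As a rough mathematical sketch of the setting described above (the notation is mine and uses a generic expected-discounted constraint rather than the talk's high-probability safety constraint), the constrained RL problem and its Lagrangian relaxation can be written in LaTeX as follows, with r the reward, c a safety signal (e.g., an indicator that the state is in the safe set), b the required level, \eta a dual step size, and [\cdot]_+ the projection onto the nonnegative reals:

    V_r(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^t\, r(s_t, a_t)\right],
    \qquad
    V_c(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^t\, c(s_t, a_t)\right]

    \max_{\pi}\ V_r(\pi) \quad \text{s.t.} \quad V_c(\pi) \ge b

    L(\pi, \lambda) = V_r(\pi) + \lambda\,\big(V_c(\pi) - b\big), \qquad \lambda \ge 0,
    \qquad
    \lambda \leftarrow \big[\lambda - \eta\,\big(V_c(\pi) - b\big)\big]_+

The abstract's point is that, despite the non-convexity of this problem in the policy, its duality gap is small for rich policy parametrizations, which is what licenses solving it through the saddle point of L.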
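
A minimal, hypothetical Python sketch of the resulting primal-dual loop (every helper name below is a placeholder for illustration, not the speaker's code): alternate a standard RL update on the Lagrangian reward with a projected gradient step on the multiplier.

    def primal_dual_safe_rl(env, policy, rl_update, estimate_safety, budget,
                            eta=0.01, num_iters=1000):
        """Primal-dual sketch for constrained RL (all helpers hypothetical).

        Maximizes the reward return subject to the expected safety return
        staying above `budget`.
        """
        lam = 0.0  # Lagrange multiplier of the safety constraint
        for _ in range(num_iters):
            # Primal step: any traditional RL method applied to the
            # Lagrangian reward r(s, a) + lam * c(s, a).
            policy = rl_update(env, policy, lam)

            # Estimate the safety return from sampled trajectories
            # (the transition probabilities of the MDP are unknown).
            safety_return = estimate_safety(env, policy)

            # Dual step: projected gradient descent on the multiplier.
            # lam increases while the constraint is violated and decays
            # toward zero once it is satisfied.
            lam = max(0.0, lam - eta * (safety_return - budget))
        return policy, lam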
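
The state-augmentation idea can be sketched in the same spirit: the multiplier becomes part of the state the policy observes, and the dual update becomes the dynamics of that extra state variable. The environment interface below is again an assumption made purely for illustration.

    def state_augmented_execution(env, policy, budget, eta=0.05,
                                  num_epochs=50, horizon=200):
        """Sketch of state augmentation (interface names hypothetical).

        The policy conditions on (state, multiplier); the dual update is
        reinterpreted as the dynamics driving the multiplier's evolution.
        """
        lam = 0.0
        for _ in range(num_epochs):
            state = env.reset()
            safety_total = 0.0
            for _ in range(horizon):
                action = policy(state, lam)  # augmented state (s, lam)
                state, _reward, safety, done = env.step(action)
                safety_total += safety
                if done:
                    break
            # Dual dynamics: evolve the multiplier between epochs.
            lam = max(0.0, lam - eta * (safety_total / horizon - budget))
        return lam

Here the policy is trained to perform well for each value of the multiplier, so that executing it while the multiplier evolves samples actions from the optimal policy, as the abstract states.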

Biographical Information

Luiz F. O. Chamon received the B.Sc. and M.Sc. degrees in electrical engineering from the University of São Paulo, São Paulo, Brazil, in 2011 and 2015, respectively, and the Ph.D. degree in electrical and systems engineering from the University of Pennsylvania (Penn), Philadelphia, in 2020. Until 2022, he was a postdoctoral fellow at the Simons Institute at the University of California, Berkeley. He is currently an independent research group leader at the University of Stuttgart, Germany. In 2009, he was an undergraduate exchange student in the Masters in Acoustics program at the École Centrale de Lyon, Lyon, France, and worked as an assistant instructor and consultant on nondestructive testing at INSACAST Formation Continue. From 2010 to 2014, he worked as a signal processing and statistics consultant on a research project with EMBRAER. He received both the best student paper and the best paper awards at IEEE ICASSP 2020 and was recognized by the IEEE Signal Processing Society in 2018 for his distinguished work on the editorial board of the IEEE Transactions on Signal Processing. His research interests include optimization, signal processing, machine learning, statistics, and control.
