Iterative Policy Learning for Constrained RL via Dissipative Gradient Descent-Ascent

In constrained reinforcement learning (C-RL), an agent aims to learn a policy that maximizes an expected cumulative reward (return) subject to satisfying additional requirements on secondary objectives. Recently, several algorithms, primarily rooted in stochastic gradient descent-ascent (SGDA) methods, have been proposed to solve this problem. While such algorithms can provide rigorous regret bounds, the policy iterates they compute fail to converge to the optimal policy. Although this behavior is often (informally) attributed to a fundamental limitation of constrained reinforcement learning, in this talk we show that it instead stems from the inability of SGDA algorithms to guarantee convergence for non-strongly monotone problems. To overcome this limitation, we propose a novel dissipative stochastic gradient descent-ascent algorithm (D-SGDA) for min-max problems that guarantees convergence under mild assumptions on the convexity-concavity of the saddle function. We then show how to apply D-SGDA to find optimal policies for tabular C-RL problems.
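To see why the saddle-point structure matters, recall that the C-RL problem (maximize the return subject to constraints on secondary value functions) is typically attacked through its Lagrangian, which in the tabular occupancy-measure formulation is bilinear in the occupancy measure and the multipliers, hence convex-concave but not strongly monotone. The sketch below is a minimal illustration of the resulting difficulty, not the D-SGDA algorithm from the talk: on the bilinear toy problem f(x, y) = x*y, plain gradient descent-ascent spirals away from the saddle point, while adding a generic damping ("dissipation") state restores convergence. The step size eta, damping gain rho, and the particular damping mechanism are illustrative assumptions.

```python
# A minimal sketch (our illustration, not the talk's D-SGDA): compare plain
# gradient descent-ascent (GDA) with a generic "dissipative" variant on the
# bilinear saddle f(x, y) = x * y, which is convex-concave but not strongly
# monotone.  All parameter choices (eta, rho) and the damping mechanism are
# assumptions made for this example.

import numpy as np


def plain_gda(x, y, eta=0.1, steps=200):
    """Vanilla GDA on f(x, y) = x*y: descend in x, ascend in y."""
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x   # grad_x f = y, grad_y f = x
    return np.hypot(x, y)                 # distance from the saddle (0, 0)


def damped_gda(x, y, eta=0.1, rho=0.3, steps=200):
    """GDA coupled with auxiliary damping states (zx, zy) that slowly track
    (x, y) and drain the rotational energy of the bilinear game."""
    zx, zy = x, y
    for _ in range(steps):
        x_new = x - eta * (y + rho * (x - zx))   # descent step plus damping toward zx
        y_new = y + eta * (x - rho * (y - zy))   # ascent step plus damping toward zy
        zx_new = zx + eta * rho * (x - zx)       # damping states low-pass filter (x, y)
        zy_new = zy + eta * rho * (y - zy)
        x, y, zx, zy = x_new, y_new, zx_new, zy_new
    return np.hypot(x, y)


if __name__ == "__main__":
    print(f"plain GDA  : ||(x, y)|| after 200 steps = {plain_gda(1.0, 1.0):.4f}")   # grows
    print(f"damped GDA : ||(x, y)|| after 200 steps = {damped_gda(1.0, 1.0):.4f}")  # shrinks
```

Running the script shows the iterate norm growing under plain GDA and shrinking under the damped variant, mirroring the non-convergence of SGDA policy iterates described above.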

Enrique Mallada

Enrique Mallada is an Associate Professor of Electrical and Computer Engineering at Johns Hopkins University. His technical research interests lie in control and dynamical systems, network and graph theory, mathematical optimization, machine learning, and their applications to engineering systems. He completed his Ph.D. in Electrical and Computer Engineering at Cornell. Before joining Johns Hopkins, he was a CMI Fellow at Caltech.

Personal webpage