Constrained Reinforcement Learning Has Zero Duality Gap

10/29/2019
by   Santiago Paternain, et al.

Autonomous agents must often deal with conflicting requirements, such as completing tasks using the least amount of time or energy, learning multiple tasks, or dealing with multiple opponents. In the context of reinforcement learning (RL), these problems are addressed by (i) designing a reward function that simultaneously describes all requirements or (ii) combining modular value functions that encode them individually. Though effective, these methods have critical downsides. Designing good reward functions that balance different objectives is challenging, especially as the number of objectives grows. Moreover, implicit interference between goals may lead to performance plateaus as they compete for resources, particularly when training on-policy. Similarly, selecting parameters to combine value functions is at least as hard as designing an all-encompassing reward, given that the effect of their values on the overall policy is not straightforward. The latter is generally addressed by formulating the conflicting requirements as a constrained RL problem and solving it with primal-dual methods. These algorithms are in general not guaranteed to converge to the optimal solution, since the problem is not convex. This work provides theoretical support for these approaches by establishing that, despite its non-convexity, this problem has zero duality gap, i.e., it can be solved exactly in the dual domain, where it becomes convex. Finally, we show that this result holds approximately if the policy is described by a good parametrization (e.g., neural networks), connect it with primal-dual algorithms present in the literature, and establish their convergence to the optimal solution.
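To make the phrase "zero duality gap" concrete, the following is a minimal sketch on a hypothetical one-state, two-action problem that does not come from the paper. The rewards `r` and `g`, the threshold `c`, and all function names are assumptions chosen for illustration. In this toy case the problem is linear in the policy, so the matching primal and dual values are unsurprising; the paper's contribution is that the gap also vanishes in the general, non-convex setting.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): one state, two actions.
r = np.array([1.0, 0.0])  # objective reward of each action
g = np.array([0.0, 1.0])  # constraint reward of each action
c = 0.5                   # constraint: expected g must be at least c


def primal_value(num_points=1001):
    """Best expected r over feasible policies, by grid search over P(action 1)."""
    best = -np.inf
    for p1 in np.linspace(0.0, 1.0, num_points):
        pi = np.array([1.0 - p1, p1])
        if pi @ g >= c - 1e-12:  # keep only policies that satisfy the constraint
            best = max(best, pi @ r)
    return best


def dual_function(lam):
    """Maximum over policies of the Lagrangian pi@r + lam * (pi@g - c).

    The Lagrangian is linear in pi, so the maximum over the probability
    simplex is attained at a deterministic action.
    """
    return np.max(r + lam * g) - lam * c


def dual_value(lams):
    """Minimize the (convex) dual function over a grid of multipliers lam >= 0."""
    return min(dual_function(lam) for lam in lams)


P = primal_value()
D = dual_value(np.linspace(0.0, 5.0, 501))
print(f"primal optimum: {P:.4f}, dual optimum: {D:.4f}, gap: {D - P:.2e}")
```

Here the dual function is max(1, lam) - 0.5 * lam, which is convex in lam and minimized at lam = 1, where it equals the primal optimum of 0.5.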


research
01/28/2022

Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints

We consider primal-dual-based reinforcement learning (RL) in episodic co...
research
06/24/2021

Density Constrained Reinforcement Learning

We study constrained reinforcement learning (CRL) from a novel perspecti...
research
06/01/2021

Reward is enough for convex MDPs

Maximising a cumulative reward function that is Markov and stationary, i...
research
06/13/2023

A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning

Offline constrained reinforcement learning (RL) aims to learn a policy t...
research
03/05/2023

Bounding the Optimal Value Function in Compositional Reinforcement Learning

In the field of reinforcement learning (RL), agents are often tasked wit...
research
02/23/2021

State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards

Constrained reinforcement learning involves multiple rewards that must i...
research
06/23/2022

Risk-Constrained Nonconvex Dynamic Resource Allocation has Zero Duality Gap

We show that risk-constrained dynamic resource allocation problems with ...
