Concave Utility Reinforcement Learning with Zero-Constraint Violations

09/12/2021
by   Mridul Agarwal, et al.

We consider the problem of tabular infinite-horizon concave utility reinforcement learning (CURL) with convex constraints. Many constrained learning applications, such as robotics, cannot tolerate policies that violate the constraints. To this end, we propose a model-based learning algorithm that achieves zero constraint violations. To obtain this result, we assume that the concave objective and the convex constraints admit a solution in the interior of the set of feasible occupation measures. We then solve a tighter optimization problem to ensure that the constraints are never violated despite imprecise model knowledge and model stochasticity. We also propose a novel Bellman-error-based analysis for tabular infinite-horizon setups, which makes it possible to analyse stochastic policies. Combining the Bellman-error-based analysis with the tighter optimization problem, for T interactions with the environment, we obtain a regret guarantee for the objective that scales as O(1/√T), excluding other factors.
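
As a rough illustration of what a "tighter optimization problem" over occupation measures can look like, the sketch below writes one plausible form. The notation is an assumption made for this illustration and is not taken from the abstract: λ(s,a) is an occupation measure, r and c are reward and cost functions, f is the concave objective, g is the convex constraint, P̂ is the estimated transition model, and ε > 0 is a tightening margin chosen no larger than the slack of the assumed interior solution.

```latex
% Illustrative sketch only; all symbols are assumed notation.
\begin{align*}
\max_{\lambda \ge 0}\quad
  & f\Big(\sum_{s,a} \lambda(s,a)\, r(s,a)\Big) \\
\text{s.t.}\quad
  & g\Big(\sum_{s,a} \lambda(s,a)\, c(s,a)\Big) \le -\epsilon, \\
  & \sum_{a} \lambda(s',a) = \sum_{s,a} \hat{P}(s' \mid s,a)\, \lambda(s,a)
    \quad \forall s', \\
  & \sum_{s,a} \lambda(s,a) = 1 .
\end{align*}
```

Under these assumptions, replacing the original constraint g(·) ≤ 0 with g(·) ≤ −ε leaves a margin that can absorb the error from the estimated model P̂ and from stochastic transitions, which is the intuition behind avoiding violations of the original constraint.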


research
06/09/2020

Constrained episodic reinforcement learning in concave-convex and knapsack settings

We propose an algorithm for tabular episodic reinforcement learning with...
research
02/16/2018

Online Continuous Submodular Maximization

In this paper, we consider an online optimization process, where the obj...
research
06/21/2018

Online Saddle Point Problem with Applications to Constrained Online Convex Optimization

We study an online saddle point problem where at each iteration a pair o...
research
09/08/2021

Learning Zero-sum Stochastic Games with Posterior Sampling

In this paper, we propose Posterior Sampling Reinforcement Learning for ...
research
01/13/2023

Decentralized model-free reinforcement learning in stochastic games with average-reward objective

We propose the first model-free algorithm that achieves low regret perfo...
research
05/16/2022

Efficient Algorithms for Planning with Participation Constraints

We consider the problem of planning with participation constraints intro...
research
04/19/2021

Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls

We study finite-time horizon continuous-time linear-convex reinforcement...
