Scaling up budgeted reinforcement learning

03/03/2019
by Nicolas Carrara, et al.

Can we learn a control policy that adapts its behaviour in real time so as to take any desired amount of risk? The general Reinforcement Learning framework solely aims at optimising a total reward in expectation, which may not be desirable in critical applications. In stark contrast, the Budgeted Markov Decision Process (BMDP) framework is a formalism in which the notion of risk is implemented as a hard constraint on a failure signal. Existing algorithms for solving BMDPs rely on strong assumptions and have so far only been applied to toy examples. In this work, we relax some of these assumptions and demonstrate the scalability of our approach on two practical problems: a spoken dialogue system and an autonomous driving task. On both examples, we reach performance similar to that of Lagrangian Relaxation methods, with a significant improvement in sample and memory efficiency.

