Reinforcement Learning with a Terminator

05/30/2022
by Guy Tennenholtz, et al.

We present the problem of reinforcement learning with exogenous termination. We define the Termination Markov Decision Process (TerMDP), an extension of the MDP framework, in which episodes may be interrupted by an external non-Markovian observer. This formulation accounts for numerous real-world situations, such as a human interrupting an autonomous driving agent for reasons of discomfort. We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds. We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret. Motivated by our theoretical analysis, we design and implement a scalable approach, which combines optimism (w.r.t. termination) and a dynamic discount factor, incorporating the termination probability. We deploy our method on high-dimensional driving and MinAtar benchmarks. Additionally, we test our approach on human data in a driving setting. Our results demonstrate fast convergence and significant improvement over various baseline approaches.
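The dynamic discount factor and optimism with respect to termination mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the abstract: the per-step termination probabilities are given as estimates, the reward at a step is collected before the terminator acts, the effective discount is `gamma * (1 - p_term)`, and optimism is modeled by subtracting a confidence bonus from the estimated termination probability. The function names `optimistic_term_prob` and `termination_aware_return` are hypothetical.

```python
from typing import Sequence


def optimistic_term_prob(p_hat: float, bonus: float) -> float:
    """Optimistic (lower) estimate of a termination probability.

    Assumption: optimism w.r.t. termination means believing termination is
    less likely than estimated, by a confidence bonus, clipped to [0, 1].
    """
    return min(1.0, max(0.0, p_hat - bonus))


def termination_aware_return(
    rewards: Sequence[float],
    term_probs: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Expected discounted return under per-step exogenous termination.

    Assumption: the reward at step t is collected first, then the episode
    survives to step t + 1 with probability 1 - term_probs[t]. The effective
    per-step discount is therefore gamma * (1 - term_probs[t]).
    """
    if len(rewards) != len(term_probs):
        raise ValueError("rewards and term_probs must have the same length")
    total = 0.0
    weight = 1.0  # product of effective discounts up to the current step
    for reward, p_term in zip(rewards, term_probs):
        total += weight * reward
        weight *= gamma * (1.0 - p_term)
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    estimated = [0.5, 0.5, 0.5]
    optimistic = [optimistic_term_prob(p, bonus=0.25) for p in estimated]
    print(termination_aware_return(rewards, estimated, gamma=1.0))
    print(termination_aware_return(rewards, optimistic, gamma=1.0))
```

Under these assumptions, a higher termination probability shrinks the weight placed on future rewards in the same way a smaller discount factor would, and the optimistic estimate yields a return that is never lower than the one computed from the raw estimate.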

research
08/27/2021

WAD: A Deep Reinforcement Learning Agent for Urban Autonomous Driving

Urban autonomous driving is an open and challenging problem to solve as ...
research
11/29/2022

Discrete Control in Real-World Driving Environments using Deep Reinforcement Learning

Training self-driving cars is often challenging since they require a vas...
research
06/21/2023

State-wise Constrained Policy Optimization

Reinforcement Learning (RL) algorithms have shown tremendous success in ...
research
02/11/2019

Performance Dynamics and Termination Errors in Reinforcement Learning: A Unifying Perspective

In reinforcement learning, a decision needs to be made at some point as ...
research
08/24/2023

Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward

Robot control using reinforcement learning has become popular, but its l...
research
03/03/2019

Scaling up budgeted reinforcement learning

Can we learn a control policy able to adapt its behaviour in real time s...
research
09/30/2021

Surveillance Evasion Through Bayesian Reinforcement Learning

We consider a 2D continuous path planning problem with a completely unkn...
