A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret

09/20/2022
by Sheelabhadra Dey, et al.

In many control task domains, an existing controller provides a baseline level of performance that, though possibly suboptimal, should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, such fully exploratory algorithms may drive performance below the baseline level during training. In this paper, we address the problem of optimizing a control policy online while minimizing regret with respect to the performance of a baseline policy. We present a joint imitation-reinforcement learning framework, denoted JIRL. The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize regret with respect to the baseline policy during training, and (b) eventually surpassing the baseline's performance. JIRL addresses these objectives by initially learning to imitate the baseline policy and gradually shifting control from the baseline to an RL agent. Experimental results show that JIRL effectively accomplishes both objectives in several continuous action-space domains. The results demonstrate that JIRL matches the final performance of a state-of-the-art algorithm while incurring significantly lower baseline regret during training in all of the presented domains. Moreover, JIRL reduces baseline regret by a factor of up to 21 compared with a state-of-the-art baseline-regret-minimization approach.
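The core mechanism described in the abstract, gradually handing control from the baseline policy to the RL agent while using the baseline's actions as imitation targets, can be illustrated with a minimal sketch. The sketch below is not the authors' implementation; it assumes a Gym-style environment interface and hypothetical agent methods (act, store_demonstration, store_transition, update) purely for illustration.

    import random

    # Minimal illustrative sketch of the control-shifting idea: early in
    # training the baseline policy acts (keeping baseline regret low) and its
    # actions are stored as imitation targets; over time an increasing
    # fraction of decisions is handed to the RL agent.
    # `env`, `baseline_policy`, and `agent` are assumed, hypothetical objects.
    def jirl_style_training(env, baseline_policy, agent,
                            num_episodes=500, shift_rate=0.002):
        p_agent = 0.0  # probability that the RL agent is in control
        for _ in range(num_episodes):
            obs = env.reset()
            done = False
            while not done:
                if random.random() < p_agent:
                    action = agent.act(obs)        # RL agent in control
                else:
                    action = baseline_policy(obs)  # baseline in control
                    agent.store_demonstration(obs, action)  # imitation target
                next_obs, reward, done, _ = env.step(action)
                agent.store_transition(obs, action, reward, next_obs, done)
                obs = next_obs
            agent.update()  # joint imitation + RL update (agent-specific)
            p_agent = min(1.0, p_agent + shift_rate)  # hand over control
        return agent

In this sketch, the schedule by which control shifts (here a simple linear increase of p_agent) is an assumption; the paper's actual hand-over criterion may differ.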
