Don't Forget Your Teacher: A Corrective Reinforcement Learning Framework

05/30/2019
by MohammadReza Nazari, et al.

Although reinforcement learning (RL) can provide reliable solutions in many settings, practitioners are often wary of the discrepancies between the RL solution and their status quo procedures. As a result, they may be reluctant to adapt to the novel way of executing tasks proposed by RL. On the other hand, many real-world problems require only relatively small adjustments to the status quo policies to achieve improved performance. We therefore propose a student-teacher RL mechanism in which the RL agent (the "student") learns to maximize its reward, subject to a constraint that bounds the difference between the RL policy and the "teacher" policy. The teacher can be another RL policy (e.g., one trained under a slightly different setting), the status quo policy, or any other exogenous policy. We formulate this problem as a stochastic optimization model and solve it using a primal-dual policy gradient algorithm. We prove that the policy is asymptotically optimal. However, a naive implementation suffers from high variance and from convergence to a stochastic optimal policy. With a few practical adjustments to address these issues, our numerical experiments confirm the effectiveness of the proposed method in multiple GridWorld scenarios.
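The constrained student-teacher formulation can be illustrated with a minimal primal-dual sketch. It is not the authors' algorithm. It assumes a single-state (bandit) problem instead of GridWorld. It also assumes that the student-teacher difference is measured by the KL divergence KL(pi || teacher), since the abstract does not name a distance. Finally, it uses exact gradients of the expected reward instead of sampled policy-gradient estimates, so the run is deterministic. The names `train_student`, `softmax`, and `kl_divergence`, the step sizes, and the reward and teacher values are all hypothetical. The primal step ascends the Lagrangian E[r] - lam * (KL - epsilon) in the policy logits. The dual step raises the multiplier lam when the constraint is violated and projects it back onto lam >= 0.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def train_student(rewards, teacher, epsilon, steps=5000, lr_theta=0.5, lr_lambda=0.05):
    """Hypothetical primal-dual sketch on a one-step (bandit) problem.

    Maximizes E_pi[r] subject to KL(pi || teacher) <= epsilon through the
    Lagrangian L(theta, lam) = E_pi[r] - lam * (KL(pi || teacher) - epsilon).
    Assumption: KL is used as the "difference" between student and teacher.
    """
    theta = [math.log(t) for t in teacher]  # start the student at the teacher
    lam = 1.0                               # dual variable (Lagrange multiplier)
    for _ in range(steps):
        pi = softmax(theta)
        expected_r = sum(p * r for p, r in zip(pi, rewards))
        kl = kl_divergence(pi, teacher)
        # Exact gradients with respect to the softmax logits (no sampling):
        #   dE[r]/dtheta_j = pi_j * (r_j - E[r])
        #   dKL/dtheta_j   = pi_j * (log(pi_j / q_j) - KL)
        grad = [
            p * (r - expected_r) - lam * p * (math.log(p / q) - kl)
            for p, r, q in zip(pi, rewards, teacher)
        ]
        # Primal step: gradient ascent on the Lagrangian.
        theta = [t + lr_theta * g for t, g in zip(theta, grad)]
        # Dual step: increase lam when KL exceeds epsilon, keep lam >= 0.
        lam = max(0.0, lam + lr_lambda * (kl - epsilon))
    return softmax(theta), lam


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.5]   # hypothetical per-action rewards
    teacher = [0.2, 0.5, 0.3]   # hypothetical status quo (teacher) policy
    pi, lam = train_student(rewards, teacher, epsilon=0.1)
    print("student policy:", [round(p, 3) for p in pi])
    print("KL to teacher:", round(kl_divergence(pi, teacher), 3))
    print("multiplier:", round(lam, 3))
```

With these illustrative numbers, the unconstrained optimum would put all mass on the first action. The constraint instead holds the student at a KL distance of about epsilon from the teacher while still raising its expected reward above the teacher's. Because the KL term keeps the student close to a full-support teacher, the solution this sketch reaches is itself stochastic. This echoes the issue the abstract raises for a naive implementation.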
