Guarded Policy Optimization with Imperfect Online Demonstrations

03/03/2023
by   Zhenghai Xue, et al.

The Teacher-Student Framework (TSF) is a reinforcement learning setting where a teacher agent guards the training of a student agent by intervening and providing online demonstrations. When the teacher policy is assumed to be optimal, it has the perfect timing and capability to intervene in the student's learning process, providing safety guarantees and exploration guidance. In many real-world settings, however, it is expensive or even impossible to obtain a well-performing teacher policy. In this work, we relax the assumption of a well-performing teacher and develop a new method that can incorporate arbitrary teacher policies with modest or inferior performance. We instantiate an off-policy reinforcement learning algorithm, termed Teacher-Student Shared Control (TS2C), which decides when the teacher intervenes based on trajectory-based value estimation. Theoretical analysis shows that TS2C attains efficient exploration and a substantial safety guarantee without being affected by the teacher's own performance. Experiments on various continuous control tasks show that our method can exploit teacher policies at different performance levels while maintaining a low training cost. Moreover, the student policy achieves higher accumulated reward than the imperfect teacher policy in held-out testing environments. Code is available at https://metadriverse.github.io/TS2C.
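As an illustration only, the following Python sketch shows one plausible reading of value-based intervention: the teacher takes control when its own value estimate for a state exceeds the expected value of the student's sampled actions by more than a threshold. The abstract does not give the exact rule, so the function names (`should_intervene`, `shared_control_step`), the threshold `epsilon`, the sample count, and the toy value functions are all assumptions, not the authors' implementation.

```python
from typing import Callable, Sequence, Tuple


def should_intervene(
    state: float,
    student_actions: Sequence[float],
    teacher_value: Callable[[float], float],
    teacher_q: Callable[[float, float], float],
    epsilon: float,
) -> bool:
    """Hypothetical rule: intervene when the teacher's state value exceeds
    the mean teacher-Q of the student's sampled actions by more than epsilon."""
    expected_q = sum(teacher_q(state, a) for a in student_actions) / len(student_actions)
    return teacher_value(state) - expected_q > epsilon


def shared_control_step(
    state: float,
    student_policy: Callable[[float], float],
    teacher_policy: Callable[[float], float],
    teacher_value: Callable[[float], float],
    teacher_q: Callable[[float, float], float],
    epsilon: float,
    n_samples: int = 8,
) -> Tuple[float, bool]:
    """Return (action to execute, whether the teacher intervened)."""
    # Sample several student actions to estimate their expected value.
    samples = [student_policy(state) for _ in range(n_samples)]
    if should_intervene(state, samples, teacher_value, teacher_q, epsilon):
        return teacher_policy(state), True
    # Otherwise the student keeps control and executes its own action.
    return samples[0], False


# Toy 1-D setting (assumed for illustration): the teacher prefers action == state.
def toy_teacher_q(state: float, action: float) -> float:
    return -(action - state) ** 2


def toy_teacher_value(state: float) -> float:
    return 0.0  # value of taking the teacher's own action (action == state)


def toy_teacher_policy(state: float) -> float:
    return state
```

In this toy setting, a student whose actions are far from the teacher's preferred action triggers an intervention, while a student that stays close keeps control. Because the comparison uses the teacher's value estimate rather than the raw action difference, a student action that differs from the teacher's but is valued similarly would not trigger an intervention.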


