Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning

05/05/2023
by Patrick Emedom-Nnamdi, et al.

Standard approaches to sequential decision-making exploit an agent's ability to continually interact with its environment and improve its control policy. However, due to safety, ethical, and practicality constraints, this type of trial-and-error experimentation is often infeasible in real-world domains such as healthcare and robotics. Instead, control policies in these domains are typically trained offline from previously logged data or in a growing-batch manner. In the growing-batch setting, a fixed policy is deployed to the environment and used to gather an entire batch of new data, which is then aggregated with past batches and used to update the policy. This improvement cycle can then be repeated multiple times. While a limited number of such cycles is feasible in real-world domains, the quantity and diversity of the resulting data are much lower than in the standard continually-interacting approach. However, data collection in these domains is often performed in conjunction with human experts, who are able to label or annotate the collected data. In this paper, we first explore the trade-offs present in the growing-batch setting, and then investigate how information provided by a teacher (i.e., demonstrations, expert actions, and gradient information) can be leveraged at training time to mitigate the sample complexity and coverage requirements for actor-critic methods. We validate our contributions on tasks from the DeepMind Control Suite.
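
The growing-batch cycle described above (deploy a fixed policy, gather a whole batch, aggregate it with past batches, update, repeat) can be sketched in a few lines of Python. This is only an illustrative toy, not the paper's method: the one-dimensional environment, the linear policy `a = gain * s + noise`, the reward `-(a - s)^2`, and all function names are assumptions made for the example, and the update step swaps in a simple reward-weighted regression in place of the actor-critic learners the abstract refers to. No teacher signal is modeled here.

```python
import math
import random


def make_policy(gain, noise, rng):
    """Return a stochastic policy a = gain * s + Gaussian noise (toy assumption)."""
    def policy(state):
        return gain * state + rng.gauss(0.0, noise)
    return policy


def collect_batch(policy, batch_size, rng):
    """Deploy a FIXED policy and log transitions; no learning happens here."""
    batch = []
    for _ in range(batch_size):
        state = rng.uniform(-1.0, 1.0)
        action = policy(state)
        reward = -(action - state) ** 2  # toy reward: the best action equals the state
        batch.append((state, action, reward))
    return batch


def update_gain(dataset, temperature=0.1):
    """Reward-weighted least squares for the gain (a stand-in for an actor-critic update)."""
    num = 0.0
    den = 0.0
    for state, action, reward in dataset:
        weight = math.exp(reward / temperature)  # reward <= 0, so weight is in (0, 1]
        num += weight * action * state
        den += weight * state * state
    return num / den if den > 0 else 0.0


def growing_batch(num_cycles=5, batch_size=200, seed=0):
    """Run the deploy -> collect -> aggregate -> update cycle a fixed number of times."""
    rng = random.Random(seed)
    gain = 0.0      # initial policy parameter
    dataset = []    # aggregated data from all past deployments
    sizes = []      # dataset size after each cycle
    for _ in range(num_cycles):
        policy = make_policy(gain, noise=0.5, rng=rng)       # policy is frozen for the whole batch
        dataset.extend(collect_batch(policy, batch_size, rng))  # aggregate with past batches
        gain = update_gain(dataset)                            # offline update on all data so far
        sizes.append(len(dataset))
    return gain, sizes
```

Calling `growing_batch(num_cycles=3, batch_size=10)` returns the final gain together with the dataset sizes after each cycle, which makes the key structural property visible: the policy changes only between deployments, and the dataset grows by exactly one batch per cycle.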

research
12/13/2018

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been success...
research
06/03/2019

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

Off-policy reinforcement learning aims to leverage experience collected ...
research
06/30/2019

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

Most deep reinforcement learning (RL) systems are not able to learn effe...
research
02/19/2020

Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

Off-policy reinforcement learning algorithms promise to be applicable in...
research
11/24/2020

REPAINT: Knowledge Transfer in Deep Actor-Critic Reinforcement Learning

Accelerating the learning processes for complex tasks by leveraging prev...
research
03/20/2019

Batch Policy Learning under Constraints

When learning policies for real-world domains, two important questions a...
research
10/05/2020

Offline Learning for Planning: A Summary

The training of autonomous agents often requires expensive and unsafe tr...
