Learning Gaussian Policies from Corrective Human Feedback

03/12/2019
by Daan Wout, et al.

Learning from human feedback is a viable alternative to control design that does not require modelling or control expertise. In particular, learning from corrective advice has advantages over evaluative feedback because it is a more intuitive and scalable format. The current state-of-the-art in this field, COACH, has proven to be an effective approach for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature space engineering for higher-order systems. We introduce Gaussian Process Coach (GPC), where feature space engineering is avoided by employing Gaussian Processes. In addition, we use the available policy uncertainty to 1) query feedback samples of maximal utility and 2) adapt the learning rate to the teacher's learning phase. We demonstrate that the novel algorithm outperforms the current state-of-the-art in final performance, convergence rate and robustness to erroneous feedback in OpenAI Gym continuous control benchmarks, for both simulated and real human teachers.
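To make the idea of an uncertainty-aware policy concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional state, a squared-exponential kernel, binary corrective feedback `h` in {-1, +1}, and a correction step that is simply proportional to the Gaussian Process predictive standard deviation. The names `GPPolicySketch`, `correct`, `most_uncertain` and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5, signal_var=1.0):
    """Squared-exponential kernel between two sets of 1-D states."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


class GPPolicySketch:
    """Toy GP policy: the action is the GP posterior mean over corrected samples."""

    def __init__(self, noise_var=1e-2):
        self.noise_var = noise_var      # assumed observation noise (illustrative)
        self.states = np.empty(0)       # states where feedback was received
        self.actions = np.empty(0)      # corrected action targets at those states

    def predict(self, s):
        """Return the posterior mean and standard deviation of the action at s."""
        s = np.atleast_1d(np.asarray(s, dtype=float))
        prior_var = rbf_kernel(s, s).diagonal()
        if self.states.size == 0:
            # No feedback yet: zero-mean prior with full prior uncertainty.
            return np.zeros_like(s), np.sqrt(prior_var)
        K = rbf_kernel(self.states, self.states)
        K = K + self.noise_var * np.eye(self.states.size)
        k_star = rbf_kernel(self.states, s)
        mean = k_star.T @ np.linalg.solve(K, self.actions)
        var = prior_var - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
        return mean, np.sqrt(np.maximum(var, 0.0))

    def correct(self, s, h, max_step=1.0):
        """Apply corrective feedback h (+1 or -1) at state s.

        Assumption of this sketch: the step size shrinks with the
        predictive standard deviation, so well-explored states receive
        finer corrections than unexplored ones.
        """
        mean, std = self.predict(s)
        step = max_step * std[0]
        self.states = np.append(self.states, s)
        self.actions = np.append(self.actions, mean[0] + h * step)
        return step


def most_uncertain(policy, candidates):
    """Pick the candidate state with the largest predictive uncertainty."""
    _, std = policy.predict(candidates)
    return candidates[int(np.argmax(std))]
```

In this sketch, repeated corrections at the same state lead to progressively smaller steps, and `most_uncertain` returns the candidate state farthest from any previous feedback. How GPC actually selects feedback samples and schedules its learning rate is described in the full paper and may differ from these choices.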

research
03/14/2019

Deep Reinforcement Learning with Feedback-based Exploration

Deep Reinforcement Learning has enabled the control of increasingly comp...
research
05/10/2022

Efficient Learning of Inverse Dynamics Models for Adaptive Computed Torque Control

Modelling robot dynamics accurately is essential for control, motion opt...
research
02/23/2020

Gaussian-Process-based Robot Learning from Demonstration

Endowed with higher levels of autonomy, robots are required to perform i...
research
11/12/2022

The Expertise Problem: Learning from Specialized Feedback

Reinforcement learning from human feedback (RLHF) is a powerful techniqu...
research
11/18/2019

Feedback Control for Online Training of Neural Networks

Convolutional neural networks (CNNs) are commonly used for image classif...
research
10/28/2021

Learning to Control using Image Feedback

Learning to control complex systems using non-traditional feedback, e.g....
