Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics

04/02/2021
by   Matthew Schmittle, et al.
0

A key challenge in Imitation Learning (IL) is that optimal state actions demonstrations are difficult for the teacher to provide. For example in robotics, providing kinesthetic demonstrations on a robotic manipulator requires the teacher to control multiple degrees of freedom at once. The difficulty of requiring optimal state action demonstrations limits the space of problems where the teacher can provide quality feedback. As an alternative to state action demonstrations, the teacher can provide corrective feedback such as their preferences or rewards. Prior work has created algorithms designed to learn from specific types of noisy feedback, but across teachers and tasks different forms of feedback may be required. Instead we propose that in order to learn from a diversity of scenarios we need to learn from a variety of feedback. To learn from a variety of feedback we make the following insight: the teacher's cost function is latent and we can model a stream of feedback as a stream of loss functions. We then use any online learning algorithm to minimize the sum of these losses. With this insight we can learn from a diversity of feedback that is weakly correlated with the teacher's true cost function. We unify prior work into a general corrective feedback meta-algorithm and show that regardless of feedback we can obtain the same regret bounds. We demonstrate our approach by learning to perform a household navigation task on a robotic racecar platform. Our results show that our approach can learn quickly from a variety of noisy feedback.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 9

06/07/2019

Watch, Try, Learn: Meta-Learning from Demonstrations and Reward

Imitation learning allows agents to learn complex behaviors from demonst...
01/05/2016

Learning Preferences for Manipulation Tasks from Online Coactive Feedback

We consider the problem of learning preferences over trajectories for mo...
02/25/2020

Metric-Based Imitation Learning Between Two Dissimilar Anthropomorphic Robotic Arms

The development of autonomous robotic systems that can learn from human ...
06/14/2020

Active Imitation Learning from Multiple Non-Deterministic Teachers: Formulation, Challenges, and Algorithms

We formulate the problem of learning to imitate multiple, non-determinis...
06/26/2013

Learning Trajectory Preferences for Manipulators via Iterative Improvement

We consider the problem of learning good trajectories for manipulation t...
08/25/2019

Combined Task and Action Learning from Human Demonstrations for Mobile Manipulation Applications

Learning from demonstrations is a promising paradigm for transferring kn...
07/11/2019

Self-Regulated Interactive Sequence-to-Sequence Learning

Not all types of supervision signals are created equal: Different types ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.