Learning Preferences for Manipulation Tasks from Online Coactive Feedback

01/05/2016
by Ashesh Jain, et al.

We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than simple geometric constraints on trajectories; they are rather governed by the surrounding context of various objects and human interactions in the environment. We propose a coactive online learning framework for teaching preferences in contextually rich environments. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this coactive preference feedback can be more easily elicited than demonstrations of optimal trajectories. Nevertheless, the theoretical regret bounds of our algorithm match the asymptotic rates of algorithms that learn from optimal trajectories. We implement our algorithm on two high degree-of-freedom robots, PR2 and Baxter, and present three intuitive mechanisms for providing such incremental feedback. In our experimental evaluation we consider two context-rich settings -- household chores and grocery store checkout -- and show that users are able to train the robot with only a few rounds of feedback (taking only a few minutes). [Parts of this work have been published at the NIPS and ISRR conferences (Jain13, Jain13b). This journal submission presents a consistent full paper, and also includes the proofs of the regret bounds, more details of the robotic system, and a thorough review of related work.]
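To make the learning loop concrete: in coactive learning of this kind, the robot keeps a linear score w . phi(x, y) over joint context-trajectory features, proposes its current best-scoring trajectory, and moves w toward the features of whatever slightly-improved trajectory the user returns. Under the standard alpha-informative feedback assumption, this preference-perceptron update achieves expected average regret of order O(1/sqrt(T)), which is the asymptotic rate the abstract refers to. The sketch below is a minimal illustration of that update in Python, not the authors' implementation; the feature map phi, the candidate set, and all class and variable names are assumptions introduced here.

```python
import numpy as np

class CoactivePreferencePerceptron:
    """Minimal coactive preference-perceptron sketch (hypothetical names,
    not the paper's system). phi(context, trajectory) -> np.ndarray is an
    application-supplied joint feature map; trajectories are scored
    linearly as w . phi(context, trajectory)."""

    def __init__(self, phi, n_features):
        self.phi = phi
        self.w = np.zeros(n_features)  # linear scoring weights

    def score(self, context, trajectory):
        return float(self.w @ self.phi(context, trajectory))

    def propose(self, context, candidates):
        # Present the trajectory the current model scores highest.
        return max(candidates, key=lambda traj: self.score(context, traj))

    def update(self, context, proposed, improved):
        # Coactive feedback: the user supplies any trajectory slightly
        # better than the proposal; nudge w toward its features.
        self.w += self.phi(context, improved) - self.phi(context, proposed)

# Toy usage (identity feature map; trajectories are 2-D feature vectors):
phi = lambda context, traj: np.asarray(traj, dtype=float)
learner = CoactivePreferencePerceptron(phi, n_features=2)
proposal = learner.propose("kitchen", [(1.0, 0.0), (0.0, 1.0)])
learner.update("kitchen", proposed=proposal, improved=(0.0, 1.0))
```

Note that the update never needs an optimal trajectory: any user feedback whose score improvement is a constant fraction of the gap to the optimum (alpha-informative feedback) suffices for the regret guarantee.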


Related research

- Learning Trajectory Preferences for Manipulators via Iterative Improvement (06/26/2013)
- An Incremental Inverse Reinforcement Learning Approach for Motion Planning with Human Preferences (01/25/2023)
- PlanIt: A Crowdsourcing Approach for Learning to Plan Paths from Large Scale Preference Feedback (06/10/2014)
- Learning Reward Functions from Scale Feedback (10/01/2021)
- Learning Online from Corrective Feedback: A Meta-Algorithm for Robotics (04/02/2021)
- Towards Transferring Human Preferences from Canonical to Actual Assembly Tasks (11/11/2021)
- Learning Submodular Objectives for Team Environmental Monitoring (12/15/2021)
