
PlanIt: A Crowdsourcing Approach for Learning to Plan Paths from Large Scale Preference Feedback

by Ashesh Jain, et al.

We consider the problem of learning user preferences over robot trajectories in environments rich in objects and humans. This is challenging because the criterion defining a good trajectory varies with users, tasks, and interactions in the environment. We represent trajectory preferences with a cost function that the robot learns and then uses to generate good trajectories in new environments. We design a crowdsourcing system, PlanIt, in which non-expert users label segments of the robot's trajectory. PlanIt allows us to collect a large amount of user feedback, and from its weak and noisy labels we learn the parameters of our model. We test our approach on 122 different environments for robotic navigation and manipulation tasks. Our extensive experiments show that the learned cost function generates preferred trajectories in human environments. Our crowdsourcing system is publicly available for the visualization of the learned costs and for providing preference feedback: <>



