Reinforcement Learning without Ground-Truth State

05/20/2019
by Xingyu Lin, et al.

To perform robot manipulation tasks, a low-dimensional state of the environment typically needs to be estimated. However, designing a state estimator can sometimes be difficult, especially in environments with deformable objects. An alternative is to learn an end-to-end policy that maps directly from high-dimensional sensor inputs to actions. However, if this policy is trained with reinforcement learning, then without a state estimator it is hard to specify a reward function based on continuous, high-dimensional observations. To meet this challenge, we propose a simple indicator reward function for goal-conditioned reinforcement learning: we give a positive reward only when the robot's observation exactly matches a target goal observation. We show that by utilizing the goal relabeling technique, we can learn with the indicator reward function even in continuous state spaces, in which we do not expect two observations to ever be identical. We propose two methods to further speed up convergence with indicator rewards: reward balancing and reward filtering. We show comparable performance between our method and an oracle that uses the ground-truth state for computing rewards, even though our method operates only on raw observations and has no access to the ground-truth state. We demonstrate our method on complex tasks in continuous state spaces, such as rope manipulation from RGB-D images, without knowledge of the ground-truth state.
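The core idea can be sketched in a few lines. The indicator reward is +1 only on an exact observation-goal match, which almost never fires on the original goals in a continuous space; the learning signal instead comes from relabeling transitions with goals actually reached later in the episode. The sketch below assumes a HER-style "future" relabeling strategy; the paper's exact relabeling scheme, data layout, and function names are not specified in this abstract and are illustrative assumptions.

```python
import numpy as np

def indicator_reward(obs, goal):
    """+1 only when the observation exactly matches the goal observation.

    In a continuous observation space this is almost always 0 for the
    original episode goal; positive rewards arise from relabeling below.
    """
    return 1.0 if np.array_equal(obs, goal) else 0.0

def relabel_episode(episode, k=4, rng=None):
    """Hindsight goal relabeling (HER-style 'future' strategy; an assumption,
    the paper may use a different scheme).

    episode: list of (obs, action, next_obs) transitions.
    For each transition, sample k observations achieved at the same or a
    later timestep and treat them as goals. A transition whose outcome
    coincides exactly with the relabeled goal receives the +1 reward.
    """
    rng = rng or np.random.default_rng()
    relabeled = []
    T = len(episode)
    for t, (obs, action, next_obs) in enumerate(episode):
        for _ in range(k):
            future = int(rng.integers(t, T))      # index of a future transition
            goal = episode[future][2]             # its achieved next observation
            r = indicator_reward(next_obs, goal)  # +1 iff observations coincide
            relabeled.append((obs, action, next_obs, goal, r))
    return relabeled
```

Even after relabeling, positive-reward transitions are rare relative to negatives; this imbalance is what motivates the paper's two proposed speed-ups, balancing positive and negative samples when forming training batches (reward balancing) and filtering out misleading negative samples (reward filtering).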


Related research

- Learning Dense Rewards for Contact-Rich Manipulation Tasks (11/17/2020): Rewards play a crucial role in reinforcement learning. To arrive at the ...
- End-to-End Robotic Reinforcement Learning without Reward Engineering (04/16/2019): The combination of deep neural network models and reinforcement learning...
- Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition (05/29/2018): The design of a reward function often poses a major practical challenge ...
- Hierarchical Policy Learning is Sensitive to Goal Space Design (05/04/2019): Hierarchy in reinforcement learning agents allows for control at multipl...
- Following Instructions by Imagining and Reaching Visual Goals (01/25/2020): While traditional methods for instruction-following typically assume pri...
- Reward Shaping with Dynamic Trajectory Aggregation (04/13/2021): Reinforcement learning, which acquires a policy maximizing long-term rew...
- Few-Shot Goal Inference for Visuomotor Learning and Planning (09/30/2018): Reinforcement learning and planning methods require an objective or rewa...
