Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles

10/27/2020
by   Tim Seyde, et al.

Learning complex behaviors through interaction requires coordinated long-term planning. Random exploration and novelty search lack task-centric guidance and waste effort on non-informative interactions. Instead, decision making should target samples with the potential to optimize performance far into the future, while only reducing uncertainty where conducive to this objective. This paper presents latent optimistic value exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards. We combine finite horizon rollouts from a latent model with value function estimates to predict infinite horizon returns and recover associated uncertainty through ensembling. Policy training then proceeds on an upper confidence bound (UCB) objective to identify and select the interactions most promising to improve long-term performance. We apply LOVE to visual control tasks in continuous state-action spaces and demonstrate improved sample complexity on a selection of benchmarking tasks.
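The return estimate and optimistic objective described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each ensemble member supplies a sequence of predicted rewards plus a terminal value estimate, and that the UCB takes the common form of ensemble mean plus a scaled standard deviation. The function names and the parameters `gamma` and `beta` are hypothetical.

```python
import numpy as np


def horizon_return(rewards, terminal_value, gamma=0.99):
    """Discounted sum of H predicted rewards plus a discounted value bootstrap.

    rewards: array of shape (..., H), rewards predicted along a latent rollout.
    terminal_value: array of shape (...), value estimate at the final state.
    """
    rewards = np.asarray(rewards, dtype=float)
    horizon = rewards.shape[-1]
    discounts = gamma ** np.arange(horizon)
    bootstrap = gamma ** horizon * np.asarray(terminal_value, dtype=float)
    return (rewards * discounts).sum(axis=-1) + bootstrap


def ucb_objective(returns, beta=1.0):
    """Assumed UCB form: mean plus beta times std over the ensemble axis (axis 0)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean(axis=0) + beta * returns.std(axis=0)


# Two hypothetical ensemble members, rollout horizon of two steps.
member_rewards = [[1.0, 1.0], [1.0, 1.0]]
member_values = [0.0, 2.0]
member_returns = horizon_return(member_rewards, member_values, gamma=0.5)
optimistic_return = ucb_objective(member_returns, beta=1.0)
```

Here the members agree on the rewards but disagree on the terminal value, so the spread in their return estimates raises the optimistic objective above the ensemble mean. A larger `beta` weights that disagreement more heavily.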


research
04/07/2022

Optimizing the Long-Term Behaviour of Deep Reinforcement Learning for Pushing and Grasping

We investigate the "Visual Pushing for Grasping" (VPG) system by Zeng et...
research
01/27/2019

Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP

A fundamental question in reinforcement learning is whether model-free a...
research
11/05/2018

Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control

We propose a plan online and learn offline (POLO) framework for the sett...
research
11/29/2017

Efficient exploration with Double Uncertain Value Networks

This paper studies directed exploration for reinforcement learning agent...
research
03/05/2019

Learning Dynamics Model in Reinforcement Learning by Incorporating the Long Term Future

In model-based reinforcement learning, the agent interleaves between mod...
research
05/20/2022

Seeking entropy: complex behavior from intrinsic motivation to occupy action-state path space

Intrinsic motivation generates behaviors that do not necessarily lead to...
research
02/14/2019

Learn a Prior for RHEA for Better Online Planning

Rolling Horizon Evolutionary Algorithms (RHEA) are a class of online pla...
