Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control

11/05/2018
by Kendall Lowrey, et al.

We propose a plan online and learn offline (POLO) framework for the setting where an agent with an internal model needs to continually act and learn in the world. Our work builds on the synergistic relationship between local model-based control, global value function learning, and exploration. We study how local trajectory optimization can cope with approximation errors in the value function, and how it can stabilize and accelerate value function learning. Conversely, we study how approximate value functions can help reduce the planning horizon and allow for better policies beyond local solutions. Finally, we demonstrate how trajectory optimization can be used to perform temporally coordinated exploration in conjunction with estimating uncertainty in value function approximation. This exploration is critical for fast and stable learning of the value function. Combining these components enables solutions to complex control tasks, like humanoid locomotion and dexterous in-hand manipulation, in the equivalent of a few minutes of experience in the real world.
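The abstract describes two interacting ideas: a short-horizon planner that lets an approximate value function stand in for everything beyond its horizon, and an uncertainty estimate over that value function that steers exploration. The Python below is a minimal sketch of that loop under stated assumptions, not the authors' implementation. The toy point-mass model, the random-shooting planner (standing in for whatever trajectory optimizer is used), the hypothetical quadratic value ensemble `ENSEMBLE`, and the log-sum-exp aggregation with temperature `kappa` are all illustrative choices made here. The offline value-fitting step is omitted.

```python
import math
import random

# ASSUMPTION: toy 1-D point mass, state = (position, velocity),
# action = bounded acceleration. Not a model from the paper.
DT = 0.1

def step(state, action):
    pos, vel = state
    return (pos + DT * vel, vel + DT * action)

def reward(state, action):
    pos, _ = state
    # Drive the mass toward the origin with a small control penalty.
    return -pos ** 2 - 0.01 * action ** 2

def optimistic_value(state, ensemble, kappa=1.0):
    """Aggregate ensemble value estimates with a softmax (log-sum-exp).

    ASSUMPTION: this aggregation is one common optimistic choice. The result
    lies between the ensemble mean and the ensemble max, so states where the
    members disagree look more attractive to the planner.
    """
    values = [v(state) for v in ensemble]
    m = max(values)  # subtract the max for numerical stability
    mean_exp = sum(math.exp((v - m) / kappa) for v in values) / len(values)
    return m + kappa * math.log(mean_exp)

def plan(state, ensemble, horizon=10, samples=128, gamma=0.99, rng=random):
    """Random-shooting MPC: score H-step action sequences, return the first action."""
    best_return, best_first = -math.inf, 0.0
    for _ in range(samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for t, a in enumerate(actions):
            total += gamma ** t * reward(s, a)
            s = step(s, a)
        # The terminal value summarizes rewards beyond the short horizon.
        total += gamma ** horizon * optimistic_value(s, ensemble)
        if total > best_return:
            best_return, best_first = total, actions[0]
    return best_first

# HYPOTHETICAL ensemble: quadratic value guesses with different curvatures,
# standing in for value functions that would be fit offline.
ENSEMBLE = [lambda s, c=c: -c * (s[0] ** 2 + 0.1 * s[1] ** 2) for c in (5.0, 8.0, 12.0)]

def run_mpc(start, steps=30, seed=0):
    """Act online: replan at every step and execute only the first action."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        action = plan(state, ENSEMBLE, rng=rng)
        state = step(state, action)
    return state
```

Calling `run_mpc((1.0, 0.0))` replans from scratch at every step, which is the "plan online" half; in a full system the ensemble members would be refit offline on the visited states. A smaller `kappa` pushes the aggregate toward the ensemble max (more optimistic), a larger one toward the ensemble mean.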


research
08/23/2020

Learning Off-Policy with Online Planning

We propose Learning Off-Policy with Online Planning (LOOP), combining th...
research
06/07/2023

Online Multi-Contact Receding Horizon Planning via Value Function Approximation

Planning multi-contact motions in a receding horizon fashion requires a ...
research
06/06/2019

A novel approach to model exploration for value function learning

Planning and Learning are complementary approaches. Planning relies on d...
research
12/19/2019

Uncertainty-sensitive Learning and Planning with Ensembles

We propose a reinforcement learning framework for discrete environments ...
research
10/27/2020

Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles

Learning complex behaviors through interaction requires coordinated long...
research
06/05/2023

A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs

Exploration in environments which differ across episodes has received in...
research
12/10/2018

Improving Model-Based Control and Active Exploration with Reconstruction Uncertainty Optimization

Model based predictions of future trajectories of a dynamical system oft...