Ready Policy One: World Building Through Active Learning

02/07/2020
by Philip Ball, et al.

Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample-efficient learning, often achieving state-of-the-art results for continuous control tasks. However, many existing MBRL methods rely on combining greedy policies with exploration heuristics, and even those which utilize principled exploration bonuses construct dual objectives in an ad hoc fashion. In this paper we introduce Ready Policy One (RP1), a framework that views MBRL as an active learning problem, where we aim to improve the world model in the fewest samples possible. RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization, allowing the algorithm to trade off reward vs. exploration at different stages of learning. In addition, we introduce a principled mechanism to terminate sample collection once we have a rich enough trajectory batch to improve the model. We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
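
To make the idea of a hybrid objective concrete, here is a minimal sketch. The abstract does not give the form of the objective or how it adapts, so everything here is an assumption made for illustration: the convex combination, the use of ensemble disagreement as the exploration signal, the names `ensemble_disagreement`, `hybrid_objective`, and `linear_schedule`, and the linear schedule itself. It is not the RP1 implementation.

```python
from statistics import pvariance


def ensemble_disagreement(predictions):
    """Mean per-step variance across ensemble members.

    predictions[k][t] is the scalar prediction of (hypothetical) ensemble
    member k at time step t. Higher variance means the models disagree more,
    which is used here as a stand-in exploration signal.
    """
    steps = list(zip(*predictions))  # regroup predictions by time step
    return sum(pvariance(step) for step in steps) / len(steps)


def hybrid_objective(rewards, predictions, lam):
    """Blend trajectory return and exploration signal.

    lam = 0 gives a purely greedy (reward-only) objective;
    lam = 1 gives a purely exploratory one. This convex combination is an
    assumption, not the objective defined in the paper.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    total_reward = sum(rewards)
    return (1.0 - lam) * total_reward + lam * ensemble_disagreement(predictions)


def linear_schedule(iteration, total_iterations, start=1.0, end=0.0):
    """Illustrative placeholder for an adaptive weight.

    Moves lam linearly from `start` to `end`, shifting emphasis from
    exploration to reward as training progresses.
    """
    frac = min(max(iteration / total_iterations, 0.0), 1.0)
    return start + frac * (end - start)


if __name__ == "__main__":
    rewards = [1.0, 2.0]
    predictions = [[0.0, 1.0], [2.0, 1.0]]  # two ensemble members, two steps
    for it in (0, 5, 10):
        lam = linear_schedule(it, 10)
        print(it, lam, hybrid_objective(rewards, predictions, lam))
```

A fixed schedule like this is the simplest stand-in for a weight that changes over the course of learning; a method that adapts the weight from observed data would replace `linear_schedule` with its own update rule.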

research
06/19/2019

Batch Active Learning Using Determinantal Point Processes

Data collection and labeling is one of the main challenges in employing ...
research
04/16/2023

Dynamic Exploration-Exploitation Trade-Off in Active Learning Regression with Bayesian Hierarchical Modeling

Active learning provides a framework to adaptively sample the most infor...
research
03/07/2019

Adaptive Sample-Efficient Blackbox Optimization via ES-active Subspaces

We present a new algorithm ASEBO for conducting optimization of high-dim...
research
06/26/2023

BatchGFN: Generative Flow Networks for Batch Active Learning

We introduce BatchGFN – a novel approach for pool-based active learning ...
research
01/23/2021

Rethinking Exploration for Sample-Efficient Policy Learning

Off-policy reinforcement learning for control has made great strides in ...
research
01/01/2019

An Active Learning Framework for Efficient Robust Policy Search

Robust Policy Search is the problem of learning policies that do not deg...
research
06/05/2020

An Ergodic Measure for Active Learning From Equilibrium

This paper develops KL-Ergodic Exploration from Equilibrium (KL-E^3), a ...
