Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models

02/16/2021
by Qi Wang, et al.

Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications. Model-based meta reinforcement learning addresses these issues by learning dynamics and leveraging knowledge from prior experience. In this paper, we take a closer look at this framework and propose a new Thompson-sampling-based approach that combines a new model for identifying task dynamics with an amortized policy optimization step. We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics. Additionally, our approach obtains high returns while allowing fast execution during deployment, because it avoids test-time policy gradient optimization.
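
The abstract names the ingredients (Thompson sampling over task dynamics, plus a policy that needs no gradient steps at test time) without showing how they interact. The following is a minimal, generic sketch of that loop under loudly stated assumptions; it is not GSSM. The 1-D task family `s' = s + k * a + noise`, the finite set `CANDIDATE_TASKS`, the noise level, and the closed-form task-conditioned `policy` are all hypothetical stand-ins: a real system would use a learned dynamics model and a learned amortized policy.

```python
import math
import random

# Hypothetical toy setting (not from the paper): 1-D dynamics s' = s + k * a + noise,
# where the task parameter k is unknown to the agent and comes from a known finite set.
CANDIDATE_TASKS = [-1.0, 1.0, 2.0]
NOISE_STD = 0.1


def step(s, a, k, rng):
    """True environment transition for task k."""
    return s + k * a + rng.gauss(0.0, NOISE_STD)


def policy(s, k):
    """Hypothetical task-conditioned policy: drive the state to 0 assuming task k.
    Acting is a cheap function evaluation, with no gradient steps at test time."""
    return -s / k


def log_likelihood(s, a, s_next, k):
    """Gaussian log-likelihood of an observed transition under task k (up to a constant)."""
    residual = s_next - (s + k * a)
    return -0.5 * (residual / NOISE_STD) ** 2


def normalize(log_post):
    """Convert unnormalized log-posterior values into probabilities (numerically stable)."""
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def thompson_episode(true_k, steps=20, seed=0):
    rng = random.Random(seed)
    log_post = [0.0 for _ in CANDIDATE_TASKS]  # uniform prior over candidate tasks
    s = 5.0
    for _ in range(steps):
        # Thompson sampling: draw one task hypothesis from the current posterior.
        k_sample = rng.choices(CANDIDATE_TASKS, weights=normalize(log_post))[0]
        # Act as if the sampled task were the true one.
        a = policy(s, k_sample)
        s_next = step(s, a, true_k, rng)
        # Bayesian update of every hypothesis with the newly observed transition.
        log_post = [lp + log_likelihood(s, a, s_next, k)
                    for lp, k in zip(log_post, CANDIDATE_TASKS)]
        s = s_next
    return s, normalize(log_post)


final_state, posterior = thompson_episode(true_k=2.0)
```

In this toy case, a single informative transition is usually enough for the posterior to concentrate on the true `k`, after which the sampled hypothesis is almost always correct and the policy drives the state to near zero. The design point being illustrated is that all test-time adaptation happens in the posterior update; the policy itself is only evaluated, never optimized.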
