Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models

by Qi Wang, et al.

Reinforcement learning is a promising paradigm for solving sequential decision-making problems, but low data efficiency and weak generalization across tasks are bottlenecks in real-world applications. Model-based meta reinforcement learning addresses these issues by learning dynamics and leveraging knowledge from prior experience. In this paper, we take a closer look at this framework and propose a new Thompson-sampling-based approach that consists of a new model to identify task dynamics together with an amortized policy optimization step. We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics. Additionally, our approach obtains high returns while allowing fast execution during deployment by avoiding test-time policy gradient optimization.
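To make the Thompson-sampling idea concrete, here is a minimal, self-contained sketch of posterior sampling for task identification in a toy setting. This is not the paper's GSSM: all names, the candidate-dynamics setup, and the Gaussian likelihood are illustrative assumptions. The agent keeps a posterior over which of several candidate dynamics models governs the current task, samples one hypothesis per step (Thompson sampling), and updates the posterior from observed transitions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative, not from the paper): K candidate dynamics
# models, each predicting next_state = state + drifts[k] + noise.
K = 3
drifts = np.array([-1.0, 0.0, 1.0])  # candidate task dynamics
true_task = 2                        # index of the environment's actual drift
noise_std = 0.5

log_post = np.zeros(K)               # uniform prior over tasks, in log-space

state = 0.0
for step in range(50):
    # Thompson sampling: draw one task hypothesis from the current posterior.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    k = rng.choice(K, p=post)
    # Act as if hypothesis k were true (a conditioned policy would go here),
    # then observe the transition generated by the real task dynamics.
    next_state = state + drifts[true_task] + rng.normal(0.0, noise_std)
    # Bayesian update: Gaussian log-likelihood of the observed transition
    # under each candidate dynamics model.
    log_post += -0.5 * ((next_state - state - drifts) / noise_std) ** 2
    state = next_state

post = np.exp(log_post - log_post.max())
post /= post.sum()
print(post.argmax())  # posterior concentrates on the true task index
```

In the paper's setting the discrete candidate models would be replaced by a learned latent task variable with an amortized posterior, so that identification and policy execution require no test-time gradient steps; the loop structure, sample a hypothesis, act, update the belief, is the same.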



