Pre-training as Batch Meta Reinforcement Learning with tiMe

09/25/2019
by Quan Vuong, et al.

Pre-training is transformative in supervised learning: a large network trained on large existing datasets can be used as an initialization when learning a new task. Such initialization speeds up convergence and leads to higher performance. In this paper, we seek to understand how to formalize pre-training from only existing, observational data in Reinforcement Learning (RL), and whether such pre-training is possible. We formulate the setting as Batch Meta Reinforcement Learning. We identify MDP mis-identification as a central challenge and motivate it with theoretical analysis. Combining ideas from Batch RL and Meta RL, we propose tiMe, which learns a distillation of multiple value functions and MDP embeddings from only existing data. In challenging control tasks, and without fine-tuning on unseen MDPs, tiMe is competitive with a state-of-the-art model-free RL method trained with hundreds of thousands of environment interactions.

