Transfer RL across Observation Feature Spaces via Model-Based Regularization

by Yanchao Sun et al.

In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations, and may thus be subject to dramatic changes over time (e.g., an increased number of observable features). However, when the observation space changes, the previous policy will likely fail due to the mismatch of input features, and another policy must be trained from scratch, which is inefficient in terms of both computation and sample complexity. Guided by theoretical insights, we propose a novel algorithm that extracts the latent-space dynamics in the source task and transfers the dynamics model to the target task, where it serves as a model-based regularizer. Our algorithm works even under drastic changes of the observation space (e.g., from vector-based observations to image-based observations), without any inter-task mapping or prior knowledge of the target task. Empirical results show that our algorithm significantly improves the efficiency and stability of learning in the target task.
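To make the idea concrete, here is a minimal sketch of how a frozen source-task latent dynamics model could act as a regularizer while training a target-task encoder. All names, dimensions, and the linear maps are illustrative assumptions, not the paper's actual architecture: the point is only that the transferred dynamics model is held fixed, and the target encoder is penalized when its latents disagree with that model's predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: shared latent space and the new target observation space.
LATENT_DIM, TARGET_OBS_DIM = 4, 16

# Frozen latent dynamics f(z, a) -> z' learned in the source task.
# A fixed random linear map stands in for it here (an assumption for illustration).
W_dyn = rng.normal(scale=0.1, size=(LATENT_DIM + 1, LATENT_DIM))

def latent_dynamics(z, a):
    """Transferred (frozen) source-task dynamics model: predicts the next latent."""
    return np.concatenate([z, [a]]) @ W_dyn

# Target-task encoder phi(obs) -> z, the component being trained in the target task.
W_enc = rng.normal(scale=0.1, size=(TARGET_OBS_DIM, LATENT_DIM))

def encode(obs):
    """Map a target-task observation into the latent space."""
    return obs @ W_enc

def model_based_regularizer(obs, action, next_obs):
    """MSE between the encoder's next latent and the transferred model's prediction.

    A small value means the target encoder's latents are consistent with the
    source-task dynamics, which is what the regularizer encourages.
    """
    z, z_next = encode(obs), encode(next_obs)
    z_pred = latent_dynamics(z, action)
    return float(np.mean((z_pred - z_next) ** 2))

# One hypothetical transition sampled from the target task.
obs = rng.normal(size=TARGET_OBS_DIM)
next_obs = rng.normal(size=TARGET_OBS_DIM)
reg_loss = model_based_regularizer(obs, action=1.0, next_obs=next_obs)

# In training, this term would be added to the usual RL objective:
#   total_loss = rl_loss + lam * reg_loss   (lam is a tunable weight)
```

In an actual implementation the encoder and policy would be neural networks updated by gradient descent, with `W_dyn` kept frozen so that gradients flow only into the target-task components.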





Offline Reinforcement Learning from Images with Latent Space Models

Offline reinforcement learning (RL) refers to the problem of learning po...

Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

Deep reinforcement learning (RL) algorithms can use high-capacity deep n...

Observation Space Matters: Benchmark and Optimization Algorithm

Recent advances in deep reinforcement learning (deep RL) enable research...

Provably Sample-Efficient RL with Side Information about Latent Dynamics

We study reinforcement learning (RL) in settings where observations are ...

Provable Benefits of Representational Transfer in Reinforcement Learning

We study the problem of representational transfer in RL, where an agent ...

Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods

In recent years, a growing number of deep model-based reinforcement lear...

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Bayesian policy reuse (BPR) is a general policy transfer framework for s...