Reward Informed Dreamer for Task Generalization in Reinforcement Learning

03/09/2023
by Chengyang Ying, et al.

A long-standing goal of reinforcement learning is for algorithms to learn on training tasks and generalize well to unseen tasks, as humans do, where the tasks share similar dynamics but have different reward functions. A general challenge is that quantitatively measuring the similarity between such tasks is nontrivial, yet this measurement is vital for analyzing the task distribution and for designing algorithms that generalize better. To address this, we present a novel metric, Task Distribution Relevance (TDR), which uses optimal Q functions to quantify the relevance of the task distribution. For tasks with a high TDR, i.e., tasks that differ significantly, we demonstrate that Markovian policies cannot distinguish them and therefore perform poorly. Based on this observation, we propose Reward Informed Dreamer (RID), a framework built on reward-informed world models that captures latent features invariant across tasks and encodes reward signals into policies so that they can distinguish different tasks. In RID, we compute the corresponding variational lower bound of the log-likelihood of the data, which, based on reward-informed world models, includes a novel term for distinguishing different tasks via states. Finally, extensive experiments on the DeepMind Control Suite demonstrate that RID significantly improves performance when handling different tasks at the same time, especially those with high TDR, and generalizes effectively to unseen tasks.
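
The claim that a Markovian policy cannot tell apart tasks with shared dynamics and different rewards can be illustrated with a small tabular sketch. The sketch is not the paper's TDR definition and not RID. The three-state chain, the two reward functions, and every name in the code are hypothetical, chosen only for illustration. It relies on one fact: when tasks share dynamics and are drawn uniformly, the average return of a state-only policy across tasks equals its return in the MDP whose reward is the mean of the task rewards. The best shared policy is therefore the optimal policy of that mean-reward MDP.

```python
# Toy illustration (hypothetical setup, not the paper's TDR or RID):
# two tasks share deterministic dynamics on a 3-state chain but reward
# opposite ends of the chain.

GAMMA = 0.9
N_STATES = 3
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Shared dynamics: move along the chain, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def reward_right(state, action, next_state):
    """Task A: reward 1 for arriving at (or staying in) the right end."""
    return 1.0 if next_state == N_STATES - 1 else 0.0


def reward_left(state, action, next_state):
    """Task B: reward 1 for arriving at (or staying in) the left end."""
    return 1.0 if next_state == 0 else 0.0


def mean_reward(state, action, next_state):
    """Reward seen on average by a policy that cannot identify the task."""
    return 0.5 * (reward_left(state, action, next_state)
                  + reward_right(state, action, next_state))


def optimal_values(reward_fn, sweeps=500):
    """Value iteration: V*(s) = max_a [r(s, a, s') + GAMMA * V*(s')]."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [
            max(reward_fn(s, a, step(s, a)) + GAMMA * values[step(s, a)]
                for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return values


def generalization_gap():
    """Per-state gap between task-aware and task-blind optimal values."""
    v_right = optimal_values(reward_right)
    v_left = optimal_values(reward_left)
    v_shared = optimal_values(mean_reward)  # best state-only policy
    return [0.5 * (vr + vl) - vs
            for vr, vl, vs in zip(v_right, v_left, v_shared)]


if __name__ == "__main__":
    print("gap per state:", generalization_gap())
```

In this toy setting the gap is strictly positive in every state, because any fixed state-only policy must commit to one end of the chain and so collects reward in only one of the two tasks. A policy that also receives the observed reward as input could identify the task after its first rewarded transition, which is the intuition behind encoding reward signals into the policy.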

research
04/23/2021

DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies

Can we use reinforcement learning to learn general-purpose policies that...
research
10/08/2021

Training Transition Policies via Distribution Matching for Complex Tasks

Humans decompose novel complex tasks into simpler ones to exploit previo...
research
11/12/2020

Hierarchical reinforcement learning for efficient exploration and transfer

Sparse-reward domains are challenging for reinforcement learning algorit...
research
10/29/2021

Xi-Learning: Successor Feature Transfer Learning for General Reward Functions

Transfer in Reinforcement Learning aims to improve learning performance ...
research
01/13/2021

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Reinforcement learning methods trained on few environments rarely learn ...
research
05/06/2020

Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization

We study the problem of learning exploration-exploitation strategies tha...
research
07/07/2021

Learning Time-Invariant Reward Functions through Model-Based Inverse Reinforcement Learning

Inverse reinforcement learning is a paradigm motivated by the goal of le...
