Reinforcement Learning Experience Reuse with Policy Residual Representation

05/31/2019
by Wen-Ji Zhou, et al.

Experience reuse is key to sample-efficient reinforcement learning. One of the critical issues is how experience is represented and stored. In previous work, experience has been stored in the form of features, individual models, or an average model, each lying at a different granularity. However, new tasks may require experience across multiple granularities. In this paper, we propose the policy residual representation (PRR) network, which can extract and store multiple levels of experience. The PRR network is trained on a set of tasks with a multi-level architecture, where each module in a level corresponds to a subset of the tasks. The PRR network therefore represents experience in a spectrum-like way. When training on a new task, PRR can provide different levels of experience to accelerate learning. We evaluate the PRR network on a set of grid-world navigation tasks, locomotion tasks, and fighting tasks in a video game. The results show that the PRR network leads to better reuse of experience and thus outperforms some state-of-the-art approaches.
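
To make the multi-level idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a task's policy output is the sum of one module per level, with coarser levels shared by larger subsets of tasks; the abstract does not state how modules are combined, so the additive form is an assumption suggested only by the word "residual". The class name `ResidualPolicySketch`, the linear modules, and the `task_groups` mapping are all hypothetical.

```python
import numpy as np


class ResidualPolicySketch:
    """Hypothetical multi-level policy: logits = sum of one module per level."""

    def __init__(self, obs_dim, n_actions, task_groups, seed=0):
        # task_groups[level][task_id] -> index of the module used at that level.
        rng = np.random.default_rng(seed)
        self.task_groups = task_groups
        self.n_actions = n_actions
        self.levels = []
        for mapping in task_groups:
            n_modules = max(mapping.values()) + 1
            # Each module is a linear map here, purely for brevity (assumption).
            self.levels.append(
                [rng.normal(scale=0.1, size=(obs_dim, n_actions))
                 for _ in range(n_modules)]
            )

    def logits(self, obs, task_id):
        # Sum the contribution of the module covering this task at each level.
        total = np.zeros(self.n_actions)
        for mapping, modules in zip(self.task_groups, self.levels):
            total += obs @ modules[mapping[task_id]]
        return total

    def action_probs(self, obs, task_id):
        z = self.logits(obs, task_id)
        z = z - z.max()  # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum()


# Four tasks: level 0 shares one module across all tasks, level 1 splits the
# tasks into two groups, and level 2 gives each task its own module.
task_groups = [
    {0: 0, 1: 0, 2: 0, 3: 0},
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 0, 1: 1, 2: 2, 3: 3},
]
policy = ResidualPolicySketch(obs_dim=3, n_actions=2, task_groups=task_groups)
obs = np.array([1.0, 0.5, -0.2])
probs = policy.action_probs(obs, task_id=2)
```

In this sketch, tasks 0 and 1 share their level-0 and level-1 modules and differ only in their level-2 modules, which is one way to read the "spectrum-like" storage of experience described above.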

