Lean Evolutionary Reinforcement Learning by Multitasking with Importance Sampling

03/21/2022
by Nick Zhang, et al.

Studies have shown evolution strategies (ES) to be a promising approach for reinforcement learning (RL) with deep neural networks. However, high sample complexity persists in applications of ES to deep RL. In this paper, we address this shortcoming with a novel neuroevolutionary multitasking (NuEMT) algorithm, designed to transfer information from a set of auxiliary tasks (of short episode length) to the target (full-length) RL task at hand. The artificially generated auxiliary tasks allow an agent to update and quickly evaluate policies on shorter time horizons. The evolved skills are then transferred to guide the longer and harder task towards an optimal policy. We demonstrate that the NuEMT algorithm achieves data-lean evolutionary RL, reducing the amount of expensive agent-environment interaction data required. Our key algorithmic contribution in this setting is to introduce, for the first time, a multitask information transfer mechanism based on the statistical importance sampling technique. In addition, an adaptive resource allocation strategy assigns computational resources to auxiliary tasks based on their gleaned usefulness. Experiments on a range of continuous control tasks from the OpenAI Gym confirm that our proposed algorithm is efficient compared to recent ES baselines.
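To make the importance sampling idea concrete, the following is a minimal sketch of how candidate solutions drawn from one Gaussian search distribution can be reweighted to estimate expected fitness under another. It is not the NuEMT algorithm: the abstract does not specify the estimator, so the isotropic Gaussian search distributions, the self-normalized weights, the toy quadratic fitness, and every name here (`gaussian_logpdf`, `importance_weights`, `weighted_fitness`, `aux_mean`, `target_mean`) are assumptions made purely for illustration.

```python
import math
import random


def gaussian_logpdf(x, mean, sigma):
    """Log-density of an isotropic Gaussian N(mean, sigma^2 I) at point x."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return (-0.5 * sq_dist / sigma ** 2
            - d * math.log(sigma)
            - 0.5 * d * math.log(2.0 * math.pi))


def importance_weights(samples, aux_mean, target_mean, sigma):
    """Self-normalized weights proportional to p_target(x) / p_aux(x)."""
    log_w = [gaussian_logpdf(x, target_mean, sigma)
             - gaussian_logpdf(x, aux_mean, sigma)
             for x in samples]
    max_log_w = max(log_w)  # subtract the max before exponentiating for stability
    w = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def weighted_fitness(samples, fitnesses, aux_mean, target_mean, sigma):
    """Estimate expected fitness under the target distribution
    using samples that were drawn from the auxiliary distribution."""
    weights = importance_weights(samples, aux_mean, target_mean, sigma)
    return sum(w * f for w, f in zip(weights, fitnesses))


# Hypothetical setup: two search distributions with nearby means.
rng = random.Random(0)
aux_mean, target_mean, sigma = [0.0, 0.0], [0.5, 0.5], 1.0

# Candidates are sampled (and evaluated) only under the auxiliary distribution.
samples = [[rng.gauss(m, sigma) for m in aux_mean] for _ in range(2000)]
fitnesses = [-(x[0] ** 2 + x[1] ** 2) for x in samples]  # toy stand-in for episode return

estimate = weighted_fitness(samples, fitnesses, aux_mean, target_mean, sigma)
print(estimate)
```

In this toy setting the exact expected fitness under the target distribution is -2.5, so the printed estimate can be checked against it. The weights become increasingly uneven as the two distributions move apart, which is one reason a transfer scheme would want to track how useful each auxiliary task actually is.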
