
Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation

12/09/2020
by   Chenyang Zhao, et al.

In reinforcement learning, domain randomisation is an increasingly popular technique for learning more general policies that are robust to domain shifts at deployment. However, naively aggregating information from randomised domains may lead to high variance in gradient estimation and an unstable learning process. To address this issue, we present a peer-to-peer online distillation strategy for RL termed P2PDRL, where multiple workers are each assigned to a different environment and exchange knowledge through mutual regularisation based on the Kullback-Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation to new environments at testing.
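The mutual-regularisation idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the shared probe state, and the weighting coefficient `alpha` are all hypothetical. It only shows the general shape of a peer-to-peer distillation loss, where each worker's task loss is augmented with the mean KL divergence between its own action distribution and those of its peers.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two categorical action distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def p2p_distillation_loss(task_losses, policies, alpha=0.1):
    """Per-worker loss: own task loss plus alpha times the mean KL
    to every peer's action distribution on a shared batch of states.
    `policies[i]` is worker i's categorical action distribution
    (hypothetical shape; the coefficient alpha is also an assumption)."""
    n = len(policies)
    losses = []
    for i in range(n):
        reg = sum(kl_divergence(policies[i], policies[j])
                  for j in range(n) if j != i) / (n - 1)
        losses.append(task_losses[i] + alpha * reg)
    return losses
```

When all workers agree, the KL regulariser vanishes and each worker optimises its own task loss; as their policies diverge across randomised domains, the regulariser pulls them back toward a shared behaviour.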

02/06/2020

Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach

Peer-to-peer knowledge transfer in distributed environments has emerged ...
05/20/2018

Learning to Teach in Cooperative Multiagent Reinforcement Learning

We present a framework and algorithm for peer-to-peer teaching in cooper...
06/07/2020

Peer Collaborative Learning for Online Knowledge Distillation

Traditional knowledge distillation uses a two-stage training strategy to...
07/13/2017

Distral: Robust Multitask Reinforcement Learning

Most deep reinforcement learning algorithms are data inefficient in comp...
02/01/2020

Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning

Off-policy ensemble reinforcement learning (RL) methods have demonstrate...
10/23/2019

Robust Domain Randomization for Reinforcement Learning

Producing agents that can generalize to a wide range of environments is ...
10/01/2020

Student-Initiated Action Advising via Advice Novelty

Action advising is a knowledge exchange mechanism between peers, namely ...