Entropic Policy Composition with Generalized Policy Improvement and Divergence Correction

12/05/2018
by Jonathan J Hunt, et al.

Deep reinforcement learning (RL) algorithms have made great strides in recent years. An important remaining challenge is the ability to quickly transfer existing skills to novel tasks, and to combine existing skills with newly acquired ones. In domains where tasks are solved by composing skills, this capacity holds the promise of dramatically reducing the data requirements of deep RL algorithms, and hence increasing their applicability. Recent work has studied ways of composing behaviors represented in the form of action-value functions. We analyze these methods to highlight their strengths and weaknesses, and point out situations where each of them is susceptible to poor performance. To perform this analysis, we extend generalized policy improvement to the max-entropy framework and introduce a method for the practical implementation of successor features in continuous action spaces. We then propose a novel approach which, in principle, recovers the optimal policy during transfer. This method works by explicitly learning the (discounted, future) divergence between policies. We study this approach in the tabular case and propose a scalable variant that is applicable in multi-dimensional continuous action spaces. We compare our approach with existing ones on a range of non-trivial continuous control problems with compositional structure, and demonstrate qualitatively better performance despite not requiring simultaneous observation of all task rewards.
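To make the idea of generalized policy improvement in a max-entropy setting more concrete, here is a minimal tabular sketch for a single state. It is an illustration under stated assumptions, not the authors' implementation: it assumes the improved policy is a Boltzmann distribution over the action-wise maximum of the existing policies' action values, with a temperature `alpha`. The function name `maxent_gpi_policy` and the toy action values are hypothetical.

```python
import math


def maxent_gpi_policy(q_values, alpha=1.0):
    """Return action probabilities for one state.

    q_values: list of lists, one list of action values per existing policy.
    alpha: entropy temperature (assumed form; larger means more uniform).
    """
    # Action-wise maximum over the existing policies' action values.
    q_max = [max(qs) for qs in zip(*q_values)]
    # Boltzmann (softmax) distribution over q_max, shifted for stability.
    shift = max(q_max)
    weights = [math.exp((q - shift) / alpha) for q in q_max]
    total = sum(weights)
    return [w / total for w in weights]


# Toy action values for two previously learned policies (made-up numbers).
q_task_a = [1.0, 0.0, 0.5]
q_task_b = [0.0, 2.0, 0.5]
policy = maxent_gpi_policy([q_task_a, q_task_b], alpha=0.5)
```

In this sketch the resulting policy places most probability on the action that either existing policy values most highly, while the temperature keeps some probability on the remaining actions.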


research
09/26/2019

CAQL: Continuous Action Q-Learning

Value-based reinforcement learning (RL) methods like Q-learning have sho...
research
07/12/2018

Will it Blend? Composing Value Functions in Reinforcement Learning

An important property for lifelong-learning agents is the ability to com...
research
01/30/2019

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

The ability to transfer skills across tasks has the potential to scale u...
research
09/17/2019

Adversarial Feature Training for Generalizable Robotic Visuomotor Control

Deep reinforcement learning (RL) has enabled training action-selection p...
research
05/14/2017

Discrete Sequential Prediction of Continuous Actions for Deep RL

It has long been assumed that high dimensional continuous control proble...
research
02/13/2018

Progressive Reinforcement Learning with Distillation for Multi-Skilled Motion Control

Deep reinforcement learning has demonstrated increasing capabilities for...
research
06/29/2023

Safety-Aware Task Composition for Discrete and Continuous Reinforcement Learning

Compositionality is a critical aspect of scalable system design. Reinfor...
