Towards an Understanding of Default Policies in Multitask Policy Optimization

11/04/2021
by Ted Moskovitz, et al.

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms, with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training more generally capable agents. Here, we take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization. Using these results, we then derive a principled RPO algorithm for multitask learning with strong performance guarantees.
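As a minimal sketch of the kind of objective this family of methods optimizes (the notation here is illustrative and not taken from the paper), the agent maximizes expected discounted reward minus a KL penalty that discourages deviation from the default policy \(\pi_0\), with regularization weight \(\alpha\):

\[
J_\alpha(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^t \Big( r(s_t, a_t) \;-\; \alpha \,\mathrm{KL}\big(\pi(\cdot \mid s_t)\,\|\,\pi_0(\cdot \mid s_t)\big) \Big)\right].
\]

Setting \(\pi_0\) to the uniform policy recovers standard entropy regularization (up to a constant); the multitask question studied here is what properties a shared \(\pi_0\) should have when it serves as the default across a family of tasks.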


Related research

08/14/2017 · Benchmark Environments for Multitask Learning in Continuous Domains
05/03/2019 · Information asymmetry in KL-regularized RL
06/13/2019 · Jacobian Policy Optimizations
01/18/2019 · On-Policy Trust Region Policy Optimisation with Replay Buffers
03/18/2019 · Exploiting Hierarchy for Learning and Transfer in KL-regularized RL
01/31/2022 · Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO
07/17/2022 · Minimum Description Length Control
