Contextual Policy Optimisation

05/27/2018
by Supratik Paul, et al.

Policy gradient methods have been successfully applied to a variety of reinforcement learning tasks. However, while learning in a simulator, these methods do not utilise the opportunity to improve learning by adjusting certain environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but that are controllable in a simulator. This can lead to slow learning, or convergence to highly suboptimal policies. In this paper, we present contextual policy optimisation (CPO). The central idea is to use Bayesian optimisation to actively select the distribution of the environment variable that maximises the improvement generated by each iteration of the policy gradient method. To make this Bayesian optimisation practical, we contribute two easy-to-compute low-dimensional fingerprints of the current policy. We apply CPO to a number of continuous control tasks of varying difficulty and show that CPO can efficiently learn policies that are robust to significant rare events, which are unlikely to be observed under random sampling but are key to learning good policies.
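The abstract describes the CPO loop only at a high level. The sketch below is a toy illustration under stated assumptions, not the authors' implementation: a one-parameter policy, a binary environment variable z whose rare value (true probability P_TRUE = 0.05) carries a large penalty, and a simulator in which the sampling probability psi of z can be set freely. At each iteration a Gaussian-process model with an upper-confidence-bound (UCB) acquisition picks psi given the current policy's fingerprint, a plain stochastic-gradient step on the expected return (standing in for the policy gradient update) is taken on samples drawn under psi, and the resulting improvement is added to the GP's data. The names `cpo_sketch`, `fingerprint`, `gp_ucb` and `P_TRUE`, the reward functions, the importance-weighting correction, the UCB acquisition and the fixed GP hyperparameters are all hypothetical choices made for this example; the paper's two fingerprints are replaced by a placeholder that simply returns the policy parameter.

```python
import numpy as np

P_TRUE = 0.05  # hypothetical true (physical) probability of the rare event z = 1


def reward_grad(theta, z):
    """d r / d theta for the invented toy rewards
    r(theta, 0) = -(theta - 1)^2 and r(theta, 1) = -50 (theta + 1)^2."""
    return np.where(z == 1, -100.0 * (theta + 1.0), -2.0 * (theta - 1.0))


def true_return(theta):
    """Exact expected return J(theta) under the true distribution of z (toy only)."""
    return -(1 - P_TRUE) * (theta - 1.0) ** 2 - P_TRUE * 50.0 * (theta + 1.0) ** 2


def fingerprint(theta):
    # Hypothetical placeholder: the toy policy is a single scalar, so its
    # "fingerprint" is the parameter itself. A real fingerprint would be a
    # low-dimensional summary of the policy's behaviour.
    return np.array([theta])


def gp_ucb(X, y, Xq, beta=2.0, ls=0.3, noise=1e-2):
    """UCB scores at query points Xq from a zero-mean GP with an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ls ** 2)

    y = (y - y.mean()) / (y.std() + 1e-8)  # normalise targets
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    # Posterior variance: prior variance 1 minus diag(Ks K^-1 Ks^T).
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu + beta * np.sqrt(np.maximum(var, 0.0))


def cpo_sketch(theta=1.0, iters=40, batch=16, lr=0.01, warmup=5, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.05, 0.95, 19)  # candidate sampling probabilities psi
    X, y = [], []
    for t in range(iters):
        fp = fingerprint(theta)
        if t < warmup:
            psi = rng.choice(grid)  # random choices until the GP has some data
        else:
            # Query the GP at every candidate psi paired with the current fingerprint.
            Xq = np.column_stack([grid, np.repeat(fp[None, :], len(grid), axis=0)])
            psi = grid[np.argmax(gp_ucb(np.array(X), np.array(y), Xq))]
        # Sample the environment variable from the chosen distribution ...
        z = (rng.random(batch) < psi).astype(int)
        # ... and importance-weight so the estimate stays unbiased for the true one.
        w = np.where(z == 1, P_TRUE / psi, (1 - P_TRUE) / (1 - psi))
        grad = np.mean(w * reward_grad(theta, z))
        new_theta = theta + lr * grad  # gradient ascent step
        # Toy shortcut: improvement measured with the exact expected return.
        X.append(np.concatenate([[psi], fp]))
        y.append(true_return(new_theta) - true_return(theta))
        theta = new_theta
    return theta, X, y
```

Calling `cpo_sketch()` runs 40 iterations starting from theta = 1, which is optimal only if the rare event is ignored; the exact optimum of the toy objective is theta = -3.1/6.9 ≈ -0.45. The importance weight P_TRUE / psi (or (1 - P_TRUE) / (1 - psi) for the common outcome) keeps each gradient estimate unbiased for the true distribution of z, so the choice of psi affects only the variance of the update, and that is what the Bayesian optimiser exploits here. Measuring improvement with the exact expected return is possible only in a toy; in practice it would have to be estimated from rollouts.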
