CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies

05/19/2022
by Mohamed Alami Chehboune, et al.

Reinforcement learning has drawn considerable interest as a tool for solving optimal control problems. Solving a given problem (task or environment) involves converging towards an optimal policy. However, there may exist multiple optimal policies that differ dramatically in their behaviour; for example, some may be faster than others, but at the expense of greater risk. We consider and study a distribution of optimal policies. We design a curiosity-augmented Metropolis algorithm (CAMEO) that samples optimal policies and encourages them to adopt diverse behaviours, since diversity implies greater coverage of the possible optimal policies. In experimental simulations, we show that CAMEO obtains policies that all solve classic control problems, even in the challenging case of environments that provide sparse rewards. We further show that the sampled policies present different risk profiles, which has interesting practical applications in interpretability and represents a first step towards learning the distribution of optimal policies itself.
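To make the idea of sampling from a distribution of optimal policies more concrete, the following is a minimal sketch of a generic Metropolis sampler over a single policy parameter, with a novelty bonus added to the score. It is not the CAMEO algorithm as specified in the full text. The toy return function `episode_return`, the novelty term `curiosity_bonus`, and the parameters `beta`, `step`, and `memory` are all assumptions introduced for illustration. Because the assumed bonus depends on an archive that changes over time, the chain does not target a fixed distribution in the strict Metropolis sense.

```python
import math
import random
from collections import deque


def episode_return(theta):
    """Hypothetical stand-in for a policy rollout (not from the paper).

    Two parameter values (theta = -2 and theta = +2) are equally optimal,
    mimicking a task with several distinct optimal policies.
    """
    return -min((theta - 2.0) ** 2, (theta + 2.0) ** 2)


def curiosity_bonus(theta, archive, scale=0.5, cap=1.0):
    """Assumed novelty term: capped distance to the nearest archived sample."""
    if not archive:
        return cap
    nearest = min(abs(theta - seen) for seen in archive)
    return min(cap, scale * nearest)


def metropolis_sample(n_steps=5000, beta=1.0, step=1.0, memory=50, seed=0):
    """Random-walk Metropolis on theta with score = return + novelty bonus."""
    rng = random.Random(seed)
    theta = 0.0
    archive = deque(maxlen=memory)  # recently accepted parameters
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal, so the acceptance ratio needs no correction.
        proposal = theta + rng.gauss(0.0, step)
        current = episode_return(theta) + curiosity_bonus(theta, archive)
        candidate = episode_return(proposal) + curiosity_bonus(proposal, archive)
        log_accept = beta * (candidate - current)
        # Accept with probability min(1, exp(log_accept)).
        if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
            theta = proposal
            archive.append(theta)
        samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = metropolis_sample()
    near_pos = sum(1 for t in draws if abs(t - 2.0) < 0.5)
    near_neg = sum(1 for t in draws if abs(t + 2.0) < 0.5)
    print("samples near +2:", near_pos, "samples near -2:", near_neg)
```

Counting how many draws fall near each optimum gives a rough indication of whether the sampler covers both behaviours rather than collapsing onto one. In a realistic setting, `episode_return` would be replaced by rollouts of a parameterised policy in the environment.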

research
02/23/2021

State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards

Constrained reinforcement learning involves multiple rewards that must i...
research
07/23/2023

Optimal Control of Multiclass Fluid Queueing Networks: A Machine Learning Approach

We propose a machine learning approach to the optimal control of multicl...
research
06/14/2021

Learning Intrusion Prevention Policies through Optimal Stopping

We study automated intrusion prevention using reinforcement learning. In...
research
09/15/2022

Sparsity Inducing Representations for Policy Decompositions

Policy Decomposition (PoDec) is a framework that lessens the curse of di...
research
04/06/2020

Learning Stabilizing Control Policies for a Tensegrity Hopper with Augmented Random Search

In this paper, we consider tensegrity hopper - a novel tensegrity-based ...
research
07/15/2022

Deep Hedging: Continuous Reinforcement Learning for Hedging of General Portfolios across Multiple Risk Aversions

We present a method for finding optimal hedging policies for arbitrary i...
research
05/06/2022

Optimal Control as Variational Inference

In this article we address the stochastic and risk sensitive optimal con...
