Learning optimal environments using projected stochastic gradient ascent

06/02/2020
by Adrien Bolland, et al.

In this work, we generalize direct policy search algorithms to an algorithm we call Direct Environment Search with (projected stochastic) Gradient Ascent (DESGA). DESGA can be used to jointly learn a reinforcement learning (RL) environment and a policy with maximal expected return over a joint hypothesis space of environments and policies. We illustrate the performance of DESGA on two benchmarks. First, we consider a parametrized space of Mass-Spring-Damper (MSD) environments. Then, we use our algorithm to optimize the component sizing and the operation of a small-scale autonomous energy system, i.e., a solar off-grid microgrid composed of photovoltaic panels, batteries, etc. The results highlight the excellent performance of the DESGA algorithm.
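To make the idea of projected stochastic gradient ascent over a joint space concrete, the following minimal sketch may help. It is not the DESGA implementation from the paper. It assumes a toy closed-form return J(psi, theta) = -(psi - 3)^2 - (theta - psi)^2, a scalar environment parameter `psi` constrained to the interval [0, 2], a scalar policy parameter `theta`, and Gaussian noise added to the exact gradient as a stand-in for a sampled gradient estimate. All names (`project`, `noisy_gradient`, `projected_sga`) and all numerical values are hypothetical.

```python
import random


def project(x, low, high):
    """Euclidean projection of a scalar onto the interval [low, high]."""
    return max(low, min(high, x))


def noisy_gradient(psi, theta, rng):
    """Noisy gradient of the toy return J = -(psi - 3)^2 - (theta - psi)^2.

    The zero-mean Gaussian noise is an assumption that mimics the variance
    of a gradient estimated from sampled trajectories.
    """
    grad_psi = -2.0 * (psi - 3.0) + 2.0 * (theta - psi) + rng.gauss(0.0, 0.1)
    grad_theta = -2.0 * (theta - psi) + rng.gauss(0.0, 0.1)
    return grad_psi, grad_theta


def projected_sga(steps=2000, step_size=0.05, seed=0):
    """Jointly ascend on (psi, theta), projecting psi back onto [0, 2]."""
    rng = random.Random(seed)
    psi, theta = 0.5, 0.0  # arbitrary starting point (assumption)
    for _ in range(steps):
        grad_psi, grad_theta = noisy_gradient(psi, theta, rng)
        # Ascent step on the environment parameter, then projection
        # onto its feasible set.
        psi = project(psi + step_size * grad_psi, 0.0, 2.0)
        # The policy parameter is unconstrained in this toy example.
        theta = theta + step_size * grad_theta
    return psi, theta


if __name__ == "__main__":
    psi, theta = projected_sga()
    print(f"psi={psi:.3f}, theta={theta:.3f}")
```

In this toy problem the unconstrained maximizer (psi = theta = 3) lies outside the feasible set, so the iterates should settle near the boundary point psi = 2 with theta close to 2. This illustrates why the projection step matters when the environment parameters are constrained.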

research
05/11/2023

Policy Gradient Algorithms Implicitly Optimize by Continuation

Direct policy optimization in reinforcement learning is usually solved w...
research
12/03/2022

Constrained Reinforcement Learning via Dissipative Saddle Flow Dynamics

In constrained reinforcement learning (C-RL), an agent seeks to learn fr...
research
02/06/2022

Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning

In reinforcement learning (RL), offline learning decoupled learning from...
research
09/03/2020

Optimality-based Analysis of XCSF Compaction in Discrete Reinforcement Learning

Learning classifier systems (LCSs) are population-based predictive syste...
research
08/16/2022

Making Reinforcement Learning Work on Swimmer

The SWIMMER environment is a standard benchmark in reinforcement learnin...
research
06/14/2019

Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces

Direct optimization is an appealing approach to differentiating through ...
research
05/28/2021

Improving Generalization in Mountain Car Through the Partitioned Parameterized Policy Approach via Quasi-Stochastic Gradient Descent

The reinforcement learning problem of finding a control policy that mini...
