Visualizing the Loss Landscape of Actor Critic Methods with Applications in Inventory Optimization

09/04/2020
by   Recep Yusuf Bekci, et al.

Continuous control is a widely applicable area of reinforcement learning. Its dominant algorithms are actor-critic methods, which commonly optimize neural network approximators via policy gradients. Our study focuses on the characteristics of the actor loss function, the central object of this optimization. We use low-dimensional visualizations of the loss function and compare the loss landscapes of several algorithms. Furthermore, we apply our approach to multi-store dynamic inventory control, a notoriously difficult problem in supply chain operations, and examine the shape of the loss function associated with the optimal policy. We model and solve the problem with reinforcement learning and obtain a loss landscape that favors optimality.
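The low-dimensional visualization the abstract refers to is typically done by slicing the high-dimensional loss surface along two random, norm-matched directions around the trained parameters. The sketch below illustrates that general technique on a toy surrogate loss; it is not the authors' code, and the quadratic `loss` and function names are illustrative assumptions.

```python
import numpy as np

def loss_landscape_2d(loss_fn, theta, steps=21, span=1.0, seed=0):
    """Evaluate loss_fn on a 2D grid around theta along two random
    directions, each rescaled to the norm of theta (a simple analogue
    of filter-wise normalization for flat parameter vectors)."""
    rng = np.random.default_rng(seed)
    d1 = rng.standard_normal(theta.shape)
    d2 = rng.standard_normal(theta.shape)
    d1 *= np.linalg.norm(theta) / np.linalg.norm(d1)
    d2 *= np.linalg.norm(theta) / np.linalg.norm(d2)
    alphas = np.linspace(-span, span, steps)
    grid = np.empty((steps, steps))
    for i, a in enumerate(alphas):
        for j, b in enumerate(alphas):
            # Loss at the perturbed parameter point theta + a*d1 + b*d2.
            grid[i, j] = loss_fn(theta + a * d1 + b * d2)
    return alphas, grid

# Hypothetical stand-in for an actor loss: a quadratic bowl whose
# minimum sits at the "trained" parameters theta_star.
theta_star = np.array([1.0, -2.0, 0.5])
loss = lambda th: float(np.sum((th - theta_star) ** 2))

alphas, grid = loss_landscape_2d(loss, theta_star)
```

The resulting `grid` can be passed to a contour or surface plotter; for this convex toy loss the center of the grid (the trained parameters) is the minimum, whereas for a real actor loss the plot reveals how flat or rugged the neighborhood of the learned policy is.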

