Distributional constrained reinforcement learning for supply chain optimization

02/03/2023
by Jaime Sabal Bermúdez, et al.

This work studies reinforcement learning (RL) in the context of multi-period supply chains subject to constraints, e.g., on production and inventory. We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for reliable constraint satisfaction in RL. Our approach is based on Constrained Policy Optimization (CPO), which is subject to approximation errors that in practice lead it to converge to infeasible policies. We address this issue by incorporating aspects of distributional RL into DCPO. Specifically, we represent the return and cost value functions using neural networks that output discrete distributions, and we reshape costs based on the associated confidence. Using a supply chain case study, we show that DCPO improves the rate at which the RL policy converges and ensures reliable constraint satisfaction by the end of training. The proposed method also improves predictability, greatly reducing the variance of returns between runs; this result is notable in the context of policy gradient methods, which intrinsically introduce high variance during training.
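To make the idea of a discrete cost distribution concrete, the following is a minimal sketch and not the authors' implementation. It assumes a cost value function has already produced probabilities over a fixed set of cost atoms, and it compares the expected cost with a more conservative upper quantile at a chosen confidence level. The names `expected_value` and `upper_quantile`, the atom values, and the confidence level of 0.9 are all hypothetical choices for illustration; the abstract does not specify how DCPO reshapes costs.

```python
from typing import Sequence


def expected_value(support: Sequence[float], probs: Sequence[float]) -> float:
    """Mean of a discrete distribution given its atoms and probabilities."""
    return sum(z * p for z, p in zip(support, probs))


def upper_quantile(
    support: Sequence[float], probs: Sequence[float], confidence: float
) -> float:
    """Smallest atom z such that P(Z <= z) >= confidence.

    Assumes `support` is sorted in increasing order and `probs` sums to 1.
    """
    cumulative = 0.0
    for z, p in zip(support, probs):
        cumulative += p
        # Small tolerance guards against floating-point round-off.
        if cumulative >= confidence - 1e-12:
            return z
    return support[-1]


# Hypothetical output of a cost critic: five cost atoms and their probabilities.
support = [0.0, 1.0, 2.0, 3.0, 4.0]
probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]

mean_cost = expected_value(support, probs)
# A conservative cost estimate at an assumed 90% confidence level.
risk_aware_cost = upper_quantile(support, probs, 0.9)
```

In this toy distribution the upper quantile is larger than the mean, so a constraint checked against the quantile is stricter than one checked against the expected cost. That is one simple way a distributional estimate could support more cautious constraint handling.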

research
11/28/2022

Quantile Constrained Reinforcement Learning: A Reinforcement Learning Framework Constraining Outage Probability

Constrained reinforcement learning (RL) is an area of RL whose objective...
research
07/04/2023

A Scalable Reinforcement Learning-based System Using On-Chain Data for Cryptocurrency Portfolio Management

On-chain data (metrics) of blockchain networks, akin to company fundamen...
research
01/26/2023

Efficient Trust Region-Based Safe Reinforcement Learning with Low-Bias Distributional Actor-Critic

To apply reinforcement learning (RL) to real-world applications, agents ...
research
07/15/2022

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

We study the multi-step off-policy learning approach to distributional R...
research
08/29/2021

A Policy Efficient Reduction Approach to Convex Constrained Deep Reinforcement Learning

Although well-established in general reinforcement learning (RL), value-...
research
06/04/2020

Constrained Reinforcement Learning for Dynamic Optimization under Uncertainty

Dynamic real-time optimization (DRTO) is a challenging task due to the f...
research
12/18/2019

Distributional Reinforcement Learning for Energy-Based Sequential Models

Global Autoregressive Models (GAMs) are a recent proposal [Parshakova et...
