Model-based controlled learning of MDP policies with an application to lost-sales inventory control

11/30/2020
by Willem van Jaarsveld, et al.

Recent literature has established that neural networks can represent good MDP policies across a range of stochastic dynamic models in supply chain and logistics. To overcome limitations of the model-free algorithms typically employed to learn such neural network policies, a model-based algorithm that incorporates variance reduction techniques is proposed. For the classical lost-sales inventory model, the algorithm learns neural network policies that are superior to those learned by model-free algorithms, while also outperforming heuristic benchmarks. The algorithm may be an interesting candidate to apply to other stochastic dynamic problems in supply chain and logistics.
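
The paper's algorithm is not reproduced here, but the setting it studies is easy to sketch. Below is a minimal, illustrative simulation of the classical lost-sales inventory model, paired with one standard variance reduction idea, common random numbers, used here only to show why controlling simulation noise matters when comparing policies. The lead time, cost parameters, Poisson demand, and base-stock policy are assumptions made for this example; the paper's neural network policies and its specific variance reduction techniques are not shown.

```python
# Minimal sketch (not the paper's algorithm): a lost-sales inventory
# simulation with a fixed lead time, plus common random numbers as one
# example of variance reduction when comparing two candidate policies.
# All parameters and the base-stock policy form are illustrative assumptions.
import numpy as np

LEAD_TIME = 2        # periods before an order arrives (assumption)
HOLDING_COST = 1.0   # cost per unit on hand per period (assumption)
PENALTY_COST = 9.0   # cost per unit of lost sales (assumption)
HORIZON = 1000       # simulated periods per replication

def simulate(base_stock, demands):
    """Average per-period cost of a base-stock policy on one demand path.

    State: on-hand inventory plus the pipeline of outstanding orders.
    Lost-sales dynamics: unmet demand is lost, not backordered.
    """
    on_hand = base_stock
    pipeline = [0] * LEAD_TIME   # orders placed but not yet delivered
    total_cost = 0.0
    for d in demands:
        # Order up to the base-stock level, counting pipeline stock.
        inventory_position = on_hand + sum(pipeline)
        order = max(base_stock - inventory_position, 0)
        pipeline.append(order)
        # The oldest outstanding order arrives before demand is observed.
        on_hand += pipeline.pop(0)
        sold = min(on_hand, d)
        lost = d - sold
        on_hand -= sold
        total_cost += HOLDING_COST * on_hand + PENALTY_COST * lost
    return total_cost / len(demands)

rng = np.random.default_rng(0)
n_reps = 50
# Common random numbers: both policies see identical demand paths,
# so much of the simulation noise cancels in the estimated cost difference.
diffs_crn, diffs_indep = [], []
for _ in range(n_reps):
    demands = rng.poisson(5.0, HORIZON)
    other = rng.poisson(5.0, HORIZON)           # independent path
    diffs_crn.append(simulate(12, demands) - simulate(10, demands))
    diffs_indep.append(simulate(12, demands) - simulate(10, other))
print("std of cost difference, common random numbers:", np.std(diffs_crn))
print("std of cost difference, independent draws:   ", np.std(diffs_indep))
```

On a typical run, the common-random-numbers estimate of the cost difference between the two base-stock levels has a noticeably smaller standard deviation than the estimate based on independent demand paths, which is the kind of effect variance reduction techniques aim for when policies are evaluated and improved by simulation.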

Related research

10/30/2015 · Learning Continuous Control Policies by Stochastic Value Gradients
We present a unified framework for learning continuous control policies ...

02/15/2021 · Neuro-algorithmic Policies enable Fast Combinatorial Generalization
Although model-based and model-free approaches to learning the control o...

10/14/2019 · Bootstrapping the Expressivity with Model-based Planning
We compare the model-free reinforcement learning with the model-based ap...

02/25/2022 · Behaviorally Grounded Model-Based and Model Free Cost Reduction in a Simulated Multi-Echelon Supply Chain
Amplification and phase shift in ordering signals, commonly referred to ...

05/27/2023 · Online Nonstochastic Model-Free Reinforcement Learning
In this work, we explore robust model-free reinforcement learning algori...

01/15/2021 · Randomized Ensembled Double Q-Learning: Learning Fast Without a Model
Using a high Update-To-Data (UTD) ratio, model-based methods have recent...
