Going faster to see further: GPU-accelerated value iteration and simulation for perishable inventory control using JAX

03/19/2023
by   Joseph Farrington, et al.

Value iteration can find the optimal replenishment policy for a perishable inventory problem, but is computationally demanding due to the large state spaces that are required to represent the age profile of stock. The parallel processing capabilities of modern GPUs can reduce the wall time required to run value iteration by updating many states simultaneously. The adoption of GPU-accelerated approaches has been limited in operational research relative to other fields like machine learning, in which new software frameworks have made GPU programming widely accessible. We used the Python library JAX to implement value iteration and simulators of the underlying Markov decision processes in a high-level API, and relied on this library's function transformations and compiler to efficiently utilize GPU hardware. Our method can extend the use of value iteration to settings that were previously considered infeasible or impractical. We demonstrate this on example scenarios from three recent studies, which include problems with over 16 million states and additional problem features, such as substitution between products, that increase computational complexity. We compare the performance of the optimal replenishment policies to heuristic policies, fitted using simulation optimization in JAX, which allowed the parallel evaluation of multiple candidate policy parameters on thousands of simulated years. The heuristic policies gave a maximum optimality gap of 2.49%. Our general approach may be applicable to a wide range of problems in operational research that would benefit from large-scale parallel computation on consumer-grade GPU hardware.
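To make the idea of updating many states simultaneously concrete, the following is a minimal sketch of value iteration in JAX. It is not the authors' implementation: it assumes a toy single-product inventory problem with no age profile, and every name and number in it (`MAX_STOCK`, `MAX_ORDER`, the cost parameters, the demand distribution, `q_value`, `bellman_update`, `value_iteration`) is hypothetical and chosen only for illustration. It shows how `jax.vmap` can evaluate every state-action pair in one batched call and how `jax.jit` compiles the update for whichever device is available.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy problem: none of these values come from the paper.
MAX_STOCK = 10          # storage capacity
MAX_ORDER = 10          # largest order quantity
GAMMA = 0.95            # discount factor
ORDER_COST, HOLDING_COST, SHORTAGE_COST = 1.0, 0.5, 5.0

demand = jnp.arange(5)                                  # possible demands 0..4
demand_prob = jnp.array([0.1, 0.2, 0.4, 0.2, 0.1])      # assumed demand pmf
states = jnp.arange(MAX_STOCK + 1)                      # stock on hand
actions = jnp.arange(MAX_ORDER + 1)                     # order quantity


def q_value(state, action, values):
    """Expected discounted return of ordering `action` units in `state`."""
    on_hand = jnp.minimum(state + action, MAX_STOCK)    # units over capacity are lost
    next_state = jnp.maximum(on_hand - demand, 0)       # one entry per demand outcome
    shortage = jnp.maximum(demand - on_hand, 0)
    reward = -(ORDER_COST * action
               + HOLDING_COST * next_state
               + SHORTAGE_COST * shortage)
    return jnp.sum(demand_prob * (reward + GAMMA * values[next_state]))


# Inner vmap batches over actions, outer vmap batches over states.
q_all = jax.vmap(jax.vmap(q_value, in_axes=(None, 0, None)),
                 in_axes=(0, None, None))


@jax.jit
def bellman_update(values):
    """One synchronous sweep over all states."""
    q = q_all(states, actions, values)                  # shape (n_states, n_actions)
    return q.max(axis=1), q.argmax(axis=1)


def value_iteration(tol=1e-3, max_iter=1000):
    values = jnp.zeros(states.shape[0])
    for _ in range(max_iter):
        new_values, policy = bellman_update(values)
        if jnp.max(jnp.abs(new_values - values)) < tol:
            break
        values = new_values
    return new_values, policy


values, policy = value_iteration()
```

Here `policy[s]` is the order quantity chosen for stock level `s`. In a perishable setting the state would also need to track the age profile of stock, which is what makes the state space grow so quickly; the batched update pattern would stay the same under that assumption.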

