A Reinforcement Learning Approach to the Stochastic Cutting Stock Problem

We formulate the stochastic cutting stock problem as a discounted infinite-horizon Markov decision process. At each decision epoch, given the current inventory of items, an agent chooses the patterns in which to cut objects in stock in anticipation of unknown demand. An optimal solution is a policy that maps each state to a decision and minimizes the expected total cost. Because exact algorithms scale exponentially with the dimension of the state space, we develop a heuristic solution approach based on reinforcement learning. We propose an approximate policy iteration algorithm that uses a linear model to approximate the action-value function of a policy. Policy evaluation solves the projected Bellman equation using a sample of state transitions, decisions, and costs obtained by simulation. Because the decision space is large, policy improvement is performed with the cross-entropy method. Computational experiments on realistic data illustrate the application of the algorithm. Heuristic policies obtained with polynomial and Fourier basis functions are compared with myopic and random policies. Results indicate the possibility of obtaining policies capable of adequately controlling inventories with an average cost up to 80
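To make the policy-improvement step concrete, the following is a minimal sketch of a generic Gaussian cross-entropy method that minimizes a cost over a real-valued decision vector. It is not the authors' implementation: the function names, the sampling parameters, and the quadratic `toy_q` (a stand-in for an approximate action-value function evaluated at a fixed state) are all assumptions made for illustration, and a real cutting stock decision would be an integer vector of pattern counts subject to stock constraints.

```python
import random
import statistics


def cross_entropy_minimize(cost, dim, n_samples=50, n_elite=10, n_iters=30, seed=0):
    """Minimize cost(x) over real vectors x of length dim with a Gaussian CEM."""
    rng = random.Random(seed)
    mean = [0.0] * dim  # initial sampling mean (assumed)
    std = [5.0] * dim   # initial sampling spread (assumed)
    for _ in range(n_iters):
        # Sample candidate decisions from the current sampling distribution.
        samples = [[rng.gauss(mean[j], std[j]) for j in range(dim)]
                   for _ in range(n_samples)]
        # Keep the lowest-cost candidates (the elite set).
        elite = sorted(samples, key=cost)[:n_elite]
        # Refit the Gaussian to the elite set, one coordinate at a time.
        for j in range(dim):
            column = [x[j] for x in elite]
            mean[j] = statistics.fmean(column)
            # A small floor keeps the spread strictly positive.
            std[j] = statistics.pstdev(column) + 1e-6
    return mean


def toy_q(action):
    """Hypothetical stand-in for an approximate action-value at a fixed state."""
    target = [3.0, 1.0, 4.0]
    return sum((a - t) ** 2 for a, t in zip(action, target))


best = cross_entropy_minimize(toy_q, dim=3)
```

In an approximate policy iteration loop, a search of this kind would be run at each visited state with `cost` set to the current linear approximation of the action-value function, which avoids enumerating a large decision space.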
