Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems

10/01/2019
by   Hardik Meisheri, et al.

This paper describes a purely data-driven solution to a class of sequential decision-making problems with a large number of concurrent online decisions, with applications to computing systems and operations research. We assume that while the micro-level behaviour of the system can be broadly captured by analytical expressions or simulation, the macro-level or emergent behaviour is complicated by non-linearity, constraints, and stochasticity. If we represent the set of concurrent decisions to be computed as a vector, each element of the vector is assumed to be a continuous variable, and the number of such elements is arbitrarily large and variable from one problem instance to another. We first formulate the decision-making problem as a canonical reinforcement learning (RL) problem, which can be solved using purely data-driven techniques. We modify a standard approach known as advantage actor critic (A2C) to ensure its suitability to the problem at hand, and compare its performance to that of baseline approaches on the specific instance of a multi-product inventory management task. The key modifications include a parallelised formulation of the decision-making task, and a training procedure that explicitly recognises the quantitative relationship between different decisions. We also present experimental results probing the learned policies, and their robustness to variations in the data.
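As a rough illustration of two ideas named in the abstract, the sketch below applies one shared policy to each element of the decision vector independently (so the number of products can vary between problem instances) and computes an A2C-style advantage as the discounted return minus a critic's value estimate. Everything here is an assumption made for illustration: the function names, the logistic-of-linear-score policy, and the feature layout are hypothetical stand-ins, not the authors' network or training procedure.

```python
import math

def policy(features, weights):
    """Hypothetical shared policy: maps one product's features to a
    continuous decision in (0, 1) via a logistic of a linear score."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def decide(all_features, weights):
    """Apply the same policy to every product independently, so the
    decision vector has one element per product, however many there are."""
    return [policy(f, weights) for f in all_features]

def discounted_returns(rewards, gamma):
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]

def advantages(rewards, values, gamma):
    """A2C-style advantage estimate: discounted return minus the
    critic's value estimate at each step (values are assumed given)."""
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]
```

Because `decide` reuses the same weights for every product, the same function handles three products or three thousand without any change in the parameter count; that is the property the parallelised formulation relies on in this sketch.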

research
06/07/2020

Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains

This paper describes the application of reinforcement learning (RL) to m...
research
04/14/2019

A Short Survey On Memory Based Reinforcement Learning

Reinforcement learning (RL) is a branch of machine learning which is emp...
research
06/12/2020

Recurrent Sum-Product-Max Networks for Decision Making in Perfectly-Observed Environments

Recent investigations into sum-product-max networks (SPMN) that generali...
research
04/11/2023

Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

There is a growing interest in using reinforcement learning (RL) to pers...
research
03/15/2023

Bridging adaptive management and reinforcement learning for more robust decisions

From out-competing grandmasters in chess to informing high-stakes health...
research
11/21/2022

Data-Driven Offline Decision-Making via Invariant Representation Learning

The goal in offline data-driven decision-making is to synthesize decisions ...
research
09/16/2019

Deep Reinforcement Learning for Task-driven Discovery of Incomplete Networks

Complex networks are often either too large for full exploration, partia...
