
Hindsight Learning for MDPs with Exogenous Inputs

07/13/2022
by Sean R. Sinclair, et al.

We develop a reinforcement learning (RL) framework for applications that deal with sequential decisions and exogenous uncertainty, such as resource allocation and inventory management. In these applications, the uncertainty is only due to exogenous variables like future demands. A popular approach is to predict the exogenous variables using historical data and then plan with the predictions. However, this indirect approach requires high-fidelity modeling of the exogenous process to guarantee good downstream decision-making, which can be impractical when the exogenous process is complex. In this work we propose an alternative approach based on hindsight learning which sidesteps modeling the exogenous process. Our key insight is that, unlike Sim2Real RL, we can revisit past decisions in the historical data and derive counterfactual consequences for other actions in these applications. Our framework uses hindsight-optimal actions as the policy training signal and has strong theoretical guarantees on decision-making performance. We develop an algorithm using our framework to allocate compute resources for real-world Microsoft Azure workloads. The results show our approach learns better policies than domain-specific heuristics and Sim2Real RL baselines.
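To make the idea of a hindsight-optimal training signal concrete, here is a minimal sketch on a toy inventory problem. It is not the authors' algorithm or their Azure setup: the environment, the constants (`MAX_INV`, `PRICE`, `COST`, `HOLD`), and the function names are all hypothetical and chosen only for illustration. The sketch assumes that once a historical demand trace is fixed, the dynamics are deterministic, so any past decision can be replayed with a different action and scored against the best possible continuation on that same trace.

```python
from functools import lru_cache

# Hypothetical toy inventory problem (illustrative assumptions, not from the paper).
MAX_INV = 5                         # storage capacity
ACTIONS = (0, 1, 2, 3)              # allowed order quantities
PRICE, COST, HOLD = 3.0, 1.0, 0.5   # revenue per sale, cost per order unit, holding cost


def step(inv, order, demand):
    """Deterministic transition once the exogenous demand is known."""
    stock = min(inv + order, MAX_INV)
    sold = min(stock, demand)
    next_inv = stock - sold
    reward = PRICE * sold - COST * order - HOLD * next_inv
    return next_inv, reward


@lru_cache(maxsize=None)
def hindsight_value(trace, t, inv):
    """Best achievable return from time t when the whole demand trace is known."""
    if t == len(trace):
        return 0.0
    return max(hindsight_q(trace, t, inv, a) for a in ACTIONS)


def hindsight_q(trace, t, inv, order):
    """Return of taking `order` at time t, then acting hindsight-optimally."""
    next_inv, reward = step(inv, order, trace[t])
    return reward + hindsight_value(trace, t + 1, next_inv)


def fit_policy(traces):
    """Tabular policy: pick the action with the best average hindsight return."""
    horizon = len(traces[0])
    policy = {}
    for t in range(horizon):
        for inv in range(MAX_INV + 1):
            avg_q = {
                a: sum(hindsight_q(tr, t, inv, a) for tr in traces) / len(traces)
                for a in ACTIONS
            }
            policy[(t, inv)] = max(avg_q, key=avg_q.get)
    return policy


# Historical demand traces stand in for the exogenous data (made-up numbers).
traces = [(2, 1), (1, 3), (0, 2)]
policy = fit_policy(traces)
```

Note that no model of the demand process is fit anywhere: the traces are used directly. Averaging per-trace optimal returns can overestimate what a non-anticipating policy achieves, since the continuation on each trace is planned with knowledge of future demand, so this tabular version should be read as a sketch of the training signal rather than a method with guarantees.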
