Log In Sign Up

Hindsight Learning for MDPs with Exogenous Inputs

by   Sean R. Sinclair, et al.

We develop a reinforcement learning (RL) framework for applications that deal with sequential decisions and exogenous uncertainty, such as resource allocation and inventory management. In these applications, the uncertainty is only due to exogenous variables like future demands. A popular approach is to predict the exogenous variables using historical data and then plan with the predictions. However, this indirect approach requires high-fidelity modeling of the exogenous process to guarantee good downstream decision-making, which can be impractical when the exogenous process is complex. In this work we propose an alternative approach based on hindsight learning which sidesteps modeling the exogenous process. Our key insight is that, unlike Sim2Real RL, we can revisit past decisions in the historical data and derive counterfactual consequences for other actions in these applications. Our framework uses hindsight-optimal actions as the policy training signal and has strong theoretical guarantees on decision-making performance. We develop an algorithm using our framework to allocate compute resources for real-world Microsoft Azure workloads. The results show our approach learns better policies than domain-specific heuristics and Sim2Real RL baselines.


On Optimism in Model-Based Reinforcement Learning

The principle of optimism in the face of uncertainty is prevalent throug...

Model-Based Reinforcement Learning Framework of Online Network Resource Allocation

Online Network Resource Allocation (ONRA) for service provisioning is a ...

Knowing the Past to Predict the Future: Reinforcement Virtual Learning

Reinforcement Learning (RL)-based control system has received considerab...

Reinforcement Learning with Algorithms from Probabilistic Structure Estimation

Reinforcement learning (RL) algorithms aim to learn optimal decisions in...

Reinforcement Learning with Stepwise Fairness Constraints

AI methods are used in societally important settings, ranging from credi...

Optimization of anemia treatment in hemodialysis patients via reinforcement learning

Objective: Anemia is a frequent comorbidity in hemodialysis patients tha...

Reinforcement Learning for Multi-Objective Optimization of Online Decisions in High-Dimensional Systems

This paper describes a purely data-driven solution to a class of sequent...