Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

01/24/2023
by Yash Chandak, et al.

Methods for sequential decision-making are often built on the foundational assumption that the underlying decision process is stationary. This limits their application, because real-world problems are often subject to changes due to external factors (passive non-stationarity), changes induced by interactions with the system itself (active non-stationarity), or both (hybrid non-stationarity). In this work, we take the first steps towards the fundamental challenge of on-policy and off-policy evaluation amidst structured changes due to active, passive, or hybrid non-stationarity. Towards this goal, we make a higher-order stationarity assumption: non-stationarity results in changes over time, but the way changes happen is fixed. We propose OPEN, an algorithm that uses a double application of counterfactual reasoning and a novel importance-weighted instrument-variable regression to obtain an estimate, with both lower bias and lower variance, of the structure in the changes of a policy's past performances. Finally, we show promising results on how OPEN can be used to predict future performances for several domains inspired by real-world applications that exhibit non-stationarity.
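
To make the idea of "estimating past performances and extrapolating their structure" more concrete, the following is a minimal sketch, not the OPEN algorithm itself. It assumes that per-episode action probabilities under the evaluation policy and the behavior policy are available, uses ordinary importance sampling to estimate the evaluation policy's return in each past episode, and then fits a plain least-squares autoregression as a simple stand-in for the importance-weighted instrument-variable regression described above. All function names (is_estimate, fit_ar, forecast_next) are hypothetical.

```python
import numpy as np

def is_estimate(rewards, pi_probs, beta_probs):
    """Ordinary importance-sampling estimate of one episode's return.

    rewards    : rewards observed in the episode (collected under beta)
    pi_probs   : probability of each taken action under the evaluation policy
    beta_probs : probability of each taken action under the behavior policy
    """
    rho = np.prod(np.asarray(pi_probs, dtype=float) / np.asarray(beta_probs, dtype=float))
    return float(rho * np.sum(rewards))

def fit_ar(perf, p=1):
    """Least-squares fit of perf[t] ~ c + sum_j a_j * perf[t - j], j = 1..p.

    This is a plain autoregression (an assumption for illustration),
    not an instrument-variable regression.
    """
    perf = np.asarray(perf, dtype=float)
    # Each row: intercept term followed by the p most recent lags (newest first).
    X = np.array([np.concatenate(([1.0], perf[t - p:t][::-1]))
                  for t in range(p, len(perf))])
    y = perf[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast_next(perf, coef):
    """One-step-ahead forecast of the next episode's performance."""
    p = len(coef) - 1
    lags = np.asarray(perf, dtype=float)[-p:][::-1]
    return float(coef[0] + coef[1:] @ lags)
```

A typical use would be to call is_estimate once per past episode to build a performance sequence, pass that sequence to fit_ar, and then call forecast_next to predict the next episode's performance. Note that regressing noisy estimates on their own noisy lags is biased in general, which is one motivation for instrument-variable methods.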


