Invariant Policy Learning: A Causal Perspective

06/01/2021
by Sorawit Saengkyongam, et al.

In the past decade, contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static, in the sense that they do not change over time or across environments. In many real-world systems, however, the mechanisms are subject to shifts across environments, which may invalidate the static-environment assumption. In this paper, we tackle the problem of environmental shifts under the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved confounders are present and show that, in that case, an optimal invariant policy is guaranteed, under certain assumptions, to generalize across environments. Our results not only provide a solution to the environmental shift problem but also establish concrete connections among causality, invariance, and contextual bandits.
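
To make the multi-environment offline setting concrete, the following is a minimal sketch and not the method of the paper. It assumes a toy setup that is entirely hypothetical: a binary context, two actions, a uniform logging policy with known propensities, no unobserved confounders, and a reward mechanism for action 1 that shifts between environments. The names `simulate_env`, `ipw_value`, and `worst_case_value` are illustrative. The sketch estimates each policy's value per environment with standard inverse propensity weighting and compares policies by their worst-case value across environments.

```python
import random


def ipw_value(logs, policy, propensity):
    """Inverse-propensity-weighted estimate of a policy's expected reward.

    logs: list of (context, action, reward) tuples collected offline.
    policy: function mapping a context to the action it would choose.
    propensity: function (context, action) -> probability that the logging
        policy chose that action (assumed known and positive).
    """
    total = 0.0
    for context, action, reward in logs:
        if policy(context) == action:
            total += reward / propensity(context, action)
    return total / len(logs)


def simulate_env(shift, n, rng):
    """Hypothetical environment with uniform logging over two actions.

    Action 0 pays the context value in every environment (stable mechanism);
    action 1 pays `shift`, which differs across environments.
    """
    logs = []
    for _ in range(n):
        context = rng.randint(0, 1)
        action = rng.randint(0, 1)
        reward = float(context) if action == 0 else shift
        logs.append((context, action, reward))
    return logs


def uniform_propensity(context, action):
    # Assumption: the logging policy picked each action with probability 0.5.
    return 0.5


def worst_case_value(env_logs, policy):
    """Smallest estimated value of the policy over all logged environments."""
    return min(ipw_value(logs, policy, uniform_propensity) for logs in env_logs)


rng = random.Random(0)
env_logs = [simulate_env(shift, 20000, rng) for shift in (0.9, 0.1)]


def always_0(context):
    return 0


def always_1(context):
    return 1


print("worst case, always action 0:", worst_case_value(env_logs, always_0))
print("worst case, always action 1:", worst_case_value(env_logs, always_1))
```

In this toy example, the policy that relies on the shifting mechanism looks attractive in one environment but performs poorly in the other, whereas the policy that relies on the stable mechanism has roughly the same value in both. The abstract's notion of policy invariance concerns the harder case with unobserved confounders, which this sketch does not model.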
