Piecewise-Stationary Off-Policy Optimization

06/15/2020
by Joey Hong, et al.

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, using data collected by another policy. Real-world environments are typically non-stationary, and policies learned offline should adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. Our proposed solution has two phases. In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state. In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance. This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment. To show the effectiveness of our approach, we compare it to state-of-the-art baselines on both synthetic and real-world datasets. Our approach outperforms methods that act only on observed context.
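
The two phases described above can be illustrated with a small toy sketch. It is not the authors' algorithm, and it rests on several assumptions: the latent-state label of each logged record is taken as already known (the abstract treats it as latent), each sub-policy is a greedy inverse-propensity-scored (IPS) choice per context, and the online switch uses a simplified fixed-share exponential-weights rule that we substitute for whatever rule the paper uses. All names (`learn_sub_policies`, `SubPolicySwitcher`, `simulate`, `eta`, `share`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def learn_sub_policies(logged, n_actions):
    """Offline phase (sketch): one greedy sub-policy per latent state.

    Each record is (state, context, action, reward, propensity). The state
    label is ASSUMED to be already inferred; the source treats it as latent.
    """
    # Inverse-propensity-scored (IPS) reward totals per (state, context).
    totals = defaultdict(lambda: [0.0] * n_actions)
    for state, context, action, reward, propensity in logged:
        totals[(state, context)][action] += reward / propensity

    def make_policy(state):
        def policy(context):
            scores = totals.get((state, context))
            if scores is None:  # context never logged in this state
                return 0
            return max(range(n_actions), key=lambda a: scores[a])
        return policy

    return [make_policy(s) for s in sorted({rec[0] for rec in logged})]


class SubPolicySwitcher:
    """Online phase (sketch): fixed-share exponential weights over sub-policies.

    ASSUMPTION: this update rule is an illustrative stand-in, not the paper's.
    """

    def __init__(self, sub_policies, eta=0.1, share=0.01, seed=0):
        self.sub_policies = sub_policies
        self.eta = eta        # learning rate
        self.share = share    # mass redistributed each round to allow switching
        self.weights = [1.0 / len(sub_policies)] * len(sub_policies)
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def act(self, context):
        probs = self.probabilities()
        k = self.rng.choices(range(len(probs)), weights=probs)[0]
        return k, self.sub_policies[k](context), probs[k]

    def update(self, k, reward, prob):
        # Importance-weighted reward estimate for the sub-policy that was played.
        self.weights[k] *= math.exp(self.eta * reward / prob)
        total, n = sum(self.weights), len(self.weights)
        # Fixed-share mixing keeps every sub-policy's probability >= share / n.
        mixed = [(1 - self.share) * w + self.share * total / n for w in self.weights]
        self.weights = [w / total for w in mixed]


def reward_of(state, context, action):
    # Toy environment: the rewarded action flips when the latent state changes.
    return float(action == (context + state) % 2)


def simulate(horizon=2000, seed=0):
    rng = random.Random(seed)
    # Log 400 rounds under a uniform logging policy (propensity 0.5).
    logged = []
    for t in range(400):
        state, context, action = t // 200, rng.randrange(2), rng.randrange(2)
        logged.append((state, context, action, reward_of(state, context, action), 0.5))

    switcher = SubPolicySwitcher(learn_sub_policies(logged, n_actions=2), seed=seed)
    snapshots = []
    for t in range(horizon):
        state = 0 if t < horizon // 2 else 1  # one change point at mid-horizon
        context = rng.randrange(2)
        k, action, prob = switcher.act(context)
        switcher.update(k, reward_of(state, context, action), prob)
        if t + 1 in (horizon // 2, horizon):
            snapshots.append(switcher.probabilities())
    return snapshots
```

Calling `simulate()` returns the switcher's sub-policy probabilities just before the change point and at the end of the run. In this toy setup, most of the probability mass should sit on the first sub-policy before the change and move to the second afterward, while the `share` term keeps the unused sub-policy available so that a switch remains possible.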

research
07/24/2021

Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support

We address policy learning with logged data in contextual bandits. Curre...
research
06/06/2022

Pessimistic Off-Policy Optimization for Learning to Rank

Off-policy learning is a framework for optimizing policies without deplo...
research
12/21/2020

Off-Policy Optimization of Portfolio Allocation Policies under Constraints

The dynamic portfolio optimization problem in finance frequently require...
research
05/05/2021

Policy Learning with Adaptively Collected Data

Learning optimal policies from historical data enables the gains from pe...
research
02/14/2018

Online Learning for Non-Stationary A/B Tests

The rollout of new versions of a feature in modern applications is a man...
research
11/11/2021

Offline Contextual Bandits for Wireless Network Optimization

The explosion in mobile data traffic together with the ever-increasing e...
research
10/24/2022

PAC-Bayesian Offline Contextual Bandits With Guarantees

This paper introduces a new principled approach for offline policy optim...
