Non-Stationary Representation Learning in Sequential Linear Bandits

01/13/2022
by   Yuzhen Qin, et al.

In this paper, we study representation learning for multi-task decision-making in non-stationary environments. We consider the framework of sequential linear bandits, where the agent performs a series of tasks drawn from distinct sets, each associated with a different environment. The embeddings of the tasks in each set share a low-dimensional feature extractor, called a representation, and the representations differ across sets. We propose an online algorithm that facilitates efficient decision-making by learning and transferring non-stationary representations adaptively. We prove that our algorithm significantly outperforms existing algorithms that treat tasks independently. We also conduct experiments on both synthetic and real data to validate our theoretical insights and demonstrate the efficacy of our algorithm.
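
To make the setting concrete, here is a minimal sketch of a shared low-dimensional representation in a linear bandit. It assumes the common low-rank model in which a task's d-dimensional embedding is theta = B w, with B a d x k matrix shared by all tasks in one environment and w a k-dimensional task-specific weight vector. The abstract does not state this form explicitly, so the model, the function names, and the dimensions below are illustrative assumptions, and the sketch is not the paper's algorithm.

```python
import random

def make_representation(d, k, rng):
    """Random d x k matrix standing in for a shared representation B (hypothetical)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]

def task_parameter(B, w):
    """theta = B w: d-dimensional task embedding built from k-dimensional weights."""
    return [sum(b * wj for b, wj in zip(row, w)) for row in B]

def features(B, x):
    """B^T x: k-dimensional feature of a d-dimensional action x."""
    k = len(B[0])
    return [sum(B[i][j] * x[i] for i in range(len(x))) for j in range(k)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

rng = random.Random(0)
d, k = 8, 2  # assumed ambient and representation dimensions

# Two environments, each with its own representation (the non-stationary part).
B_env = [make_representation(d, k, rng) for _ in range(2)]

# A task from environment 0 shares B_env[0] but has its own weights w.
w = [rng.gauss(0.0, 1.0) for _ in range(k)]
theta = task_parameter(B_env[0], w)

# The expected reward of an action x depends on x only through B^T x.
x = [rng.gauss(0.0, 1.0) for _ in range(d)]
mean_reward = dot(x, theta)
low_dim_reward = dot(features(B_env[0], x), w)
```

Under this assumed model, once B is known for an environment, each new task from that environment has only k unknowns instead of d. This is the intuition behind transferring a learned representation across tasks rather than treating them independently.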


