Differentiable Meta-Learning in Contextual Bandits

06/09/2020
by Branislav Kveton, et al.

We study a contextual bandit setting where the learning agent has access to bandit instances sampled from an unknown prior distribution P. The goal of the agent is to achieve high reward on average over the instances drawn from P. This setting is of particular importance because it formalizes the offline optimization of bandit policies to perform well on average over anticipated bandit instances. The main idea in our work is to optimize differentiable bandit policies by policy gradients. We derive reward gradients that reflect the structure of our problem, and propose contextual policies that are parameterized in a differentiable way and have low regret. Our algorithmic and theoretical contributions are supported by extensive experiments that show the importance of baseline subtraction and learned biases, and the practicality of our approach on a range of classification tasks.
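The core idea above (sample bandit instances from a prior, run a differentiable policy on each, and ascend a policy-gradient estimate of the average reward, with a baseline subtracted to reduce variance) can be illustrated with a small NumPy sketch. Everything specific here is an assumption for illustration, not the paper's method: a linear contextual bandit with a Gaussian prior and Gaussian noise, a softmax policy over optimistic ridge-regression scores with two hypothetical learnable parameters `beta` (inverse temperature) and `c` (bonus scale), and a plain REINFORCE estimator with a leave-one-out baseline. The names `run_episode`, `policy_gradient`, and `meta_train` are likewise hypothetical.

```python
import numpy as np

# Hypothetical problem (an assumption, not the paper's benchmark): each instance
# draws theta* ~ N(0, I_D); in every round each of K arms gets a context
# x_{t,k} ~ N(0, I_D) and pays reward x_{t,k}^T theta* + N(0, SIGMA^2) noise.
D, K, N_ROUNDS, SIGMA = 3, 4, 50, 0.5


def run_episode(params, rng):
    """Run one sampled instance with a softmax policy over optimistic ridge scores.

    params = (beta, c) are hypothetical learnable parameters. Returns the total
    reward and sum_t grad log pi(a_t | history) with respect to (beta, c).
    """
    beta, c = params
    theta_star = rng.normal(size=D)
    gram = np.eye(D)  # ridge regularization with lambda = 1
    b = np.zeros(D)
    total_reward, score_grad = 0.0, np.zeros(2)
    for _ in range(N_ROUNDS):
        x = rng.normal(size=(K, D))
        gram_inv = np.linalg.inv(gram)
        mean = x @ (gram_inv @ b)  # estimated mean reward of each arm
        width = np.sqrt(np.einsum("kd,de,ke->k", x, gram_inv, x))  # confidence width
        u = mean + c * width
        logits = beta * u
        probs = np.exp(logits - logits.max())  # numerically stable softmax
        probs /= probs.sum()
        a = rng.choice(K, p=probs)
        # d log pi(a)/d beta = u_a - E_pi[u];  d log pi(a)/d c = beta * (w_a - E_pi[w])
        score_grad += np.array([u[a] - probs @ u, beta * (width[a] - probs @ width)])
        r = x[a] @ theta_star + SIGMA * rng.normal()
        total_reward += r
        gram += np.outer(x[a], x[a])  # update ridge statistics with the pulled arm
        b += r * x[a]
    return total_reward, score_grad


def policy_gradient(params, n_instances, rng, baseline=True):
    """REINFORCE estimate of grad E_{instance ~ P}[total reward]."""
    rewards = np.empty(n_instances)
    scores = np.empty((n_instances, 2))
    for i in range(n_instances):
        rewards[i], scores[i] = run_episode(params, rng)
    if baseline:
        # Leave-one-out mean: independent of episode i, so the estimate stays unbiased.
        b = (rewards.sum() - rewards) / (n_instances - 1)
    else:
        b = np.zeros(n_instances)
    grad = ((rewards - b)[:, None] * scores).mean(axis=0)
    return grad, rewards.mean()


def meta_train(n_iters=10, n_instances=16, step=1e-3, seed=0):
    """Gradient ascent on the average reward over sampled instances."""
    rng = np.random.default_rng(seed)
    params = np.array([1.0, 1.0])
    history = []
    for _ in range(n_iters):
        grad, avg_reward = policy_gradient(params, n_instances, rng)
        params = params + step * grad
        history.append(avg_reward)
    return params, history


params, history = meta_train()
```

Setting `baseline=False` in `policy_gradient` gives the vanilla estimator, which makes it easy to compare gradient variance with and without baseline subtraction; the step size and iteration counts are arbitrary choices for a quick run.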

research
02/17/2020

Differentiable Bandit Exploration

We learn bandit policies that maximize the average reward over bandit in...
research
06/15/2020

Latent Bandits Revisited

A latent bandit problem is one in which the learning agent knows the arm...
research
12/28/2021

Learning Across Bandits in High Dimension via Robust Statistics

Decision-makers often face the "many bandits" problem, where one must si...
research
02/12/2018

Policy Gradients for Contextual Bandits

We study a generalized contextual-bandits problem, where there is a stat...
research
03/03/2022

Learning Selection Bias and Group Importance: Differentiable Reparameterization for the Hypergeometric Distribution

Partitioning a set of elements into a given number of groups of a priori...
research
11/15/2020

DORB: Dynamically Optimizing Multiple Rewards with Bandits

Policy gradients-based reinforcement learning has proven to be a promisi...
research
12/09/2022

Multi-Task Off-Policy Learning from Bandit Feedback

Many practical applications, such as recommender systems and learning to...
