Semiparametric Contextual Bandits

03/12/2018
by Akshay Krishnamurthy, et al.

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem in which the reward for an action is modeled as a linear function of known action features confounded by a non-linear, action-independent term. We design new algorithms that achieve Õ(d√T) regret over T rounds when the linear function is d-dimensional, which matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenewald et al. (2017). In an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches, and our proofs require new concentration inequalities for self-normalized martingales.
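The modeling idea is concrete enough to sketch in a few lines of NumPy. The simulation below is illustrative only, not the paper's algorithm: it draws rewards from the semiparametric model r_t = ⟨x_{t,a}, θ*⟩ + ν_t + noise and uses action-centered features, in the spirit of the doubly-robust estimator mentioned above (and of the action-centering approach of Greenewald et al.), so the action-independent confounder ν_t cancels from the least-squares update in expectation. All concrete choices here (the sinusoidal confounder, the ε-greedy policy, names like theta_star) are assumptions made for the sketch, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, eps = 5, 10, 2000, 0.1              # feature dim, arms, horizon, exploration rate
theta_star = rng.normal(size=d) / np.sqrt(d)  # unknown linear parameter

def confounder(t):
    """Action-independent non-linear term; would bias a naive linear fit."""
    return np.sin(0.01 * t) + 0.5 * np.cos(0.1 * t)

A = np.eye(d)    # ridge-regularized Gram matrix of centered features
b = np.zeros(d)  # accumulated (centered feature) * reward terms

for t in range(T):
    X = rng.normal(size=(K, d))              # this round's action features
    theta_hat = np.linalg.solve(A, b)
    p = np.full(K, eps / K)                  # epsilon-greedy action distribution
    p[np.argmax(X @ theta_hat)] += 1 - eps
    a = rng.choice(K, p=p)
    r = X[a] @ theta_star + confounder(t) + 0.1 * rng.normal()
    # Center the chosen feature by its mean under p: because the confounder
    # does not depend on the action, its contribution to the centered
    # least-squares update is zero in expectation.
    x_c = X[a] - p @ X
    A += np.outer(x_c, x_c)
    b += x_c * r

print("estimation error:", np.linalg.norm(np.linalg.solve(A, b) - theta_star))
```

The centering step is what distinguishes this from a standard linear-bandit update: regressing rewards on the raw chosen features would let the confounder bias the estimate, since the greedily selected feature has non-zero conditional mean. The paper's actual algorithms and their Õ(d√T) guarantee rest on a more careful estimator and analysis than this sketch.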


