
Model Selection for Generic Contextual Bandits
We consider the problem of model selection for the general stochastic co...

Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability
We consider the general (stochastic) contextual bandit problem under the...

Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits
We consider an adversarial variant of the classic K-armed linear context...

Balanced Linear Contextual Bandits
Contextual bandit algorithms are sensitive to the estimation method of t...

Mitigating Bias in Adaptive Data Gathering via Differential Privacy
Data that is gathered adaptively via bandit algorithms, for example ...

Contextual Bandits under Delayed Feedback
Delayed feedback is a ubiquitous problem in many industrial systems emp...

A bandit-learning approach to multifidelity approximation
Multifidelity approximation is an important technique in scientific comp...
Tractable contextual bandits beyond realizability
Tractable contextual bandit algorithms often rely on the realizability assumption – i.e., that the true expected reward model belongs to a known class, such as linear functions. We investigate issues that arise in the absence of realizability and note that the dynamics of adaptive data collection can lead commonly used bandit algorithms to learn a suboptimal policy. In this work, we present a tractable bandit algorithm that is not sensitive to the realizability assumption and computationally reduces to solving a constrained regression problem in every epoch. When realizability does not hold, our algorithm ensures the same guarantees on regret achieved by realizability-based algorithms under realizability, up to an additive term that accounts for the misspecification error. This extra term is proportional to T times the (2/5)-th root of the mean squared error between the best model in the class and the true model, where T is the total number of timesteps. Our work sheds light on the bias-variance tradeoff for tractable contextual bandits. This tradeoff is not captured by algorithms that assume realizability, since under this assumption there exists an estimator in the class that attains zero bias.
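The abstract gives no pseudocode, so the following is only a minimal, hypothetical sketch of the epoch-based pattern it alludes to: refit a reward model at the start of each (doubling) epoch, then select actions via inverse-gap weighting in the style of realizability-based algorithms such as FALCON. Plain least squares stands in for the constrained regression the paper actually solves, and all function names, the doubling schedule, and the choice of gamma are illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_model(X, A, R, n_actions):
    # Stand-in for the per-epoch regression step (the paper uses a
    # *constrained* regression; ordinary least squares is used here
    # purely for illustration). One weight vector per action.
    d = X.shape[1]
    theta = np.zeros((n_actions, d))
    for a in range(n_actions):
        mask = A == a
        if mask.sum() >= d:  # need enough samples for a stable fit
            theta[a], *_ = np.linalg.lstsq(X[mask], R[mask], rcond=None)
    return theta

def inverse_gap_weights(preds, gamma):
    # FALCON-style action distribution: each non-greedy action gets
    # probability inversely proportional to its estimated reward gap.
    best = int(np.argmax(preds))
    K = len(preds)
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            p[a] = 1.0 / (K + gamma * (preds[best] - preds[a]))
    p[best] = 1.0 - p.sum()  # remaining mass goes to the greedy action
    return p

def run_epoch_bandit(T=2000, n_actions=3, d=2):
    # Simulated environment: linear rewards plus noise (so the model
    # class is well specified here; the paper's point is that the
    # epoch/regression template must be hardened when it is not).
    true_theta = rng.normal(size=(n_actions, d))
    X_hist, A_hist, R_hist = [], [], []
    theta = np.zeros((n_actions, d))
    epoch_end, gamma = 1, 1.0
    total_reward = 0.0
    for t in range(1, T + 1):
        if t > epoch_end:  # new epoch: refit model, grow gamma
            theta = fit_model(np.array(X_hist), np.array(A_hist),
                              np.array(R_hist), n_actions)
            gamma = np.sqrt(n_actions * t)
            epoch_end = 2 * t  # doubling epoch schedule
        x = rng.normal(size=d)
        p = inverse_gap_weights(theta @ x, gamma)
        a = rng.choice(n_actions, p=p)
        r = true_theta[a] @ x + 0.1 * rng.normal()
        X_hist.append(x); A_hist.append(a); R_hist.append(r)
        total_reward += r
    return total_reward
```

As exploration parameter gamma grows across epochs, the action distribution concentrates on the greedy action while still assigning every arm positive probability, which is what keeps the adaptively collected data usable for the next epoch's regression.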