
Regularized Contextual Bandits
We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the agent's policy must remain close to a baseline policy that is known to perform well on the task. To tackle this problem we use a nonparametric model and propose an algorithm that splits the context space into bins and solves, simultaneously and independently, a regularized multi-armed bandit instance on each bin. We derive slow and fast rates of convergence, depending on the unknown complexity of the problem. We also introduce a new, relevant margin condition to obtain problem-independent convergence rates, yielding intermediate rates that interpolate between the aforementioned slow and fast ones.
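The binning scheme described above can be sketched in code. This is a hypothetical minimal illustration, not the paper's algorithm: the bin structure, the epsilon-greedy per-bin learner, and the mixing-probability form of regularization toward the baseline (the `mix` parameter) are all assumptions chosen for simplicity, and all names are invented.

```python
import random

class BinnedRegularizedBandit:
    """Illustrative sketch (not the paper's method): contexts in [0, 1] are
    split into equal-width bins, and each bin runs an independent bandit
    whose action distribution is mixed with a known baseline policy."""

    def __init__(self, n_bins, n_arms, baseline, mix=0.3, eps=0.1):
        self.n_bins, self.n_arms = n_bins, n_arms
        self.baseline = baseline  # baseline[b] -> arm believed good on bin b
        self.mix = mix            # regularization: prob. of copying the baseline
        self.eps = eps            # exploration rate within each bin's bandit
        self.counts = [[0] * n_arms for _ in range(n_bins)]
        self.means = [[0.0] * n_arms for _ in range(n_bins)]

    def _bin(self, context):
        # Map a context in [0, 1] to its bin index.
        return min(int(context * self.n_bins), self.n_bins - 1)

    def select(self, context):
        b = self._bin(context)
        if random.random() < self.mix:      # stay close to the baseline policy
            return self.baseline[b]
        if random.random() < self.eps:      # explore uniformly
            return random.randrange(self.n_arms)
        # Exploit: greedy arm for this bin's independent bandit instance.
        return max(range(self.n_arms), key=lambda a: self.means[b][a])

    def update(self, context, arm, reward):
        # Incremental mean update for the pulled arm, local to the bin.
        b = self._bin(context)
        self.counts[b][arm] += 1
        self.means[b][arm] += (reward - self.means[b][arm]) / self.counts[b][arm]
```

Because each bin keeps its own counts and means, the instances are solved independently, as in the abstract; the mixing probability is one simple way to encode "the policy must stay close to the baseline."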