
Meta-Thompson Sampling
Efficient exploration in multi-armed bandits is a fundamental online lea...

Non-Stationary Latent Bandits
Users of recommender systems often behave in a non-stationary fashion, d...

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
We propose a novel framework for structured bandits, which we call an in...

Latent Bandits Revisited
A latent bandit problem is one in which the learning agent knows the arm...

Piecewise-Stationary Off-Policy Optimization
Off-policy learning is a framework for evaluating and optimizing policie...

Differentiable Meta-Learning in Contextual Bandits
We study a contextual bandit setting where the learning agent has access...

Sample Efficient Graph-Based Optimization with Noisy Observations
We study sample complexity of optimizing "hill-climbing friendly" functi...

Differentiable Bandit Exploration
We learn bandit policies that maximize the average reward over bandit in...

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems
We propose RandUCB, a bandit strategy that uses theoretically derived co...

Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLMT...

Waterfall Bandits: Learning to Sell Ads Online
A popular approach to selling online advertising is by a waterfall, wher...

Empirical Bayes Regret Minimization
The prevalent approach to bandit algorithm design is to have a low-regre...

Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...

Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...

Online Diverse Learning to Rank from Partial-Click Feedback
Learning to rank is an important problem in machine learning and recomme...

BubbleRank: Safe Online Learning to Re-Rank
We study the problem of online learning to re-rank, where users provide ...

TopRank: A practical algorithm for online stochastic ranking
Online learning to rank is a sequential decision-making problem where in...

Conservative Exploration using Interleaving
In many practical problems, a learning agent may want to learn the best ...

New Insights into Bootstrapping for Bandits
We investigate the use of bootstrapping in the bandit setting. We first ...

Offline Evaluation of Ranking Policies with Click Models
Many web systems rank and present a list of items to users, from recomme...

Nearly Optimal Adaptive Procedure for Piecewise-Stationary Bandit: a Change-Point Detection Approach
Multi-armed bandit (MAB) is a class of online learning problems where a ...

Stochastic Low-Rank Bandits
Many problems in computer vision and recommender systems involve low-ran...

SpectralFPL: Online Spectral Learning for Single Topic Models
This paper studies how to efficiently learn an optimal latent variable m...

Bernoulli Rank-1 Bandits for Click Feedback
The probability that a user will click a search result depends both on i...

Online Learning to Rank in Stochastic Click Models
Online learning to rank is a core problem in information retrieval and m...

Stochastic Rank-1 Bandits
We propose stochastic rank-1 bandits, a class of online learning problem...

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
We study the stochastic online problem of learning to influence in a soc...

Cascading Bandits for Large-Scale Recommendation Problems
Most recommender systems recommend a list of items. The user examines th...

DCM Bandits: Learning to Rank with Multiple Clicks
A search engine recommends to the user a list of web pages. The user exa...

Graphical Model Sketch
Structured high-cardinality data arises in many domains, and poses a maj...

Cascading Bandits: Learning to Rank in the Cascade Model
A search engine usually outputs a list of K web pages. The user examines...

DUM: Diversity-Weighted Utility Maximization for Recommendations
The need for diversification of recommendation lists manifests in a numb...

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem whe...

Efficient Learning in Large-Scale Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem whe...

Learning to Act Greedily: Polymatroid Semi-Bandits
Many important optimization problems, such as the minimum spanning tree ...

Matroid Bandits: Fast Combinatorial Optimization with Learning
A matroid is a notion of independence in combinatorial optimization whic...

Leveraging Side Observations in Stochastic Bandits
This paper considers stochastic bandits with side observations, a model ...

Solving Factored MDPs with Continuous and Discrete Variables
Although many real-world stochastic planning problems are more naturally...

Partitioned Linear Programming Approximations for MDPs
Approximate linear programming (ALP) is an efficient approach to solving...

Automatic Tuning of Interactive Perception Applications
Interactive applications incorporating high-data rate sensing and comput...

Online Semi-Supervised Learning on Quantized Graphs
In this paper, we tackle the problem of online semi-supervised learning ...