
Meta-Thompson Sampling
Efficient exploration in multi-armed bandits is a fundamental online lea...
Non-Stationary Latent Bandits
Users of recommender systems often behave in a non-stationary fashion, d...
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
We propose a novel framework for structured bandits, which we call an in...
Latent Bandits Revisited
A latent bandit problem is one in which the learning agent knows the arm...
Piecewise-Stationary Off-Policy Optimization
Off-policy learning is a framework for evaluating and optimizing policie...
Differentiable Meta-Learning in Contextual Bandits
We study a contextual bandit setting where the learning agent has access...
Sample Efficient Graph-Based Optimization with Noisy Observations
We study sample complexity of optimizing "hill-climbing friendly" functi...
Differentiable Bandit Exploration
We learn bandit policies that maximize the average reward over bandit in...
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems
We propose RandUCB, a bandit strategy that uses theoretically derived co...
Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLM-T...
Waterfall Bandits: Learning to Sell Ads Online
A popular approach to selling online advertising is by a waterfall, wher...
Empirical Bayes Regret Minimization
The prevalent approach to bandit algorithm design is to have a low-regre...
Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...
Online Diverse Learning to Rank from Partial-Click Feedback
Learning to rank is an important problem in machine learning and recomme...
BubbleRank: Safe Online Learning to Rerank
We study the problem of online learning to rerank, where users provide ...
TopRank: A practical algorithm for online stochastic ranking
Online learning to rank is a sequential decision-making problem where in...
Conservative Exploration using Interleaving
In many practical problems, a learning agent may want to learn the best ...
New Insights into Bootstrapping for Bandits
We investigate the use of bootstrapping in the bandit setting. We first ...
Offline Evaluation of Ranking Policies with Click Models
Many web systems rank and present a list of items to users, from recomme...
Nearly Optimal Adaptive Procedure for Piecewise-Stationary Bandit: a Change-Point Detection Approach
Multi-armed bandit (MAB) is a class of online learning problems where a ...
Stochastic Low-Rank Bandits
Many problems in computer vision and recommender systems involve low-ran...
SpectralFPL: Online Spectral Learning for Single Topic Models
This paper studies how to efficiently learn an optimal latent variable m...
Bernoulli Rank-1 Bandits for Click Feedback
The probability that a user will click a search result depends both on i...
Online Learning to Rank in Stochastic Click Models
Online learning to rank is a core problem in information retrieval and m...
Stochastic Rank-1 Bandits
We propose stochastic rank-1 bandits, a class of online learning problem...
Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
We study the stochastic online problem of learning to influence in a soc...
Cascading Bandits for Large-Scale Recommendation Problems
Most recommender systems recommend a list of items. The user examines th...
DCM Bandits: Learning to Rank with Multiple Clicks
A search engine recommends to the user a list of web pages. The user exa...
Graphical Model Sketch
Structured high-cardinality data arises in many domains, and poses a maj...
Cascading Bandits: Learning to Rank in the Cascade Model
A search engine usually outputs a list of K web pages. The user examines...
DUM: Diversity-Weighted Utility Maximization for Recommendations
The need for diversification of recommendation lists manifests in a numb...
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem whe...
Efficient Learning in Large-Scale Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem whe...
Learning to Act Greedily: Polymatroid Semi-Bandits
Many important optimization problems, such as the minimum spanning tree ...
Matroid Bandits: Fast Combinatorial Optimization with Learning
A matroid is a notion of independence in combinatorial optimization whic...
Leveraging Side Observations in Stochastic Bandits
This paper considers stochastic bandits with side observations, a model ...
Solving Factored MDPs with Continuous and Discrete Variables
Although many real-world stochastic planning problems are more naturally...
Partitioned Linear Programming Approximations for MDPs
Approximate linear programming (ALP) is an efficient approach to solving...
Automatic Tuning of Interactive Perception Applications
Interactive applications incorporating high-data rate sensing and comput...
Online Semi-Supervised Learning on Quantized Graphs
In this paper, we tackle the problem of online semi-supervised learning ...
