
-
Meta-Thompson Sampling
Efficient exploration in multi-armed bandits is a fundamental online lea...
-
Non-Stationary Latent Bandits
Users of recommender systems often behave in a non-stationary fashion, d...
-
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
We propose a novel framework for structured bandits, which we call an in...
-
Latent Bandits Revisited
A latent bandit problem is one in which the learning agent knows the arm...
-
Piecewise-Stationary Off-Policy Optimization
Off-policy learning is a framework for evaluating and optimizing policie...
-
Differentiable Meta-Learning in Contextual Bandits
We study a contextual bandit setting where the learning agent has access...
-
Sample Efficient Graph-Based Optimization with Noisy Observations
We study sample complexity of optimizing "hill-climbing friendly" functi...
-
Differentiable Bandit Exploration
We learn bandit policies that maximize the average reward over bandit in...
-
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems
We propose RandUCB, a bandit strategy that uses theoretically derived co...
-
Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLM-T...
-
Waterfall Bandits: Learning to Sell Ads Online
A popular approach to selling online advertising is by a waterfall, wher...
-
Empirical Bayes Regret Minimization
The prevalent approach to bandit algorithm design is to have a low-regre...
-
Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...
-
Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...
-
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...
-
Online Diverse Learning to Rank from Partial-Click Feedback
Learning to rank is an important problem in machine learning and recomme...
-
BubbleRank: Safe Online Learning to Rerank
We study the problem of online learning to re-rank, where users provide ...
-
TopRank: A practical algorithm for online stochastic ranking
Online learning to rank is a sequential decision-making problem where in...
-
Conservative Exploration using Interleaving
In many practical problems, a learning agent may want to learn the best ...
-
New Insights into Bootstrapping for Bandits
We investigate the use of bootstrapping in the bandit setting. We first ...
-
Offline Evaluation of Ranking Policies with Click Models
Many web systems rank and present a list of items to users, from recomme...
-
Nearly Optimal Adaptive Procedure for Piecewise-Stationary Bandit: a Change-Point Detection Approach
Multi-armed bandit (MAB) is a class of online learning problems where a ...
-
Stochastic Low-Rank Bandits
Many problems in computer vision and recommender systems involve low-ran...
-
SpectralFPL: Online Spectral Learning for Single Topic Models
This paper studies how to efficiently learn an optimal latent variable m...
-
Bernoulli Rank-1 Bandits for Click Feedback
The probability that a user will click a search result depends both on i...
-
Online Learning to Rank in Stochastic Click Models
Online learning to rank is a core problem in information retrieval and m...
-
Stochastic Rank-1 Bandits
We propose stochastic rank-1 bandits, a class of online learning problem...
-
Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback
We study the stochastic online problem of learning to influence in a soc...
-
Cascading Bandits for Large-Scale Recommendation Problems
Most recommender systems recommend a list of items. The user examines th...
-
DCM Bandits: Learning to Rank with Multiple Clicks
A search engine recommends to the user a list of web pages. The user exa...
-
Graphical Model Sketch
Structured high-cardinality data arises in many domains, and poses a maj...
-
Cascading Bandits: Learning to Rank in the Cascade Model
A search engine usually outputs a list of K web pages. The user examines...
-
DUM: Diversity-Weighted Utility Maximization for Recommendations
The need for diversification of recommendation lists manifests in a numb...
-
Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem whe...
-
Efficient Learning in Large-Scale Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem whe...
-
Learning to Act Greedily: Polymatroid Semi-Bandits
Many important optimization problems, such as the minimum spanning tree ...
-
Matroid Bandits: Fast Combinatorial Optimization with Learning
A matroid is a notion of independence in combinatorial optimization whic...
-
Leveraging Side Observations in Stochastic Bandits
This paper considers stochastic bandits with side observations, a model ...
-
Solving Factored MDPs with Continuous and Discrete Variables
Although many real-world stochastic planning problems are more naturally...
-
Partitioned Linear Programming Approximations for MDPs
Approximate linear programming (ALP) is an efficient approach to solving...
-
Automatic Tuning of Interactive Perception Applications
Interactive applications incorporating high-data rate sensing and comput...
-
Online Semi-Supervised Learning on Quantized Graphs
In this paper, we tackle the problem of online semi-supervised learning ...