Branislav Kveton

research

∙ 06/15/2023

Logarithmic Bayes Regret Bounds

We derive the first finite-time logarithmic regret bounds for Bayesian b...

0 Alexia Atsidakou, et al. ∙

research

∙ 06/13/2023

Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances

We study the problem of best-arm identification (BAI) in the fixed-budge...

0 Anusha Lalitha, et al. ∙

research

∙ 02/03/2023

Multiplier Bootstrap-based Exploration

Despite the great interest in the bandit problem, designing efficient al...

0 Runzhe Wan, et al. ∙

research

∙ 02/01/2023

Selective Uncertainty Propagation in Offline RL

We study the finite-horizon offline reinforcement learning (RL) problem....

0 Sanath Kumar Krishnamurthy, et al. ∙

research

∙ 01/12/2023

Thompson Sampling with Diffusion Generative Prior

In this work, we initiate the idea of using denoising diffusion models t...

0 Yu-Guan Hsieh, et al. ∙

research

∙ 12/09/2022

Multi-Task Off-Policy Learning from Bandit Feedback

Many practical applications, such as recommender systems and learning to...

0 Joey Hong, et al. ∙

research

∙ 11/15/2022

Bayesian Fixed-Budget Best-Arm Identification

Fixed-budget best-arm identification (BAI) is a bandit problem where the...

0 Alexia Atsidakou, et al. ∙

research

∙ 09/27/2022

From Ranked Lists to Carousels: A Carousel Click Model

Carousel-based recommendation interfaces allow users to explore recommen...

0 Behnam Rahdari, et al. ∙

research

∙ 06/08/2022

Uplifting Bandits

We introduce a multi-armed bandit model where the reward is a sum of mul...

0 Yu-Guan Hsieh, et al. ∙

research

∙ 06/06/2022

Pessimistic Off-Policy Optimization for Learning to Rank

Off-policy learning is a framework for optimizing policies without deplo...

0 Matej Cief, et al. ∙

research

∙ 05/30/2022

Generalizing Hierarchical Bayesian Bandits

A contextual bandit is a popular and practical framework for online lear...

0 Imad Aouali, et al. ∙

research

∙ 02/26/2022

Safe Exploration for Efficient Policy Evaluation and Comparison

High-quality data plays a central role in ensuring the accuracy of polic...

0 Runzhe Wan, et al. ∙

research

∙ 02/25/2022

Meta-Learning for Simple Regret Minimization

We develop a meta-learning framework for simple regret minimization in b...

0 MohammadJavad Azizi, et al. ∙

research

∙ 02/03/2022

Deep Hierarchy in Bandits

Mean rewards of actions are often correlated. The form of these correlat...

0 Joey Hong, et al. ∙

research

∙ 01/24/2022

IMO^3: Interactive Multi-Objective Off-Policy Optimization

Most real-world optimization problems have multiple objectives. A system...

0 Nan Wang, et al. ∙

research

∙ 11/12/2021

Hierarchical Bayesian Bandits

Meta-, multi-task, and federated learning can be all viewed as solving s...

0 Joey Hong, et al. ∙

research

∙ 11/08/2021

Safe Optimal Design with Applications in Policy Learning

Motivated by practical needs in online experimentation and off-policy le...

0 Ruihao Zhu, et al. ∙

research

∙ 09/16/2021

Optimal Probing with Statistical Guarantees for Network Monitoring at Scale

Cloud networks are difficult to monitor because they grow rapidly and th...

0 Muhammad Jehangir Amjad, et al. ∙

research

∙ 07/13/2021

No Regrets for Learning the Prior in Bandits

We propose AdaTS, a Thompson sampling algorithm that adapts sequentially...

0 Soumya Basu, et al. ∙

research

∙ 06/10/2021

Thompson Sampling with a Mixture Prior

We study Thompson sampling (TS) in online decision-making problems where...

0 Joey Hong, et al. ∙

research

∙ 06/09/2021

Fixed-Budget Best-Arm Identification in Contextual Bandits: A Static-Adaptive Algorithm

We study the problem of best-arm identification (BAI) in contextual band...

0 MohammadJavad Azizi, et al. ∙

research

∙ 03/07/2021

CORe: Capitalizing On Rewards in Bandit Exploration

We propose a bandit algorithm that explores purely by randomizing its pa...

0 Nan Wang, et al. ∙

research

∙ 02/11/2021

Meta-Thompson Sampling

Efficient exploration in multi-armed bandits is a fundamental online lea...

0 Branislav Kveton, et al. ∙

research

∙ 12/01/2020

Non-Stationary Latent Bandits

Users of recommender systems often behave in a non-stationary fashion, d...

0 Joey Hong, et al. ∙

research

∙ 07/09/2020

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems

We propose a novel framework for structured bandits, which we call an in...

0 Tong Yu, et al. ∙

research

∙ 06/15/2020

Latent Bandits Revisited

A latent bandit problem is one in which the learning agent knows the arm...

0 Joey Hong, et al. ∙

research

∙ 06/15/2020

Piecewise-Stationary Off-Policy Optimization

Off-policy learning is a framework for evaluating and optimizing policie...

0 Joey Hong, et al. ∙

research

∙ 06/09/2020

Differentiable Meta-Learning in Contextual Bandits

We study a contextual bandit setting where the learning agent has access...

0 Branislav Kveton, et al. ∙

research

∙ 06/04/2020

Sample Efficient Graph-Based Optimization with Noisy Observations

We study sample complexity of optimizing "hill-climbing friendly" functi...

10 Tan Nguyen, et al. ∙

research

∙ 02/17/2020

Differentiable Bandit Exploration

We learn bandit policies that maximize the average reward over bandit in...

22 Craig Boutilier, et al. ∙

research

∙ 10/11/2019

Old Dog Learns New Tricks: Randomized UCB for Bandit Problems

We propose RandUCB, a bandit strategy that uses theoretically derived co...

0 Sharan Vaswani, et al. ∙

research

∙ 06/21/2019

Randomized Exploration in Generalized Linear Bandits

We study two randomized algorithms for generalized linear bandits, GLM-T...

2 Branislav Kveton, et al. ∙

research

∙ 04/20/2019

Waterfall Bandits: Learning to Sell Ads Online

A popular approach to selling online advertising is by a waterfall, wher...

6 Branislav Kveton, et al. ∙

research

∙ 04/04/2019

Empirical Bayes Regret Minimization

The prevalent approach to bandit algorithm design is to have a low-regre...

6 Chih-Wei Hsu, et al. ∙

research

∙ 03/21/2019

Perturbed-History Exploration in Stochastic Linear Bandits

We propose a new online algorithm for minimizing the cumulative regret i...

10 Branislav Kveton, et al. ∙

research

∙ 02/26/2019

Perturbed-History Exploration in Stochastic Multi-Armed Bandits

We propose an online algorithm for cumulative regret minimization in a s...

4 Branislav Kveton, et al. ∙

research

∙ 11/13/2018

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

We propose a multi-armed bandit algorithm that explores based on randomi...

6 Branislav Kveton, et al. ∙

research

∙ 11/01/2018

Online Diverse Learning to Rank from Partial-Click Feedback

Learning to rank is an important problem in machine learning and recomme...

8 Prakhar Gupta, et al. ∙

research

∙ 06/15/2018

BubbleRank: Safe Online Learning to Rerank

We study the problem of online learning to re-rank, where users provide ...

2 Branislav Kveton, et al. ∙

research

∙ 06/06/2018

TopRank: A practical algorithm for online stochastic ranking

Online learning to rank is a sequential decision-making problem where in...

2 Tor Lattimore, et al. ∙

research

∙ 06/03/2018

Conservative Exploration using Interleaving

In many practical problems, a learning agent may want to learn the best ...

2 Sumeet Katariya, et al. ∙

research

∙ 05/24/2018

New Insights into Bootstrapping for Bandits

We investigate the use of bootstrapping in the bandit setting. We first ...

0 Sharan Vaswani, et al. ∙

research

∙ 04/27/2018

Offline Evaluation of Ranking Policies with Click Models

Many web systems rank and present a list of items to users, from recomme...

0 Yasin Abbasi-Yadkori, et al. ∙

research

∙ 02/11/2018

Nearly Optimal Adaptive Procedure for Piecewise-Stationary Bandit: a Change-Point Detection Approach

Multi-armed bandit (MAB) is a class of online learning problems where a ...

0 Yang Cao, et al. ∙

research

∙ 12/13/2017

Stochastic Low-Rank Bandits

Many problems in computer vision and recommender systems involve low-ran...

0 Branislav Kveton, et al. ∙

research

∙ 09/21/2017

SpectralFPL: Online Spectral Learning for Single Topic Models

This paper studies how to efficiently learn an optimal latent variable m...

0 Tong Yu, et al. ∙

research

∙ 03/19/2017

Bernoulli Rank-1 Bandits for Click Feedback

The probability that a user will click a search result depends both on i...

0 Sumeet Katariya, et al. ∙

research

∙ 03/07/2017

Online Learning to Rank in Stochastic Click Models

Online learning to rank is a core problem in information retrieval and m...

0 Masrour Zoghi, et al. ∙

research

∙ 08/10/2016

Stochastic Rank-1 Bandits

We propose stochastic rank-1 bandits, a class of online learning problem...

0 Sumeet Katariya, et al. ∙

research

∙ 05/21/2016

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

We study the stochastic online problem of learning to influence in a soc...

0 Zheng Wen, et al. ∙

