
Online Algorithm for Unsupervised Sensor Selection
In many security and healthcare systems, the detection and diagnosis sys...
01/15/2019 ∙ by Arun Verma, et al. ∙ 18

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers
We consider worker skill estimation for the single-coin Dawid-Skene crow...
04/25/2019 ∙ by Yao Ma, et al. ∙ 12

Learning with Good Feature Representations in Bandits and in RL with a Generative Model
The construction in the recent paper by Du et al. [2019] implies that se...
11/18/2019 ∙ by Tor Lattimore, et al. ∙ 11

Detecting Overfitting via Adversarial Examples
The repeated reuse of test sets in popular benchmark problems raises dou...
03/06/2019 ∙ by Roman Werpachowski, et al. ∙ 10

Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...
03/21/2019 ∙ by Branislav Kveton, et al. ∙ 10

Autonomous exploration for navigating in non-stationary CMPs
We consider a setting in which the objective is to learn to navigate in ...
10/18/2019 ∙ by Pratik Gajane, et al. ∙ 10

Online Learning to Rank with Features
We introduce a new model for online ranking in which the click probabili...
10/05/2018 ∙ by Shuai Li, et al. ∙ 8

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...
11/13/2018 ∙ by Branislav Kveton, et al. ∙ 6

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
This paper addresses the problem of evaluating learning systems in safet...
12/04/2018 ∙ by Jonathan Uesato, et al. ∙ 6

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
We prove a new minimax theorem connecting the worst-case Bayesian regret...
02/01/2019 ∙ by Tor Lattimore, et al. ∙ 6

Empirical Bayes Regret Minimization
The prevalent approach to bandit algorithm design is to have a low-regre...
04/04/2019 ∙ by Chih-Wei Hsu, et al. ∙ 6

Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...
02/26/2019 ∙ by Branislav Kveton, et al. ∙ 4

PAC-Bayes with Backprop
We explore a method to train probabilistic neural networks by minimizing...
08/19/2019 ∙ by Omar Rivasplata, et al. ∙ 4

TopRank: A practical algorithm for online stochastic ranking
Online learning to rank is a sequential decision-making problem where in...
06/06/2018 ∙ by Tor Lattimore, et al. ∙ 2

BubbleRank: Safe Online Learning to Re-rank
We study the problem of online learning to re-rank, where users provide ...
06/15/2018 ∙ by Branislav Kveton, et al. ∙ 2

PAC-Bayes bounds for stable algorithms with instance-dependent priors
PAC-Bayes bounds have been proposed to get risk estimates based on a tra...
06/18/2018 ∙ by Omar Rivasplata, et al. ∙ 2

Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLMT...
06/21/2019 ∙ by Branislav Kveton, et al. ∙ 2

Exploration by Optimisation in Partial Monitoring
We provide a simple and efficient algorithm for adversarial k-action do...
07/12/2019 ∙ by Tor Lattimore, et al. ∙ 1

Exploration-Enhanced POLITEX
We study algorithms for average-cost reinforcement learning problems wit...
08/27/2019 ∙ by Yasin Abbasi-Yadkori, et al. ∙ 1

Bandits with Delayed Anonymous Feedback
We study the bandits with delayed anonymous feedback problem, a variant ...
09/20/2017 ∙ by Ciara Pike-Burke, et al. ∙ 0

Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging
We consider d-dimensional linear stochastic approximation algorithms (LS...
09/12/2017 ∙ by Chandrashekar Lakshminarayanan, et al. ∙ 0

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds
Recently, much work has been done on extending the scope of online learn...
09/08/2017 ∙ by Pooria Joulani, et al. ∙ 0

Structured Best Arm Identification with Fixed Confidence
We study the problem of identifying the best action among a set of possi...
06/16/2017 ∙ by Ruitong Huang, et al. ∙ 0

Mixing time estimation in reversible Markov chains from a single sample path
The spectral gap γ of a finite, ergodic, and reversible Markov chain is ...
08/24/2017 ∙ by Daniel Hsu, et al. ∙ 0

An a Priori Exponential Tail Bound for k-Folds Cross-Validation
We consider a priori generalization bounds developed in terms of cross-v...
06/19/2017 ∙ by Karim Abou-Moustafa, et al. ∙ 0

Bernoulli Rank-1 Bandits for Click Feedback
The probability that a user will click a search result depends both on i...
03/19/2017 ∙ by Sumeet Katariya, et al. ∙ 0

Online Learning to Rank in Stochastic Click Models
Online learning to rank is a core problem in information retrieval and m...
03/07/2017 ∙ by Masrour Zoghi, et al. ∙ 0

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
Stochastic linear bandits are a natural and simple generalisation of fin...
10/14/2016 ∙ by Tor Lattimore, et al. ∙ 0

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles
Algorithms for bandit convex optimization and online learning often rely...
09/22/2016 ∙ by Xiaowei Hu, et al. ∙ 0

Multiclass Classification Calibration Functions
In this paper we refine the process of computing calibration functions f...
09/20/2016 ∙ by Bernardo Ávila Pires, et al. ∙ 0

Chaining Bounds for Empirical Risk Minimization
This paper extends the standard chaining technique to prove excess risk ...
09/07/2016 ∙ by Gábor Balázs, et al. ∙ 0

Stochastic Rank-1 Bandits
We propose stochastic rank-1 bandits, a class of online learning problem...
08/10/2016 ∙ by Sumeet Katariya, et al. ∙ 0

On Minimax Optimal Offline Policy Evaluation
This paper studies the off-policy evaluation problem, where one aims to ...
09/12/2014 ∙ by Lihong Li, et al. ∙ 0

Adaptive Monte Carlo via Bandit Allocation
We consider the problem of sequentially choosing between a set of unbias...
05/13/2014 ∙ by James Neufeld, et al. ∙ 0

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models
In this paper we study a model-based approach to calculating approximate...
02/19/2016 ∙ by Bernardo Ávila Pires, et al. ∙ 0

Conservative Bandits
We study a novel multi-armed bandit problem that models the challenge fa...
02/13/2016 ∙ by Yifan Wu, et al. ∙ 0

DCM Bandits: Learning to Rank with Multiple Clicks
A search engine recommends to the user a list of web pages. The user exa...
02/09/2016 ∙ by Sumeet Katariya, et al. ∙ 0

Online Learning with Gaussian Payoffs and Side Observations
We consider a sequential learning problem with Gaussian payoffs and side...
10/27/2015 ∙ by Yifan Wu, et al. ∙ 0

Fast Cross-Validation for Incremental Learning
Cross-validation (CV) is one of the main tools for performance estimatio...
06/30/2015 ∙ by Pooria Joulani, et al. ∙ 0

Cascading Bandits: Learning to Rank in the Cascade Model
A search engine usually outputs a list of K web pages. The user examines...
02/10/2015 ∙ by Branislav Kveton, et al. ∙ 0

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits
A stochastic combinatorial semi-bandit is an online learning problem whe...
10/03/2014 ∙ by Branislav Kveton, et al. ∙ 0

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping
We consider the problem of efficiently learning optimal control policies...
06/13/2012 ∙ by Richard S. Sutton, et al. ∙ 0

Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstractions
In this paper, we consider planning in stochastic shortest path (SSP) pr...
06/13/2012 ∙ by Alejandro Isaza, et al. ∙ 0

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many ...
02/14/2011 ∙ by Yasin Abbasi-Yadkori, et al. ∙ 0

Online Learning under Delayed Feedback
Online learning with delayed feedback has received increasing attention ...
06/04/2013 ∙ by Pooria Joulani, et al. ∙ 0

Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
We study the problem of learning Markov decision processes with finite s...
03/12/2013 ∙ by Yasin Abbasi-Yadkori, et al. ∙ 0

Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
We present simple and computationally efficient nonparametric estimators...
03/09/2010 ∙ by David Pal, et al. ∙ 0

Statistical Linear Estimation with Penalized Estimators: an Application to Reinforcement Learning
Motivated by value function estimation in reinforcement learning, we stu...
06/27/2012 ∙ by Bernardo Ávila Pires, et al. ∙ 0

An Adaptive Algorithm for Finite Stochastic Partial Monitoring
We present a new anytime algorithm that achieves near-optimal regret for...
06/27/2012 ∙ by Gábor Bartók, et al. ∙ 0

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
In this paper we propose a novel gradient algorithm to learn a policy fr...
06/20/2012 ∙ by Gergely Neu, et al. ∙ 0
Csaba Szepesvari
Research Scientist at DeepMind, Professor at University of Alberta, Principal Investigator at Alberta Machine Intelligence Institute (Amii)