
Differentiable Bandit Exploration
We learn bandit policies that maximize the average reward over bandit in...
read it

Online Algorithm for Unsupervised Sensor Selection
In many security and healthcare systems, the detection and diagnosis sys...
read it

Provably Efficient Adaptive Approximate Policy Iteration
Modelfree reinforcement learning algorithms combined with value functio...
read it

PACBayes Analysis Beyond the Usual Bounds
We focus on a stochastic learning model where the learner observes a fin...
read it

Gradient Descent for Sparse RankOne Matrix Completion for CrowdSourced Aggregation of Sparsely Interacting Workers
We consider worker skill estimation for the singlecoin DawidSkene crow...
read it

Learning with Good Feature Representations in Bandits and in RL with a Generative Model
The construction in the recent paper by Du et al. [2019] implies that se...
read it

ModelBased Reinforcement Learning with ValueTargeted Regression
This paper studies modelbased reinforcement learning (RL) for regret mi...
read it

Detecting Overfitting via Adversarial Examples
The repeated reuse of test sets in popular benchmark problems raises dou...
read it

PerturbedHistory Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...
read it

Autonomous exploration for navigating in nonstationary CMPs
We consider a setting in which the objective is to learn to navigate in ...
read it

Online Learning to Rank with Features
We introduce a new model for online ranking in which the click probabili...
read it

Garbage In, Reward Out: Bootstrapping Exploration in MultiArmed Bandits
We propose a multiarmed bandit algorithm that explores based on randomi...
read it

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
This paper addresses the problem of evaluating learning systems in safet...
read it

An InformationTheoretic Approach to Minimax Regret in Partial Monitoring
We prove a new minimax theorem connecting the worstcase Bayesian regret...
read it

Empirical Bayes Regret Minimization
The prevalent approach to bandit algorithm design is to have a lowregre...
read it

PerturbedHistory Exploration in Stochastic MultiArmed Bandits
We propose an online algorithm for cumulative regret minimization in a s...
read it

PACBayes with Backprop
We explore a method to train probabilistic neural networks by minimizing...
read it

On the Global Convergence Rates of Softmax Policy Gradient Methods
We make three contributions toward better understanding policy gradient ...
read it

TopRank: A practical algorithm for online stochastic ranking
Online learning to rank is a sequential decisionmaking problem where in...
read it

BubbleRank: Safe Online Learning to Rerank
We study the problem of online learning to rerank, where users provide ...
read it

PACBayes bounds for stable algorithms with instancedependent priors
PACBayes bounds have been proposed to get risk estimates based on a tra...
read it

Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLMT...
read it

Exploration by Optimisation in Partial Monitoring
We provide a simple and efficient algorithm for adversarial kaction do...
read it

ExplorationEnhanced POLITEX
We study algorithms for averagecost reinforcement learning problems wit...
read it

Bandits with Delayed Anonymous Feedback
We study the bandits with delayed anonymous feedback problem, a variant ...
read it

Linear Stochastic Approximation: Constant StepSize and Iterate Averaging
We consider ddimensional linear stochastic approximation algorithms (LS...
read it

A Modular Analysis of Adaptive (Non)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds
Recently, much work has been done on extending the scope of online learn...
read it

Structured Best Arm Identification with Fixed Confidence
We study the problem of identifying the best action among a set of possi...
read it

Mixing time estimation in reversible Markov chains from a single sample path
The spectral gap γ of a finite, ergodic, and reversible Markov chain is ...
read it

An a Priori Exponential Tail Bound for kFolds CrossValidation
We consider a priori generalization bounds developed in terms of crossv...
read it

Bernoulli Rank1 Bandits for Click Feedback
The probability that a user will click a search result depends both on i...
read it

Online Learning to Rank in Stochastic Click Models
Online learning to rank is a core problem in information retrieval and m...
read it

The End of Optimism? An Asymptotic Analysis of FiniteArmed Linear Bandits
Stochastic linear bandits are a natural and simple generalisation of fin...
read it

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles
Algorithms for bandit convex optimization and online learning often rely...
read it

Multiclass Classification Calibration Functions
In this paper we refine the process of computing calibration functions f...
read it

Chaining Bounds for Empirical Risk Minimization
This paper extends the standard chaining technique to prove excess risk ...
read it

Stochastic Rank1 Bandits
We propose stochastic rank1 bandits, a class of online learning problem...
read it

On Minimax Optimal Offline Policy Evaluation
This paper studies the offpolicy evaluation problem, where one aims to ...
read it

Adaptive Monte Carlo via Bandit Allocation
We consider the problem of sequentially choosing between a set of unbias...
read it

Policy Error Bounds for ModelBased Reinforcement Learning with Factored Linear Models
In this paper we study a modelbased approach to calculating approximate...
read it

Conservative Bandits
We study a novel multiarmed bandit problem that models the challenge fa...
read it

DCM Bandits: Learning to Rank with Multiple Clicks
A search engine recommends to the user a list of web pages. The user exa...
read it

Online Learning with Gaussian Payoffs and Side Observations
We consider a sequential learning problem with Gaussian payoffs and side...
read it

Fast CrossValidation for Incremental Learning
Crossvalidation (CV) is one of the main tools for performance estimatio...
read it

Cascading Bandits: Learning to Rank in the Cascade Model
A search engine usually outputs a list of K web pages. The user examines...
read it

Tight Regret Bounds for Stochastic Combinatorial SemiBandits
A stochastic combinatorial semibandit is an online learning problem whe...
read it

DynaStyle Planning with Linear Function Approximation and Prioritized Sweeping
We consider the problem of efficiently learning optimal control policies...
read it

Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstractions
In this paper, we consider planning in stochastic shortest path (SSP) pr...
read it

Online Least Squares Estimation with SelfNormalized Processes: An Application to Bandit Problems
The analysis of online least squares estimation is at the heart of many ...
read it

Online Learning under Delayed Feedback
Online learning with delayed feedback has received increasing attention ...
read it
Csaba Szepesvari
is this you? claim profile
Research Scientist at DeepMind, Professor at University of Alberta, Principal Investigator at Alberta Machine Intelligence Institute (Amii)