
On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data
We study the fundamental question of the sample complexity of learning a...

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning
Many advances that have improved the robustness and efficiency of deep r...

Leveraging Non-uniformity in First-order Non-convex Optimization
Classical global convergence results for first-order methods rely on uni...

On the Optimality of Batch Policy Optimization Algorithms
Batch policy optimization considers leveraging existing data for policy ...

Improved Regret Bound and Experience Replay in Regularized Policy Iteration
In this work, we study algorithms for learning in infinite-horizon undis...

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method
Policy gradient gives rise to a rich class of reinforcement learning (RL...

Optimization Issues in KL-Constrained Approximate Policy Iteration
Many reinforcement learning algorithms can be seen as versions of approx...

Meta-Thompson Sampling
Efficient exploration in multi-armed bandits is a fundamental online lea...

Bootstrapping Statistical Inference for Off-Policy Evaluation
Bootstrapping provides a flexible and effective approach for assessing t...

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function
We consider the problem of local planning in fixed-horizon Markov Decisi...

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes
We study reinforcement learning (RL) with linear function approximation ...

Asymptotically Optimal Information-Directed Sampling
We introduce a computationally efficient algorithm for finite stochastic...

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
This paper provides a statistical analysis of high-dimensional batch Rei...

Online Sparse Reinforcement Learning
We investigate the hardness of online reinforcement learning in fixed ho...

On Optimality of Meta-Learning in Fixed-Design Regression with Weighted Biased Regularization
We consider a fixed-design linear regression in the meta-learning model ...

Online Algorithm for Unsupervised Sequential Selection with Contextual Information
In this paper, we study Contextual Unsupervised Sequential Selection (US...

CoinDICE: Off-Policy Confidence Interval Estimation
We study high-confidence behavior-agnostic off-policy evaluation in rein...

Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions
We consider the problem of local planning in fixed-horizon Markov Decisi...

Tighter risk certificates for neural networks
This paper presents empirical studies regarding training probabilistic n...

Efficient Planning in Large MDPs with Weak Linear Function Approximation
Large-scale Markov decision processes (MDPs) require planning algorithms...

Variational Policy Gradient Method for Reinforcement Learning with General Utilities
In recent years, reinforcement learning (RL) systems with general goals ...

PAC-Bayes Analysis Beyond the Usual Bounds
We focus on a stochastic learning model where the learner observes a fin...

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting
We consider off-policy evaluation in the contextual bandit setting for t...

Differentiable Meta-Learning in Contextual Bandits
We study a contextual bandit setting where the learning agent has access...

Model-Based Reinforcement Learning with Value-Targeted Regression
This paper studies model-based reinforcement learning (RL) for regret mi...

On the Global Convergence Rates of Softmax Policy Gradient Methods
We make three contributions toward better understanding policy gradient ...

Model Selection in Contextual Stochastic Bandit Problems
We study model selection in stochastic bandit problems. Our approach rel...

Differentiable Bandit Exploration
We learn bandit policies that maximize the average reward over bandit in...

Provably Efficient Adaptive Approximate Policy Iteration
Model-free reinforcement learning algorithms combined with value functio...

Learning with Good Feature Representations in Bandits and in RL with a Generative Model
The construction in the recent paper by Du et al. [2019] implies that se...

Autonomous exploration for navigating in non-stationary CMPs
We consider a setting in which the objective is to learn to navigate in ...

Adaptive Exploration in Linear Contextual Bandit
Contextual bandits serve as a fundamental model for many sequential deci...

Efron-Stein PAC-Bayesian Inequalities
We prove semi-empirical concentration inequalities for random variables ...

Exploration-Enhanced POLITEX
We study algorithms for average-cost reinforcement learning problems wit...

PAC-Bayes with Backprop
We explore a method to train probabilistic neural networks by minimizing...

Exploration by Optimisation in Partial Monitoring
We provide a simple and efficient algorithm for adversarial k-action do...

Randomized Exploration in Generalized Linear Bandits
We study two randomized algorithms for generalized linear bandits, GLMT...

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers
We consider worker skill estimation for the single-coin Dawid-Skene crow...

Empirical Bayes Regret Minimization
The prevalent approach to bandit algorithm design is to have a low-regre...

Perturbed-History Exploration in Stochastic Linear Bandits
We propose a new online algorithm for minimizing the cumulative regret i...

An Exponential Efron-Stein Inequality for Lq Stable Learning Rules
There is accumulating evidence in the literature that stability of learn...

Detecting Overfitting via Adversarial Examples
The repeated reuse of test sets in popular benchmark problems raises dou...

Perturbed-History Exploration in Stochastic Multi-Armed Bandits
We propose an online algorithm for cumulative regret minimization in a s...

Distribution-Dependent Analysis of Gibbs-ERM Principle
Gibbs-ERM learning is a natural idealized model of learning with stochas...

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
We prove a new minimax theorem connecting the worst-case Bayesian regret...

Online Algorithm for Unsupervised Sensor Selection
In many security and healthcare systems, the detection and diagnosis sys...

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
This paper addresses the problem of evaluating learning systems in safet...

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...

Online Learning to Rank with Features
We introduce a new model for online ranking in which the click probabili...

LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration
We consider the problem of configuring general-purpose solvers to run ef...
Csaba Szepesvari
Research Scientist at DeepMind, Professor at University of Alberta, Principal Investigator at Alberta Machine Intelligence Institute (Amii)