Csaba Szepesvari

research

∙ 07/25/2023

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation

Theoretical guarantees in reinforcement learning (RL) are known to suffe...

0 Philip Amortila, et al. ∙

research

∙ 06/22/2023

Context-lumpable stochastic bandits

We consider a contextual bandit problem with S contexts and A actions. I...

0 Chung-Wei Lee, et al. ∙

research

∙ 05/22/2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leible...

0 Toshinori Kitamura, et al. ∙

research

∙ 05/18/2023

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

While policy optimization algorithms have played an important role in re...

0 Qinghua Liu, et al. ∙

research

∙ 02/25/2023

Exponential Hardness of Reinforcement Learning with Linear Function Approximation

A fundamental question in reinforcement learning theory is: suppose the ...

0 Daniel Kane, et al. ∙

research

∙ 02/08/2023

Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning

A practical challenge in reinforcement learning is combinatorial action...

0 Volodymyr Tkachuk, et al. ∙

research

∙ 01/29/2023

Sample Efficient Deep Reinforcement Learning via Local Planning

The focus of this work is sample-efficient deep reinforcement learning (...

0 Dong Yin, et al. ∙

research

∙ 01/16/2023

The Role of Baselines in Policy Gradient Optimization

We study the effect of baselines in on-policy stochastic policy gradient...

12 Jincheng Mei, et al. ∙

research

∙ 12/28/2022

Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks

We explore the ability of overparameterized shallow ReLU neural networks...

0 Ilja Kuzborskij, et al. ∙

research

∙ 10/30/2022

Revisiting Simple Regret Minimization in Multi-Armed Bandits

Simple regret is a natural and parameter-free performance criterion for ...

0 Yao Zhao, et al. ∙

research

∙ 10/27/2022

Confident Approximate Policy Iteration for Efficient Local Planning in q^π-realizable MDPs

We consider approximate dynamic programming in γ-discounted Markov decis...

0 Gellért Weisz, et al. ∙

research

∙ 09/29/2022

Optimistic MLE – A Generic Model-based Algorithm for Partially Observable Sequential Decision Making

This paper introduces a simple, efficient learning algorithm for general...

5 Qinghua Liu, et al. ∙

research

∙ 06/13/2022

Near-Optimal Sample Complexity Bounds for Constrained MDPs

In contrast to the advances in characterizing the sample complexity for ...

4 Sharan Vaswani, et al. ∙

research

∙ 06/05/2022

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

Directed Evolution (DE), a landmark wet-lab method that originated in the 1960s, ...

0 Hui Yuan, et al. ∙

research

∙ 06/02/2022

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

This paper considers the challenging tasks of Multi-Agent Reinforcement ...

0 Qinghua Liu, et al. ∙

research

∙ 05/27/2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

In this work, we consider and analyze the sample complexity of model-fre...

6 Tadashi Kozuno, et al. ∙

research

∙ 04/19/2022

When Is Partially Observable Reinforcement Learning Not Scary?

Applications of Reinforcement Learning (RL), in which agents learn to ma...

0 Qinghua Liu, et al. ∙

research

∙ 04/11/2022

Towards Painless Policy Optimization for Constrained MDPs

We study policy optimization in an infinite horizon, γ-discounted constr...

2 Arushi Jain, et al. ∙

research

∙ 11/22/2021

A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning

Representation learning lies at the heart of the empirical success of de...

0 Tongzheng Ren, et al. ∙

research

∙ 10/29/2021

Understanding the Effect of Stochasticity in Policy Optimization

We study the effect of stochasticity in on-policy policy optimization, a...

0 Jincheng Mei, et al. ∙

research

∙ 10/05/2021

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

We consider the minimax query complexity of online planning with a gener...

0 Gellért Weisz, et al. ∙

research

∙ 08/12/2021

Efficient Local Planning with Linear Function Approximation

We study query and computationally efficient planning algorithms with li...

0 Dong Yin, et al. ∙

research

∙ 07/27/2021

On the Role of Optimization in Double Descent: A Least Squares Study

Empirically it has been observed that the performance of deep neural net...

0 Ilja Kuzborskij, et al. ∙

research

∙ 07/13/2021

No Regrets for Learning the Prior in Bandits

We propose AdaTS, a Thompson sampling algorithm that adapts sequentially...

0 Soumya Basu, et al. ∙

research

∙ 07/12/2021

Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping

We explore the ability of overparameterized shallow neural networks to l...

0 Ilja Kuzborskij, et al. ∙

research

∙ 06/18/2021

On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data

We study the fundamental question of the sample complexity of learning a...

0 Chenjun Xiao, et al. ∙

research

∙ 06/15/2021

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning

Many advances that have improved the robustness and efficiency of deep r...

0 Abbas Abdolmaleki, et al. ∙

research

∙ 05/13/2021

Leveraging Non-uniformity in First-order Non-convex Optimization

Classical global convergence results for first-order methods rely on uni...

14 Jincheng Mei, et al. ∙

research

∙ 04/06/2021

On the Optimality of Batch Policy Optimization Algorithms

Batch policy optimization considers leveraging existing data for policy ...

0 Chenjun Xiao, et al. ∙

research

∙ 02/25/2021

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

In this work, we study algorithms for learning in infinite-horizon undis...

0 Nevena Lazic, et al. ∙

research

∙ 02/17/2021

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

Policy gradient gives rise to a rich class of reinforcement learning (RL...

0 Junyu Zhang, et al. ∙

research

∙ 02/11/2021

Optimization Issues in KL-Constrained Approximate Policy Iteration

Many reinforcement learning algorithms can be seen as versions of approx...

0 Nevena Lazic, et al. ∙

research

∙ 02/11/2021

Meta-Thompson Sampling

Efficient exploration in multi-armed bandits is a fundamental online lea...

0 Branislav Kveton, et al. ∙

research

∙ 02/06/2021

Bootstrapping Statistical Inference for Off-Policy Evaluation

Bootstrapping provides a flexible and effective approach for assessing t...

0 Botao Hao, et al. ∙

research

∙ 02/03/2021

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

We consider the problem of local planning in fixed-horizon Markov Decisi...

0 Gellért Weisz, et al. ∙

research

∙ 12/15/2020

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation ...

11 Dongruo Zhou, et al. ∙

research

∙ 11/11/2020

Asymptotically Optimal Information-Directed Sampling

We introduce a computationally efficient algorithm for finite stochastic...

0 Johannes Kirschner, et al. ∙

research

∙ 11/08/2020

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

This paper provides a statistical analysis of high-dimensional batch Rei...

0 Botao Hao, et al. ∙

research

∙ 11/08/2020

Online Sparse Reinforcement Learning

We investigate the hardness of online reinforcement learning in fixed ho...

0 Botao Hao, et al. ∙

research

∙ 10/31/2020

On Optimality of Meta-Learning in Fixed-Design Regression with Weighted Biased Regularization

We consider a fixed-design linear regression in the meta-learning model ...

0 Mikhail Konobeev, et al. ∙

research

∙ 10/23/2020

Online Algorithm for Unsupervised Sequential Selection with Contextual Information

In this paper, we study Contextual Unsupervised Sequential Selection (US...

0 Arun Verma, et al. ∙

research

∙ 10/22/2020

CoinDICE: Off-Policy Confidence Interval Estimation

We study high-confidence behavior-agnostic off-policy evaluation in rein...

0 Bo Dai, et al. ∙

research

∙ 10/03/2020

Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

We consider the problem of local planning in fixed-horizon Markov Decisi...

0 Gellért Weisz, et al. ∙

research

∙ 07/25/2020

Tighter risk certificates for neural networks

This paper presents empirical studies regarding training probabilistic n...

10 Maria Perez-Ortiz, et al. ∙

research

∙ 07/13/2020

Efficient Planning in Large MDPs with Weak Linear Function Approximation

Large-scale Markov decision processes (MDPs) require planning algorithms...

0 Roshan Shariff, et al. ∙

research

∙ 07/04/2020

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

In recent years, reinforcement learning (RL) systems with general goals ...

4 Junyu Zhang, et al. ∙

research

∙ 06/23/2020

PAC-Bayes Analysis Beyond the Usual Bounds

We focus on a stochastic learning model where the learner observes a fin...

15 Omar Rivasplata, et al. ∙

research

∙ 06/18/2020

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

We consider off-policy evaluation in the contextual bandit setting for t...

0 Ilja Kuzborskij, et al. ∙

research

∙ 06/09/2020

Differentiable Meta-Learning in Contextual Bandits

We study a contextual bandit setting where the learning agent has access...

0 Branislav Kveton, et al. ∙

research

∙ 06/01/2020

Model-Based Reinforcement Learning with Value-Targeted Regression

This paper studies model-based reinforcement learning (RL) for regret mi...

11 Alex Ayoub, et al. ∙

