
A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs
We derive a novel asymptotic problem-dependent lower-bound for regret mi...
A Unified Framework for Conservative Exploration
We study bandits and reinforcement learning (RL) subject to a conservati...
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
We study the problem of learning in the stochastic shortest path (SSP) s...
Leveraging Good Representations in Linear Contextual Bandits
The linear contextual bandit literature is mostly focused on the design ...
Homomorphically Encrypted Linear Contextual Bandit
Contextual bandit is a general framework for online learning in sequenti...
Improved Sample Complexity for Incremental Autonomous Exploration in MDPs
We investigate the exploration of an unknown environment when no reward ...
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits
In the contextual linear bandit setting, algorithms built on the optimis...
Local Differentially Private Regret Minimization in Reinforcement Learning
Reinforcement learning algorithms are widely used in domains where it is...
A Provably Efficient Sample Collection Strategy for Reinforcement Learning
A common assumption in reinforcement learning (RL) is to have access to ...
Improved Analysis of UCRL2 with Empirical Bernstein Inequality
We consider the problem of exploration-exploitation in communicating Mar...
A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces
In this work, we propose KeRNS: an algorithm for episodic reinforcement ...
Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization
We study the problem of learning exploration-exploitation strategies tha...
Regret Bounds for Kernel-Based Reinforcement Learning
We consider the exploration-exploitation dilemma in finite-horizon reinf...
Active Model Estimation in Markov Decision Processes
We study the problem of efficient exploration in order to learn an accur...
Exploration-Exploitation in Constrained MDPs
In many sequential decision-making problems, the goal is to optimize a u...
Adversarial Attacks on Linear Contextual Bandits
Contextual bandit algorithms are applied in a wide range of domains, fro...
Improved Algorithms for Conservative Exploration in Bandits
In many fields such as digital marketing, healthcare, finance, and robot...
Conservative Exploration in Reinforcement Learning
While learning in an unknown Markov Decision Process (MDP), an agent sho...
Concentration Inequalities for Multinoulli Random Variables
We investigate concentration inequalities for Dirichlet and Multinomial ...
Exploiting Language Instructions for Interpretable and Compositional Reinforcement Learning
In this work, we present an alternative approach to making an agent comp...
No-Regret Exploration in Goal-Oriented Reinforcement Learning
Many popular reinforcement learning problems (e.g., navigation in a maze...
Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
We consider the exploration-exploitation dilemma in finite-horizon reinf...
Smoothing Policies and Safe Policy Gradients
Policy gradient algorithms are among the best candidates for the much an...
Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes
We introduce and analyse two algorithms for exploration-exploitation in ...
Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
While designing the state space of an MDP, it is common to include state...
Stochastic Variance-Reduced Policy Gradient
In this paper, we propose a novel reinforcement learning algorithm cons...
Importance Weighted Transfer of Samples in Reinforcement Learning
We consider the transfer of experience samples (i.e., tuples < s, a, s',...
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
We introduce SCAL, an algorithm designed to perform efficient exploratio...
Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent
In this paper, we propose a novel approach to automatically determine th...
Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation Supplementary Material
This document contains supplementary material for the paper "Multi-objec...
