
A general sample complexity analysis of vanilla policy gradient
The policy gradient (PG) is one of the most popular methods for solving ...

Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning
We present DrQ-v2, a model-free reinforcement learning (RL) algorithm fo...

A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs
We derive a novel asymptotic problem-dependent lower-bound for regret mi...

A Unified Framework for Conservative Exploration
We study bandits and reinforcement learning (RL) subject to a conservati...

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
We study the problem of learning in the stochastic shortest path (SSP) s...

Leveraging Good Representations in Linear Contextual Bandits
The linear contextual bandit literature is mostly focused on the design ...

Reinforcement Learning with Prototypical Representations
Learning effective representations in image-based environments is crucia...

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs
We investigate the exploration of an unknown environment when no reward ...

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits
In the contextual linear bandit setting, algorithms built on the optimis...

Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration
There has been growing progress on theoretical analyses for provably eff...

Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation
We study the exploration-exploitation dilemma in the linear quadratic re...

A Provably Efficient Sample Collection Strategy for Reinforcement Learning
A common assumption in reinforcement learning (RL) is to have access to ...

Improved Analysis of UCRL2 with Empirical Bernstein Inequality
We consider the problem of exploration-exploitation in communicating Mar...

Sketched Newton-Raphson
We propose a new globally convergent stochastic second order method. Our...

A Novel Confidence-Based Algorithm for Structured Bandits
We study finite-armed stochastic bandits where the rewards of each arm m...

Meta-learning with Stochastic Linear Bandits
We investigate meta-learning procedures in the setting of stochastic lin...

Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization
We study the problem of learning exploration-exploitation strategies tha...

Active Model Estimation in Markov Decision Processes
We study the problem of efficient exploration in order to learn an accur...

Learning Near Optimal Policies with Low Inherent Bellman Error
We study the exploration problem with approximate linear action-value fu...

Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification
Gaussian processes (GP) are one of the most successful frameworks to mod...

Adversarial Attacks on Linear Contextual Bandits
Contextual bandit algorithms are applied in a wide range of domains, fro...

Improved Algorithms for Conservative Exploration in Bandits
In many fields such as digital marketing, healthcare, finance, and robot...

Conservative Exploration in Reinforcement Learning
While learning in an unknown Markov Decision Process (MDP), an agent sho...

Concentration Inequalities for Multinoulli Random Variables
We investigate concentration inequalities for Dirichlet and Multinomial ...

No-Regret Exploration in Goal-Oriented Reinforcement Learning
Many popular reinforcement learning problems (e.g., navigation in a maze...

Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
We consider the exploration-exploitation dilemma in finite-horizon reinf...

A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning
Effective coordination is crucial to solve multi-agent collaborative (MA...

Word-order biases in deep-agent emergent communication
Sequence-processing neural networks led to remarkable progress on many N...

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
Gaussian processes (GP) are a popular Bayesian approach for the optimiza...

Active Exploration in Markov Decision Processes
We introduce the active exploration problem in Markov decision processes...

Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes
We introduce and analyse two algorithms for exploration-exploitation in ...

Rotting bandits are no harder than stochastic ones
In bandits, arms' distributions are stationary. This is often violated i...

Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
While designing the state space of an MDP, it is common to include state...

Distributed Adaptive Sampling for Kernel Matrix Approximation
Most kernel-based methods, such as kernel or Gaussian process regression...

Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
We introduce SCAL, an algorithm designed to perform efficient exploratio...

Second-Order Kernel Online Convex Optimization with Adaptive Sketching
Kernel online convex optimization (KOCO) is a framework combining the ex...

Experimental results: Reinforcement Learning of POMDPs using Spectral Methods
We propose a new reinforcement learning algorithm for partially observab...

Thompson Sampling for Linear-Quadratic Control Problems
We consider the exploration-exploitation tradeoff in linear quadratic (L...

Exploration-Exploitation in MDPs with Options
While a large body of empirical results show that temporally-extended ac...

Active Learning for Accurate Estimation of Linear Models
We explore the sequential decision making problem where the goal is to e...

Linear Thompson Sampling Revisited
We derive an alternative proof for the regret of Thompson sampling in...

Reinforcement Learning in Rich-Observation MDPs using Spectral Methods
Designing effective exploration-exploitation algorithms in Markov decisi...

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting
We derive a new proof to show that the incremental resparsification algo...

Reinforcement Learning of POMDPs using Spectral Methods
We propose a new reinforcement learning algorithm for partially observab...

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning
While the harmonic function solution performs well in many semi-supervis...

Online Stochastic Optimization under Correlated Bandit Feedback
In this paper we consider the problem of online stochastic optimization ...

Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Learning from prior tasks and transferring that experience to improve fu...

Regret Bounds for Reinforcement Learning with Policy Advice
In some reinforcement learning problems an agent may be provided with a ...

A Dantzig Selector Approach to Temporal Difference Learning
LSTD is a popular algorithm for value function approximation. Whenever t...

Transfer from Multiple MDPs
Transfer reinforcement learning (RL) methods leverage on the experience ...
Alessandro Lazaric
Junior Researcher (CR1) at INRIA Lille - Nord Europe in the SequeL