
Sketched Newton-Raphson
We propose a new globally convergent stochastic second order method. Our...
A Novel Confidence-Based Algorithm for Structured Bandits
We study finite-armed stochastic bandits where the rewards of each arm m...
Meta-learning with Stochastic Linear Bandits
We investigate meta-learning procedures in the setting of stochastic lin...
Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization
We study the problem of learning exploration-exploitation strategies tha...
Active Model Estimation in Markov Decision Processes
We study the problem of efficient exploration in order to learn an accur...
Learning Near Optimal Policies with Low Inherent Bellman Error
We study the exploration problem with approximate linear action-value fu...
Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification
Gaussian processes (GP) are one of the most successful frameworks to mod...
Adversarial Attacks on Linear Contextual Bandits
Contextual bandit algorithms are applied in a wide range of domains, fro...
Improved Algorithms for Conservative Exploration in Bandits
In many fields such as digital marketing, healthcare, finance, and robot...
Conservative Exploration in Reinforcement Learning
While learning in an unknown Markov Decision Process (MDP), an agent sho...
Concentration Inequalities for Multinoulli Random Variables
We investigate concentration inequalities for Dirichlet and Multinomial ...
No-Regret Exploration in Goal-Oriented Reinforcement Learning
Many popular reinforcement learning problems (e.g., navigation in a maze...
Frequentist Regret Bounds for Randomized Least-Squares Value Iteration
We consider the exploration-exploitation dilemma in finite-horizon reinf...
A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning
Effective coordination is crucial to solve multi-agent collaborative (MA...
Word-order biases in deep-agent emergent communication
Sequence-processing neural networks led to remarkable progress on many N...
Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret
Gaussian processes (GP) are a popular Bayesian approach for the optimiza...
Active Exploration in Markov Decision Processes
We introduce the active exploration problem in Markov decision processes...
Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes
We introduce and analyse two algorithms for exploration-exploitation in ...
Rotting bandits are no harder than stochastic ones
In bandits, arms' distributions are stationary. This is often violated i...
Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
While designing the state space of an MDP, it is common to include state...
Distributed Adaptive Sampling for Kernel Matrix Approximation
Most kernel-based methods, such as kernel or Gaussian process regression...
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
We introduce SCAL, an algorithm designed to perform efficient exploratio...
Second-Order Kernel Online Convex Optimization with Adaptive Sketching
Kernel online convex optimization (KOCO) is a framework combining the ex...
Experimental results: Reinforcement Learning of POMDPs using Spectral Methods
We propose a new reinforcement learning algorithm for partially observab...
Thompson Sampling for Linear-Quadratic Control Problems
We consider the exploration-exploitation tradeoff in linear quadratic (L...
Exploration-Exploitation in MDPs with Options
While a large body of empirical results show that temporally-extended ac...
Active Learning for Accurate Estimation of Linear Models
We explore the sequential decision making problem where the goal is to e...
Linear Thompson Sampling Revisited
We derive an alternative proof for the regret of Thompson sampling (TS) in...
Reinforcement Learning in Rich-Observation MDPs using Spectral Methods
Designing effective exploration-exploitation algorithms in Markov decisi...
Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting
We derive a new proof to show that the incremental resparsification algo...
Reinforcement Learning of POMDPs using Spectral Methods
We propose a new reinforcement learning algorithm for partially observab...
Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning
While the harmonic function solution performs well in many semi-supervis...
Online Stochastic Optimization under Correlated Bandit Feedback
In this paper we consider the problem of online stochastic optimization ...
Sequential Transfer in Multi-armed Bandit with Finite Set of Models
Learning from prior tasks and transferring that experience to improve fu...
Regret Bounds for Reinforcement Learning with Policy Advice
In some reinforcement learning problems an agent may be provided with a ...
A Dantzig Selector Approach to Temporal Difference Learning
LSTD is a popular algorithm for value function approximation. Whenever t...
Transfer from Multiple MDPs
Transfer reinforcement learning (RL) methods leverage the experience ...
Alessandro Lazaric
Junior Researcher (CR1) at INRIA Lille - Nord Europe in the SequeL