
Information-Theoretic Generalization Bounds for Stochastic Gradient Descent
We study the generalization properties of the popular stochastic gradien...

Logistic Q-Learning
We propose a new reinforcement learning algorithm derived from a regular...

A Unifying View of Optimism in Episodic Reinforcement Learning
The principle of optimism in the face of uncertainty underpins many theo...

Online learning in MDPs with linear function approximation and bandit feedback
We consider an online learning problem where the learner interacts with ...

Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits
We consider an adversarial variant of the classic K-armed linear context...

Fast Rates for Online Prediction with Abstention
In the setting of sequential prediction of individual {0, 1}-sequences w...

Faster saddle-point optimization for solving large-scale Markov decision processes
We consider the problem of computing optimal policies in average-reward ...

Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
We consider the core reinforcement-learning problem of on-policy value f...

Beating SGD Saturation with Tail-Averaging and Minibatching
While stochastic gradient descent (SGD) is one of the major workhorses i...

Bandit Principal Component Analysis
We consider a partial-feedback variant of the well-studied online PCA pr...

Potential and Pitfalls of Multi-Armed Bandits for Decentralized Spatial Reuse in WLANs
Spatial Reuse (SR) has recently gained attention for performance maximiz...

Online Influence Maximization with Local Observations
We consider an online influence maximization problem in which a decision...

Iterate averaging as regularization for stochastic gradient descent
We propose and analyze a variant of the classic Polyak-Ruppert averaging...

Wireless Optimisation via Convex Bandits: Unlicensed LTE/WiFi Coexistence
Bandit Convex Optimisation (BCO) is a powerful framework for sequential ...

Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits
Next-generation wireless deployments are characterized by being dense an...

On the Hardness of Inventory Management with Censored Demand Data
We consider a repeated newsvendor problem where the inventory manager ha...

Boltzmann Exploration Done Right
Boltzmann exploration is a classic strategy for sequential decision-maki...

A unified view of entropy-regularized Markov decision processes
We propose a general framework for entropy-regularized average-reward re...

Algorithmic stability and hypothesis complexity
We introduce a notion of algorithmic stability of learning algorithms...

Fast rates for online learning in Linearly Solvable Markov Decision Processes
We study the problem of online learning in a class of Markov decision pr...

Explore no more: Improved high-probability regret bounds for non-stochastic bandits
This work addresses the problem of regret minimization in non-stochastic...

Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits
We propose a sample-efficient alternative for importance weighting for s...

First-order regret bounds for combinatorial semi-bandits
We consider the problem of online combinatorial optimization under semi...

Online learning in MDPs with side information
We study online learning of finite Markov decision process (MDP) problem...

Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
In this paper we propose a novel gradient algorithm to learn a policy fr...