
Information-Theoretic Generalization Bounds for Stochastic Gradient Descent
We study the generalization properties of the popular stochastic gradien...
Logistic Q-Learning
We propose a new reinforcement learning algorithm derived from a regular...
A Unifying View of Optimism in Episodic Reinforcement Learning
The principle of optimism in the face of uncertainty underpins many theo...
Online learning in MDPs with linear function approximation and bandit feedback
We consider an online learning problem where the learner interacts with ...
Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits
We consider an adversarial variant of the classic K-armed linear context...
Fast Rates for Online Prediction with Abstention
In the setting of sequential prediction of individual {0, 1}-sequences w...
Faster saddle-point optimization for solving large-scale Markov decision processes
We consider the problem of computing optimal policies in average-reward ...
Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
We consider the core reinforcement-learning problem of on-policy value f...
Beating SGD Saturation with Tail-Averaging and Minibatching
While stochastic gradient descent (SGD) is one of the major workhorses i...
Bandit Principal Component Analysis
We consider a partial-feedback variant of the well-studied online PCA pr...
Potential and Pitfalls of Multi-Armed Bandits for Decentralized Spatial Reuse in WLANs
Spatial Reuse (SR) has recently gained attention for performance maximiz...
Online Influence Maximization with Local Observations
We consider an online influence maximization problem in which a decision...
Iterate averaging as regularization for stochastic gradient descent
We propose and analyze a variant of the classic Polyak-Ruppert averaging...
Wireless Optimisation via Convex Bandits: Unlicensed LTE/WiFi Coexistence
Bandit Convex Optimisation (BCO) is a powerful framework for sequential ...
Collaborative Spatial Reuse in Wireless Networks via Selfish Multi-Armed Bandits
Next-generation wireless deployments are characterized by being dense an...
On the Hardness of Inventory Management with Censored Demand Data
We consider a repeated newsvendor problem where the inventory manager ha...
Boltzmann Exploration Done Right
Boltzmann exploration is a classic strategy for sequential decision-maki...
A unified view of entropy-regularized Markov decision processes
We propose a general framework for entropy-regularized average-reward re...
Algorithmic stability and hypothesis complexity
We introduce a notion of algorithmic stability of learning algorithms...
Fast rates for online learning in Linearly Solvable Markov Decision Processes
We study the problem of online learning in a class of Markov decision pr...
Explore no more: Improved high-probability regret bounds for non-stochastic bandits
This work addresses the problem of regret minimization in non-stochastic...
Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits
We propose a sample-efficient alternative for importance weighting for s...
First-order regret bounds for combinatorial semi-bandits
We consider the problem of online combinatorial optimization under semi...
Online learning in MDPs with side information
We study online learning of finite Markov decision process (MDP) problem...
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
In this paper we propose a novel gradient algorithm to learn a policy fr...
