
Bandit Phase Retrieval
We study a bandit version of phase retrieval where the learner chooses a...
Minimax Regret for Bandit Convex Optimisation of Ridge Functions
We analyse adversarial bandit convex optimisation with an adversary that...
Information Directed Sampling for Sparse Linear Bandits
Stochastic sparse linear bandits offer a practical model for high-dimens...
On the Optimality of Batch Policy Optimization Algorithms
Batch policy optimization considers leveraging existing data for policy ...
Geometric Entropic Exploration
Exploration is essential for solving complex Reinforcement Learning (RL)...
Asymptotically Optimal Information-Directed Sampling
We introduce a computationally efficient algorithm for finite stochastic...
High-Dimensional Sparse Linear Bandits
Stochastic linear bandits with high-dimensional sparse features are a pr...
Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
This paper provides a statistical analysis of high-dimensional batch Rei...
Online Sparse Reinforcement Learning
We investigate the hardness of online reinforcement learning in fixed ho...
Mirror Descent and the Information Ratio
We establish a connection between the stability of mirror descent and th...
Gaussian Gated Linear Networks
We propose the Gaussian Gated Linear Network (GGLN), an extension to th...
Stochastic matrix games with bandit feedback
We study a version of the classical zero-sum matrix game with unknown pa...
Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation
We prove that the information-theoretic upper bound on the minimax regre...
Model Selection in Contextual Stochastic Bandit Problems
We study model selection in stochastic bandit problems. Our approach rel...
Information Directed Sampling for Linear Partial Monitoring
Partial monitoring is a rich framework for sequential decision making un...
Learning with Good Feature Representations in Bandits and in RL with a Generative Model
The construction in the recent paper by Du et al. [2019] implies that se...
Adaptive Exploration in Linear Contextual Bandit
Contextual bandits serve as a fundamental model for many sequential deci...
Gated Linear Networks
This paper presents a family of backpropagation-free neural architecture...
Behaviour Suite for Reinforcement Learning
This paper introduces the Behaviour Suite for Reinforcement Learning, or...
Iterative Budgeted Exponential Search
We tackle two long-standing problems related to re-expansions in heurist...
Exploration by Optimisation in Partial Monitoring
We provide a simple and efficient algorithm for adversarial k-action do...
Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees
We introduce and analyze two parameter-free linear-memory tree search al...
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
The information-theoretic analysis by Russo and Van Roy (2014) in combin...
Adaptivity, Variance and Separation for Adversarial Bandits
We make three contributions to the theory of k-armed adversarial bandits...
Degenerate Feedback Loops in Recommender Systems
Machine learning is used extensively in recommender systems deployed in ...
An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
We prove a new minimax theorem connecting the worst-case Bayesian regret...
A Geometric Perspective on Optimal Representations for Reinforcement Learning
This paper proposes a new approach to representation learning based on g...
Soft-Bayes: Prod for Mixtures of Experts with Log-Loss
We consider prediction with expert advice under the log-loss with the go...
Single-Agent Policy Tree Search With Guarantees
We introduce two novel tree search algorithms that use a policy to guide...
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...
Online Learning to Rank with Features
We introduce a new model for online ranking in which the click probabili...
BubbleRank: Safe Online Learning to Re-Rank
We study the problem of online learning to re-rank, where users provide ...
TopRank: A practical algorithm for online stochastic ranking
Online learning to rank is a sequential decision-making problem where in...
Cleaning up the neighborhood: A full classification for adversarial partial monitoring
Partial monitoring is a generalization of the well-known multi-armed ban...
Online Learning with Gated Linear Networks
This paper describes a family of probabilistic architectures designed fo...
A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis
Existing strategies for finite-armed stochastic bandits mostly depend on...
Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Statistical performance bounds for reinforcement learning (RL) algorithm...
The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
Stochastic linear bandits are a natural and simple generalisation of fin...
Causal Bandits: Learning Good Interventions via Causal Inference
We study the problem of using causal models to improve the rate at which...
Refined Lower Bounds for Adversarial Bandits
We provide new lower bounds on the regret that must be suffered by adver...
Regret Analysis of the Anytime Optimally Confident UCB Algorithm
I introduce and analyse an anytime version of the Optimally Confident UC...
Thompson Sampling is Asymptotically Optimal in General Environments
We discuss a variant of Thompson sampling for nonparametric reinforcemen...
Conservative Bandits
We study a novel multi-armed bandit problem that models the challenge fa...
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
I analyse the frequentist regret of the famous Gittins index strategy fo...
Concentration and Confidence for Discrete Bayesian Sequence Predictors
Bayesian sequence prediction is a simple technique for predicting future...
Asymptotically Optimal Agents
Artificial general intelligence aims to create agents capable of learnin...
Time Consistent Discounting
A possibly immortal agent tries to maximise its summed discounted reward...
