
Gaussian Gated Linear Networks
We propose the Gaussian Gated Linear Network (GGLN), an extension to th...

Stochastic matrix games with bandit feedback
We study a version of the classical zero-sum matrix game with unknown pa...

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation
We prove that the information-theoretic upper bound on the minimax regre...

Model Selection in Contextual Stochastic Bandit Problems
We study model selection in stochastic bandit problems. Our approach rel...

Information Directed Sampling for Linear Partial Monitoring
Partial monitoring is a rich framework for sequential decision making un...

Learning with Good Feature Representations in Bandits and in RL with a Generative Model
The construction in the recent paper by Du et al. [2019] implies that se...

Adaptive Exploration in Linear Contextual Bandit
Contextual bandits serve as a fundamental model for many sequential deci...

Gated Linear Networks
This paper presents a family of backpropagation-free neural architecture...

Behaviour Suite for Reinforcement Learning
This paper introduces the Behaviour Suite for Reinforcement Learning, or...

Iterative Budgeted Exponential Search
We tackle two longstanding problems related to re-expansions in heurist...

Exploration by Optimisation in Partial Monitoring
We provide a simple and efficient algorithm for adversarial k-action do...

Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees
We introduce and analyze two parameter-free linear-memory tree search al...

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
The information-theoretic analysis by Russo and Van Roy (2014) in combin...

Adaptivity, Variance and Separation for Adversarial Bandits
We make three contributions to the theory of k-armed adversarial bandits...

Degenerate Feedback Loops in Recommender Systems
Machine learning is used extensively in recommender systems deployed in ...

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
We prove a new minimax theorem connecting the worst-case Bayesian regret...

A Geometric Perspective on Optimal Representations for Reinforcement Learning
This paper proposes a new approach to representation learning based on g...

SoftBayes: Prod for Mixtures of Experts with Log-Loss
We consider prediction with expert advice under the log-loss with the go...

Single-Agent Policy Tree Search With Guarantees
We introduce two novel tree search algorithms that use a policy to guide...

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
We propose a multi-armed bandit algorithm that explores based on randomi...

Online Learning to Rank with Features
We introduce a new model for online ranking in which the click probabili...

BubbleRank: Safe Online Learning to Rerank
We study the problem of online learning to rerank, where users provide ...

TopRank: A practical algorithm for online stochastic ranking
Online learning to rank is a sequential decision-making problem where in...

Cleaning up the neighborhood: A full classification for adversarial partial monitoring
Partial monitoring is a generalization of the well-known multi-armed ban...

Online Learning with Gated Linear Networks
This paper describes a family of probabilistic architectures designed fo...

A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis
Existing strategies for finite-armed stochastic bandits mostly depend on...

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
Statistical performance bounds for reinforcement learning (RL) algorithm...

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
Stochastic linear bandits are a natural and simple generalisation of fin...

Causal Bandits: Learning Good Interventions via Causal Inference
We study the problem of using causal models to improve the rate at which...

Refined Lower Bounds for Adversarial Bandits
We provide new lower bounds on the regret that must be suffered by adver...

Regret Analysis of the Anytime Optimally Confident UCB Algorithm
I introduce and analyse an anytime version of the Optimally Confident UC...

Thompson Sampling is Asymptotically Optimal in General Environments
We discuss a variant of Thompson sampling for nonparametric reinforcemen...

Conservative Bandits
We study a novel multi-armed bandit problem that models the challenge fa...

Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits
I analyse the frequentist regret of the famous Gittins index strategy fo...

Concentration and Confidence for Discrete Bayesian Sequence Predictors
Bayesian sequence prediction is a simple technique for predicting future...

Asymptotically Optimal Agents
Artificial general intelligence aims to create agents capable of learnin...

Time Consistent Discounting
A possibly immortal agent tries to maximise its summed discounted reward...