Tor Lattimore

06/22/2023

Context-lumpable stochastic bandits

We consider a contextual bandit problem with S contexts and A actions. I...

Chung-Wei Lee, et al.

05/17/2023

Sequential Best-Arm Identification with Application to Brain-Computer Interface

A brain-computer interface (BCI) is a technology that enables direct com...

Xin Zhou, et al.

02/10/2023

A Second-Order Method for Stochastic Bandit Convex Optimisation

We introduce a simple and efficient algorithm for unconstrained zeroth-o...

Tor Lattimore, et al.

02/07/2023

Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

Partial monitoring is an expressive framework for sequential decision-ma...

Johannes Kirschner, et al.

02/07/2023

Leveraging Demonstrations to Improve Online Learning: Quality Matters

We investigate the extent to which offline demonstration data can improv...

Botao Hao, et al.

06/09/2022

Regret Bounds for Information-Directed Reinforcement Learning

Information-directed sampling (IDS) has revealed its potential as a data...

Botao Hao, et al.

05/26/2022

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

We study distributed contextual linear bandits with stochastic contexts,...

Sanae Amani, et al.

05/22/2022

Contextual Information-Directed Sampling

Information-directed sampling (IDS) has recently demonstrated its potent...

Botao Hao, et al.

02/22/2022

Minimax Regret for Partial Monitoring: Infinite Outcomes and Rustichini's Regret

We show that a version of the generalised information ratio of Lattimore...

Tor Lattimore, et al.

10/29/2021

Variational Bayesian Optimistic Sampling

We consider online sequential decision problems where an agent must bala...

Brendan O'Donoghue, et al.

06/03/2021

Bandit Phase Retrieval

We study a bandit version of phase retrieval where the learner chooses a...

Tor Lattimore, et al.

06/01/2021

Minimax Regret for Bandit Convex Optimisation of Ridge Functions

We analyse adversarial bandit convex optimisation with an adversary that...

Tor Lattimore, et al.

05/29/2021

Information Directed Sampling for Sparse Linear Bandits

Stochastic sparse linear bandits offer a practical model for high-dimens...

Botao Hao, et al.

04/06/2021

On the Optimality of Batch Policy Optimization Algorithms

Batch policy optimization considers leveraging existing data for policy ...

Chenjun Xiao, et al.

01/06/2021

Geometric Entropic Exploration

Exploration is essential for solving complex Reinforcement Learning (RL)...

Zhaohan Daniel Guo, et al.

11/11/2020

Asymptotically Optimal Information-Directed Sampling

We introduce a computationally efficient algorithm for finite stochastic...

Johannes Kirschner, et al.

11/08/2020

High-Dimensional Sparse Linear Bandits

Stochastic linear bandits with high-dimensional sparse features are a pr...

Botao Hao, et al.

11/08/2020

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient

This paper provides a statistical analysis of high-dimensional batch Rei...

Botao Hao, et al.

11/08/2020

Online Sparse Reinforcement Learning

We investigate the hardness of online reinforcement learning in fixed ho...

Botao Hao, et al.

09/25/2020

Mirror Descent and the Information Ratio

We establish a connection between the stability of mirror descent and th...

Tor Lattimore, et al.

06/10/2020

Gaussian Gated Linear Networks

We propose the Gaussian Gated Linear Network (G-GLN), an extension to th...

David Budden, et al.

06/09/2020

Stochastic matrix games with bandit feedback

We study a version of the classical zero-sum matrix game with unknown pa...

Brendan O'Donoghue, et al.

05/31/2020

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

We prove that the information-theoretic upper bound on the minimax regre...

Tor Lattimore, et al.

03/03/2020

Model Selection in Contextual Stochastic Bandit Problems

We study model selection in stochastic bandit problems. Our approach rel...

Aldo Pacchiano, et al.

02/25/2020

Information Directed Sampling for Linear Partial Monitoring

Partial monitoring is a rich framework for sequential decision making un...

Johannes Kirschner, et al.

11/18/2019

Learning with Good Feature Representations in Bandits and in RL with a Generative Model

The construction in the recent paper by Du et al. [2019] implies that se...

Tor Lattimore, et al.

10/15/2019

Adaptive Exploration in Linear Contextual Bandit

Contextual bandits serve as a fundamental model for many sequential deci...

Botao Hao, et al.

09/30/2019

Gated Linear Networks

This paper presents a family of backpropagation-free neural architecture...

Joel Veness, et al.

08/09/2019

Behaviour Suite for Reinforcement Learning

This paper introduces the Behaviour Suite for Reinforcement Learning, or...

Ian Osband, et al.

07/30/2019

Iterative Budgeted Exponential Search

We tackle two long-standing problems related to re-expansions in heurist...

Malte Helmert, et al.

07/12/2019

Exploration by Optimisation in Partial Monitoring

We provide a simple and efficient algorithm for adversarial k-action d-o...

Tor Lattimore, et al.

06/07/2019

Zooming Cautiously: Linear-Memory Heuristic Search With Node Expansion Guarantees

We introduce and analyze two parameter-free linear-memory tree search al...

Laurent Orseau, et al.

05/28/2019

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

The information-theoretic analysis by Russo and Van Roy (2014) in combin...

Julian Zimmert, et al.

03/19/2019

Adaptivity, Variance and Separation for Adversarial Bandits

We make three contributions to the theory of k-armed adversarial bandits...

Roman Pogodin, et al.

02/27/2019

Degenerate Feedback Loops in Recommender Systems

Machine learning is used extensively in recommender systems deployed in ...

Ray Jiang, et al.

02/01/2019

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

We prove a new minimax theorem connecting the worst-case Bayesian regret...

Tor Lattimore, et al.

01/31/2019

A Geometric Perspective on Optimal Representations for Reinforcement Learning

This paper proposes a new approach to representation learning based on g...

Marc G. Bellemare, et al.

01/08/2019

Soft-Bayes: Prod for Mixtures of Experts with Log-Loss

We consider prediction with expert advice under the log-loss with the go...

Laurent Orseau, et al.

11/27/2018

Single-Agent Policy Tree Search With Guarantees

We introduce two novel tree search algorithms that use a policy to guide...

Laurent Orseau, et al.

11/13/2018

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits

We propose a multi-armed bandit algorithm that explores based on randomi...

Branislav Kveton, et al.

10/05/2018

Online Learning to Rank with Features

We introduce a new model for online ranking in which the click probabili...

Tor Lattimore, et al.

06/15/2018

BubbleRank: Safe Online Learning to Rerank

We study the problem of online learning to re-rank, where users provide ...

Branislav Kveton, et al.

06/06/2018

TopRank: A practical algorithm for online stochastic ranking

Online learning to rank is a sequential decision-making problem where in...

Tor Lattimore, et al.

05/23/2018

Cleaning up the neighborhood: A full classification for adversarial partial monitoring

Partial monitoring is a generalization of the well-known multi-armed ban...

Tor Lattimore, et al.

12/05/2017

Online Learning with Gated Linear Networks

This paper describes a family of probabilistic architectures designed fo...

Joel Veness, et al.

03/27/2017

A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis

Existing strategies for finite-armed stochastic bandits mostly depend on...

Tor Lattimore, et al.

03/22/2017

Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Statistical performance bounds for reinforcement learning (RL) algorithm...

Christoph Dann, et al.

10/14/2016

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits

Stochastic linear bandits are a natural and simple generalisation of fin...

Tor Lattimore, et al.

06/10/2016

Causal Bandits: Learning Good Interventions via Causal Inference

We study the problem of using causal models to improve the rate at which...

Finnian Lattimore, et al.

05/24/2016

Refined Lower Bounds for Adversarial Bandits

We provide new lower bounds on the regret that must be suffered by adver...

Sébastien Gerchinovitz, et al.
