Neural Thompson Sampling

10/02/2020
by   Weitong Zhang, et al.

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both exploration and exploitation. At the core of our algorithm is a novel posterior distribution of the reward, whose mean is the neural network approximator and whose variance is built upon the neural tangent features of the corresponding neural network. We prove that, provided the underlying reward function is bounded, the proposed algorithm is guaranteed to achieve a cumulative regret of 𝒪(T^{1/2}), which matches the regret of other contextual bandit algorithms in terms of the total number of rounds T. Experimental comparisons with other benchmark bandit algorithms on various data sets corroborate our theory.
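
To make the posterior described in the abstract concrete, here is a minimal sketch of one round of arm selection. It is an illustration under stated assumptions, not the authors' implementation: the two-layer ReLU network, the helper names (init_params, forward_and_grad, thompson_step), the regularizer lam, the exploration scale nu, and the 1/width scaling of the gradient features are all choices made for this example. The network gradient with respect to its parameters stands in for the "neural tangent features", and the network training step is omitted.

```python
import numpy as np


def init_params(dim, width, rng):
    # Hypothetical two-layer ReLU network f(x) = w2 . relu(W1 x).
    W1 = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(width, dim))
    w2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)
    return W1, w2


def forward_and_grad(params, x):
    # Returns the network output and its gradient w.r.t. all parameters,
    # flattened into one vector (the assumed "neural tangent feature").
    W1, w2 = params
    pre = W1 @ x
    hidden = np.maximum(pre, 0.0)
    value = float(w2 @ hidden)
    grad_W1 = np.outer(w2 * (pre > 0), x)
    grad_w2 = hidden
    return value, np.concatenate([grad_W1.ravel(), grad_w2])


def thompson_step(params, U, contexts, nu, lam, rng):
    # For each arm: mean = network output, variance = quadratic form of the
    # gradient feature in the inverse design matrix U (assumed scaling).
    width = params[1].shape[0]
    U_inv = np.linalg.inv(U)
    samples, grads = [], []
    for x in contexts:
        mean, g = forward_and_grad(params, x)
        var = lam * (g @ U_inv @ g) / width
        samples.append(rng.normal(mean, nu * np.sqrt(var)))
        grads.append(g)
    # Play the arm with the largest sampled reward, then update U with the
    # chosen arm's feature so its variance shrinks in later rounds.
    arm = int(np.argmax(samples))
    U_new = U + np.outer(grads[arm], grads[arm]) / width
    return arm, U_new


rng = np.random.default_rng(0)
dim, width, lam = 3, 8, 0.5
params = init_params(dim, width, rng)
U = lam * np.eye(dim * width + width)   # design matrix starts at lam * I
contexts = rng.normal(size=(4, dim))    # one context vector per arm
arm, U = thompson_step(params, U, contexts, nu=1.0, lam=lam, rng=rng)
```

In a full loop, the observed reward for the chosen arm would be used to retrain the network before the next round. Setting nu to 0 removes the sampling noise and reduces the rule to greedy selection on the network output.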


research
05/04/2020

Hyper-parameter Tuning for the Contextual Bandit

We study here the problem of learning the exploration exploitation trade...
research
06/24/2022

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for ...
research
06/29/2021

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

Balancing exploration and exploitation (EE) is a fundamental problem in ...
research
02/17/2020

Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack

The multi-armed bandit formalism has been extensively studied under vari...
research
05/21/2015

Regulating Greed Over Time

In retail, there are predictable yet dramatic time-dependent patterns in...
research
03/01/2021

A Biased Graph Neural Network Sampler with Near-Optimal Regret

Graph neural networks (GNN) have recently emerged as a vehicle for apply...
research
06/14/2017

A Practical Method for Solving Contextual Bandit Problems Using Decision Trees

Many efficient algorithms with strong theoretical guarantees have been p...
