Thompson Sampling on Asymmetric α-Stable Bandits

03/19/2022
by Zhendong Shi, et al.

In reinforcement learning, handling the exploration-exploitation dilemma is particularly important when optimizing algorithms. The multi-armed bandit problem lets proposed solutions be optimized as the reward distribution changes, realizing a dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been used to explore data that conform to various laws. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards conform to unknown asymmetric α-stable distributions, and we explore its applications in modelling financial and wireless data.
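To make the setting concrete, the sketch below simulates a bandit whose arms pay out asymmetric α-stable rewards (drawn with scipy.stats.levy_stable) and runs a Gaussian-approximation variant of Thompson Sampling over each arm's location parameter. The arm parameters, the normal posterior update, and the reward clipping are illustrative assumptions for this sketch only; they are not the posterior construction used in the paper.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)

# Hypothetical arms: each reward is asymmetric alpha-stable with its own
# (alpha, beta, loc, scale). Values are illustrative, not from the paper.
arms = [
    dict(alpha=1.8, beta=0.5, loc=0.0, scale=1.0),
    dict(alpha=1.8, beta=-0.3, loc=0.3, scale=1.0),
    dict(alpha=1.8, beta=0.0, loc=0.1, scale=1.0),
]

def pull(k):
    """Draw one asymmetric alpha-stable reward from arm k."""
    p = arms[k]
    return levy_stable.rvs(p["alpha"], p["beta"],
                           loc=p["loc"], scale=p["scale"],
                           random_state=rng)

# Gaussian-approximation Thompson Sampling: keep a normal posterior over each
# arm's location parameter (a simplification for heavy-tailed rewards).
K, T = len(arms), 5000
mu = np.zeros(K)      # posterior mean of each arm's location
prec = np.ones(K)     # posterior precision (prior N(0, 1))
obs_prec = 1.0        # assumed observation precision

for t in range(T):
    theta = rng.normal(mu, 1.0 / np.sqrt(prec))  # sample one draw per posterior
    k = int(np.argmax(theta))                    # play the arm with the best draw
    r = float(np.clip(pull(k), -10.0, 10.0))     # ad hoc clip of heavy-tailed rewards
    new_prec = prec[k] + obs_prec                # standard normal-normal update
    mu[k] = (prec[k] * mu[k] + obs_prec * r) / new_prec
    prec[k] = new_prec

print("posterior location estimates:", np.round(mu, 3))
```

With the illustrative parameters above, the sampler concentrates its pulls on the arm with the largest location parameter; the clipping step is only a crude guard against the infinite-variance draws that α < 2 stable rewards produce.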


