Policy Gradient Optimization of Thompson Sampling Policies

06/30/2020
by Seungki Min et al.

We study the use of policy gradient algorithms to optimize over a class of generalized Thompson sampling policies. Our central insight is to view the posterior parameter sampled by Thompson sampling as a kind of pseudo-action. Policy gradient methods can then be tractably applied to search over a class of sampling policies, which determine a probability distribution over pseudo-actions (i.e., sampled parameters) as a function of observed data. We also propose and compare policy gradient estimators that are specialized to Bayesian bandit problems. Numerical experiments demonstrate that direct policy search on top of Thompson sampling automatically corrects for some of the algorithm's known shortcomings and offers meaningful improvements even in long-horizon problems where standard Thompson sampling is extremely effective.
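
For intuition, here is a minimal sketch (not code from the paper) of the pseudo-action view on a 3-armed Bernoulli bandit: the posterior parameters sampled at each round are treated as pseudo-actions, and a single hypothetical knob c that rescales the Beta posterior's concentration is tuned by a REINFORCE-style score-function gradient on the episode return. The problem instance, the one-parameter policy class, and all hyperparameters are illustrative assumptions, not the generalized policy class or the specialized estimators studied in the paper.

```python
# Sketch: policy-gradient tuning of a generalized Thompson sampling policy.
# Assumptions (not from the paper): Bernoulli bandit, Beta(1,1) priors, and a
# one-parameter policy class that samples pseudo-actions theta ~ Beta(c*alpha, c*beta).
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(0)
K, T, EPISODES, LR = 3, 200, 2000, 1e-3     # illustrative hyperparameters
TRUE_MEANS = np.array([0.45, 0.50, 0.55])   # assumed problem instance
log_c = 0.0                                  # learnable: c = exp(log_c) rescales the posterior

def run_episode(c):
    """Play one episode of generalized TS; return (episode return, summed score function)."""
    alpha, beta = np.ones(K), np.ones(K)     # Beta(1,1) priors for each arm
    total_reward, score_sum = 0.0, 0.0
    for _ in range(T):
        # Pseudo-action: one sampled parameter per arm from the rescaled posterior.
        theta = rng.beta(c * alpha, c * beta)
        # d/dc log Beta(theta_k; c*alpha_k, c*beta_k), summed over arms.
        score_c = np.sum(
            alpha * np.log(theta) + beta * np.log1p(-theta)
            - (alpha * digamma(c * alpha) + beta * digamma(c * beta)
               - (alpha + beta) * digamma(c * (alpha + beta)))
        )
        score_sum += c * score_c             # chain rule: d/d(log c) = c * d/dc
        a = int(np.argmax(theta))            # act greedily with respect to the pseudo-action
        r = float(rng.random() < TRUE_MEANS[a])
        alpha[a] += r                        # standard Beta-Bernoulli posterior update
        beta[a] += 1.0 - r
        total_reward += r
    return total_reward, score_sum

baseline = 0.0
for ep in range(EPISODES):
    G, score = run_episode(np.exp(log_c))
    baseline += 0.05 * (G - baseline)        # running-mean baseline for variance reduction
    log_c += LR * (G - baseline) * score / T # REINFORCE ascent step on log c
print(f"learned posterior scale c = {np.exp(log_c):.3f}")
```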

