Choquet regularization for reinforcement learning

08/17/2022
by Xia Han, et al.

We propose Choquet regularizers to measure and manage the level of exploration in reinforcement learning (RL). We reformulate the continuous-time entropy-regularized RL problem of Wang et al. (2020, JMLR, 21(198)), replacing the differential entropy used for regularization with a Choquet regularizer. We derive the Hamilton–Jacobi–Bellman equation of the resulting problem and solve it explicitly in the linear–quadratic (LQ) case by statically maximizing a Choquet regularizer under mean and variance constraints. In the LQ setting, we derive explicit optimal distributions for several specific Choquet regularizers and, conversely, identify the Choquet regularizers that generate a number of widely used exploratory samplers, such as ϵ-greedy, exponential, uniform and Gaussian.
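The abstract does not spell out what a Choquet regularizer is. As a rough illustration only, the sketch below assumes the form Φ_h(Π) = ∫ h(Π([x, ∞))) dx over the real line, with h concave on [0, 1] and h(0) = h(1) = 0, and evaluates it on the empirical distribution of a sample. The function names `choquet_regularizer` and `h_quadratic`, and the choice h(p) = p(1 − p), are hypothetical and are not taken from the abstract.

```python
import numpy as np


def choquet_regularizer(samples, h):
    """Evaluate the assumed form  integral of h(P(X >= x)) dx  on an empirical distribution.

    Assumption (not stated in the abstract): h is concave on [0, 1]
    with h(0) = h(1) = 0, so only the range between the smallest and
    largest sample contributes to the integral.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    gaps = np.diff(x)                      # widths of the intervals between order statistics
    # On the interval (x_(i), x_(i+1)] the empirical survival probability is (n - i) / n.
    surv = (n - np.arange(1, n)) / n
    return float(np.sum(h(surv) * gaps))


def h_quadratic(p):
    """Hypothetical distortion h(p) = p(1 - p); concave and zero at 0 and 1."""
    return p * (1.0 - p)


# A point mass has no spread, so the regularizer vanishes.
point_mass = choquet_regularizer([3.0, 3.0, 3.0], h_quadratic)

# A fine grid on [0, 1] approximates a uniform distribution.
uniform_grid = choquet_regularizer(np.linspace(0.0, 1.0, 1001), h_quadratic)
```

Under these assumptions the value is zero for a deterministic action and grows as the distribution spreads out, which is the sense in which such a functional can quantify exploration.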


research
04/25/2019

Continuous-Time Mean-Variance Portfolio Optimization via Reinforcement Learning

We consider continuous-time Mean-variance (MV) portfolio optimization pr...
research
07/04/2020

Discount Factor as a Regularizer in Reinforcement Learning

Specifying a Reinforcement Learning (RL) task involves choosing a suitab...
research
12/04/2018

Exploration versus exploitation in reinforcement learning: a stochastic control approach

We consider reinforcement learning (RL) in continuous time and study the...
research
07/26/2019

Large scale continuous-time mean-variance portfolio allocation via reinforcement learning

We propose to solve large scale Markowitz mean-variance (MV) portfolio a...
research
04/25/2019

Continuous-Time Mean-Variance Portfolio Selection: A Reinforcement Learning Framework

We approach the continuous-time mean-variance (MV) portfolio selection w...
research
09/27/2022

Reinforcement Learning with Non-Exponential Discounting

Commonly in reinforcement learning (RL), rewards are discounted over tim...
research
11/25/2020

Exploratory LQG Mean Field Games with Entropy Regularization

We study a general class of entropy-regularized multi-variate LQG mean f...
