A Subgame Perfect Equilibrium Reinforcement Learning Approach to Time-inconsistent Problems

10/27/2021
by Nixie S. Lesmana, et al.

In this paper, we establish a subgame perfect equilibrium reinforcement learning (SPERL) framework for time-inconsistent (TIC) problems. In the context of RL, TIC problems face two main challenges. First, there is no natural recursive relationship between value functions at different time points. Second, Bellman's principle of optimality is violated, so the policy improvement theorem can no longer be proved, which calls into question the applicability of standard policy iteration algorithms. We adapt an extended dynamic programming theory and propose a new class of algorithms, called backward policy iteration (BPI), that solves SPERL and addresses both challenges. To demonstrate the practical use of BPI as a training framework, we adapt standard RL simulation methods and derive two BPI-based training algorithms. We evaluate the derived training algorithms on a mean-variance portfolio selection problem, using performance metrics that include convergence and model identifiability.
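To make the equilibrium idea concrete, the following is a minimal sketch of backward induction on a toy mean-variance problem. It is not the paper's BPI algorithm: the function name `solve_equilibrium`, the additive wealth dynamics, the two-outcome return distribution, and all parameter values are assumptions chosen for illustration. Because variance does not satisfy a standard Bellman recursion, the sketch carries the first and second moments of terminal wealth backward in time, and each time-t decision is optimized while every later decision is held at its already-computed equilibrium choice.

```python
from functools import lru_cache


def solve_equilibrium(horizon, actions, outcomes, risk_aversion, x0):
    """Backward induction for a toy mean-variance problem (illustrative only).

    outcomes: iterable of (probability, return per unit invested).
    Wealth evolves additively: x' = x + a * r (an assumption of this sketch).
    Each decision at time t maximizes E[W_T] - risk_aversion * Var[W_T],
    taking the choices of all later time points as fixed.
    """
    policy = {}

    @lru_cache(maxsize=None)
    def moments(t, x):
        # Returns (E[W_T], E[W_T^2]) from state x at time t under the
        # equilibrium policy for times t, ..., horizon - 1.
        if t == horizon:
            return float(x), float(x * x)
        best = None
        for a in actions:
            mean = sum(p * moments(t + 1, x + a * r)[0] for p, r in outcomes)
            second = sum(p * moments(t + 1, x + a * r)[1] for p, r in outcomes)
            objective = mean - risk_aversion * (second - mean ** 2)
            # Strict comparison: ties keep the earlier action in `actions`.
            if best is None or objective > best[0]:
                best = (objective, a, mean, second)
        policy[(t, x)] = best[1]
        return best[2], best[3]

    mean, second = moments(0, x0)
    return policy, mean, second - mean ** 2


# Hypothetical setup: invest 0 or 1 unit; the risky return is +2 or -1.
outcomes = ((0.5, 2), (0.5, -1))
risk_neutral = solve_equilibrium(2, (0, 1), outcomes, 0.0, 0)
risk_averse = solve_equilibrium(2, (0, 1), outcomes, 1.0, 0)
```

In this toy setting the risk-neutral agent invests at every reachable state, while the strongly risk-averse agent never invests. A learning-based method would replace the exact expectations above with estimates from simulated trajectories.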

research
05/04/2020

Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning

We consider infinite horizon dynamic programming problems, where the con...
research
10/30/2017

Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

Approximate dynamic programming algorithms, such as approximate value it...
research
06/16/2020

Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration

This study is motivated by a new class of challenging control problems d...
research
05/04/2020

A Non-equilibrium Thermodynamic Framework of Consciousness

In this paper, we take a brief look at the advantages and disadvantages ...
research
06/15/2022

Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning

Keeping risk under control is often more crucial than maximizing expecte...
research
09/19/2016

Incremental Sampling-based Motion Planners Using Policy Iteration Methods

Recent progress in randomized motion planners has led to the development...
research
06/24/2019

Deep Conservative Policy Iteration

Conservative Policy Iteration (CPI) is a founding algorithm of Approxima...
