Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

10/30/2017
by Tadashi Kozuno, et al.

Approximate dynamic programming algorithms, such as approximate value iteration, have been applied successfully to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to extend the applicability of reinforcement learning further. In this paper we propose a new, robust dynamic programming algorithm that unifies value iteration, advantage learning, and dynamic policy programming. We call it generalized value iteration (GVI), and its approximate version approximate GVI (AGVI). We prove a performance guarantee for AGVI that includes the performance guarantees for existing algorithms as special cases. We discuss theoretical weaknesses of existing algorithms and explain the advantages of AGVI. Numerical experiments in a simple environment support the theoretical arguments and suggest that AGVI is a promising alternative to previous algorithms.
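For orientation, the following is a minimal sketch of plain tabular value iteration, one of the three algorithms the abstract says GVI unifies. It is not GVI or AGVI: the abstract does not state their update rules, so none is attempted here. The function name, the two-state MDP, the discount factor, and the tolerance are all hypothetical choices made for illustration.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Plain tabular value iteration on action values (illustrative only).

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the expected immediate reward.
    """
    n_states = len(P)
    Q = [[0.0 for _ in P[s]] for s in range(n_states)]
    for _ in range(max_iters):
        # State values under the greedy policy for the current Q.
        V = [max(q_s) for q_s in Q]
        # Bellman optimality backup for every state-action pair.
        new_Q = [
            [
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s]))
            ]
            for s in range(n_states)
        ]
        delta = max(
            abs(new_Q[s][a] - Q[s][a])
            for s in range(n_states)
            for a in range(len(P[s]))
        )
        Q = new_Q
        if delta < tol:
            break
    return Q


# Hypothetical two-state MDP: action 0 moves to state 0, action 1 moves to
# state 1; only choosing action 1 while in state 1 yields a reward of 1.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 0)], [(1.0, 1)]],
]
R = [[0.0, 0.0], [0.0, 1.0]]

Q = value_iteration(P, R)
greedy_policy = [max(range(len(q_s)), key=q_s.__getitem__) for q_s in Q]
```

In this toy problem the greedy policy picks action 1 in both states, since staying in state 1 collects the reward at every step.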


