Direct and indirect reinforcement learning

12/23/2019
by Yang Guan, et al.

Reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision-making and control tasks. In this paper, we classify RL into direct and indirect methods according to how they seek the optimal policy of a Markov decision process (MDP). Direct methods find the optimal policy by directly maximizing an objective function, usually the expected cumulative future reward, using gradient ascent. Indirect methods find the optimal policy by solving the Bellman equation, which is the necessary and sufficient condition for optimality derived from Bellman's principle of optimality. We use vanilla policy gradient and approximate policy iteration to study the relationship between the two, and show that direct and indirect methods can be unified in the actor-critic architecture and are equivalent if the stationary state distribution of the current policy is always chosen as the initial state distribution of the MDP. Finally, we classify the current mainstream RL algorithms and compare this classification with other criteria, including value-based versus policy-based and model-based versus model-free.
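To make the direct/indirect distinction concrete, the following is a minimal sketch on a hypothetical two-state, two-action MDP. It is not taken from the paper. The transition table, rewards, discount factor, uniform initial state distribution, sigmoid policy parameterization, and finite-difference gradient are all illustrative assumptions. The indirect route runs value iteration on the Bellman optimality equation; the direct route performs gradient ascent on the expected discounted return. The sketch does not attempt to reproduce the paper's equivalence result, which concerns the stationary state distribution.

```python
import math

# Hypothetical 2-state, 2-action MDP (an assumption, not from the paper).
# Action 0 keeps the current state, action 1 switches to the other state.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
NEXT = [[0, 1], [1, 0]]             # NEXT[s][a]: deterministic next state
REWARD = [[0.0, 1.0], [2.0, 0.0]]   # REWARD[s][a]: immediate reward
D0 = [0.5, 0.5]                     # assumed uniform initial state distribution


def q_value(v, s, a):
    """One-step lookahead value of taking action a in state s."""
    return REWARD[s][a] + GAMMA * v[NEXT[s][a]]


def solve_bellman(tol=1e-10):
    """Indirect method: value iteration on the Bellman optimality equation."""
    v = [0.0 for _ in STATES]
    while True:
        new_v = [max(q_value(v, s, a) for a in ACTIONS) for s in STATES]
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break
    policy = [max(ACTIONS, key=lambda a: q_value(v, s, a)) for s in STATES]
    return policy, v


def objective(theta, sweeps=300):
    """J(theta) = sum_s D0[s] * V^pi(s), with P(switch | s) = sigmoid(theta[s])."""
    p_switch = [1.0 / (1.0 + math.exp(-t)) for t in theta]
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):  # iterative policy evaluation
        v = [(1.0 - p_switch[s]) * q_value(v, s, 0)
             + p_switch[s] * q_value(v, s, 1) for s in STATES]
    return sum(D0[s] * v[s] for s in STATES)


def gradient_ascent(steps=100, lr=0.5, eps=1e-5):
    """Direct method: ascend J(theta) using a central finite-difference gradient."""
    theta = [0.0 for _ in STATES]
    for _ in range(steps):
        grad = []
        for i in STATES:
            up, down = list(theta), list(theta)
            up[i] += eps
            down[i] -= eps
            grad.append((objective(up) - objective(down)) / (2.0 * eps))
        theta = [t + lr * g for t, g in zip(theta, grad)]
    # Greedy read-out: pick "switch" where its probability exceeds 0.5.
    policy = [1 if t > 0 else 0 for t in theta]
    return policy, theta


indirect_policy, optimal_values = solve_bellman()
direct_policy, theta = gradient_ascent()
print("indirect (Bellman) policy:", indirect_policy)
print("direct (gradient ascent) policy:", direct_policy)
```

On this toy problem both routes select the same greedy policy: switch in state 0 and stay in state 1. A sampled policy-gradient estimator would replace the finite-difference gradient in practice; the numerical gradient is used here only to keep the sketch short.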

