Combing Policy Evaluation and Policy Improvement in a Unified f-Divergence Framework

09/24/2021
by Chen Gong, et al.

Deep reinforcement learning (DRL) provides a powerful and widely applicable mathematical formalization of sequential decision-making. In this paper, we study the f-divergence between the learning policy and the sampling policy and derive a novel DRL framework, termed f-Divergence Reinforcement Learning (FRL). We highlight that both the policy evaluation and policy improvement phases are induced by minimizing the f-divergence between the learning policy and the sampling policy, which is distinct from the conventional DRL objective of maximizing the expected cumulative reward. Using the Fenchel conjugate, we convert this framework, for a specific f function, into a saddle-point optimization problem that consists of policy evaluation and policy improvement, and from it we derive new policy evaluation and policy improvement methods. Our framework may give new insights for analyzing DRL algorithms. The FRL framework has two advantages: (1) the policy evaluation and policy improvement processes are derived simultaneously from the f-divergence; (2) the overestimation issue of the value function is alleviated. To evaluate the effectiveness of the FRL framework, we conduct experiments on Atari 2600 video games, which show that our framework matches or surpasses the DRL algorithms we tested.
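
For readers unfamiliar with the construction the abstract refers to, the following is a minimal sketch of the standard variational (Fenchel conjugate) form of an f-divergence, which is how minimizing a divergence generally turns into a saddle-point problem. The symbols \pi (learning policy), \mu (sampling policy), and g (dual function) are notation assumed here for illustration. The abstract does not state the direction of the divergence, the specific f, or the exact objective used in FRL, so the Pearson \chi^2 example below is only an illustrative choice and may differ from the paper's.

```latex
% Standard definitions; notation (\pi, \mu, g) is assumed, not taken from the paper.
% f is convex with f(1) = 0.
\begin{align}
  D_f(\pi \,\|\, \mu)
    &= \mathbb{E}_{a \sim \mu}\!\left[ f\!\left( \frac{\pi(a \mid s)}{\mu(a \mid s)} \right) \right], \\
  f^{*}(y)
    &= \sup_{x} \bigl\{ x y - f(x) \bigr\}
    && \text{(Fenchel conjugate)}, \\
  D_f(\pi \,\|\, \mu)
    &= \sup_{g} \; \mathbb{E}_{a \sim \pi}\bigl[ g(a) \bigr]
       - \mathbb{E}_{a \sim \mu}\bigl[ f^{*}\bigl( g(a) \bigr) \bigr]
    && \text{(variational form)}, \\
  \min_{\pi} D_f(\pi \,\|\, \mu)
    &= \min_{\pi} \, \sup_{g} \; \mathbb{E}_{a \sim \pi}\bigl[ g(a) \bigr]
       - \mathbb{E}_{a \sim \mu}\bigl[ f^{*}\bigl( g(a) \bigr) \bigr]
    && \text{(saddle-point problem)}.
\end{align}

% Illustrative choice only: Pearson chi-square, f(x) = (x - 1)^2.
% Maximizing x y - (x - 1)^2 over x gives x = 1 + y/2, hence
\begin{equation}
  f^{*}(y) = y + \frac{y^{2}}{4}.
\end{equation}
```

In this generic form, the inner maximization over g plays a role loosely analogous to an evaluation step and the outer minimization over \pi to an improvement step; how FRL maps these onto policy evaluation and policy improvement is given in the full text.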


research
09/12/2023

Fidelity-Induced Interpretable Policy Extraction for Reinforcement Learning

Deep Reinforcement Learning (DRL) has achieved remarkable success in seq...
research
12/19/2020

Minimax Strikes Back

Deep Reinforcement Learning (DRL) reaches a superhuman level of play in ...
research
10/07/2020

Proximal Policy Optimization with Relative Pearson Divergence

Deep reinforcement learning (DRL) is one of the promising approaches for...
research
01/12/2022

Evolutionary Action Selection for Gradient-based Policy Learning

Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have...
research
12/22/2016

Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

This paper investigates a type of instability that is linked to the gree...
research
02/22/2022

Multi-fidelity reinforcement learning framework for shape optimization

Deep reinforcement learning (DRL) is a promising outer-loop intelligence...
research
11/16/2022

Minimum information divergence of Q-functions for dynamic treatment resumes

This paper aims at presenting a new application of information geometry ...
