Optimistic Reinforcement Learning by Forward Kullback-Leibler Divergence Optimization

05/27/2021
by Taisuke Kobayashi, et al.

This paper presents a new interpretation of reinforcement learning (RL) as reverse Kullback-Leibler (KL) divergence optimization, and derives a new optimization method using forward KL divergence. Although RL originally aims to maximize return indirectly through optimization of the policy, recent work by Levine proposed a different derivation that explicitly treats optimality as a stochastic variable. This paper follows that concept and formulates the traditional learning rules for both the value function and the policy as optimization problems involving reverse KL divergence with optimality. Exploiting the asymmetry of the KL divergence, new optimization problems based on forward KL divergence are then derived. Remarkably, these new optimization problems can be regarded as optimistic RL, where the degree of optimism is intuitively specified by a hyperparameter converted from an uncertainty parameter. The optimism can be further enhanced by integration with prioritized experience replay and eligibility traces, both of which accelerate learning. The effects of this expected optimism were investigated through learning tendencies in numerical simulations using PyBullet. As a result, moderate optimism accelerated learning and yielded higher rewards. In a realistic robotic simulation, the proposed method with moderate optimism outperformed a state-of-the-art RL method.
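The asymmetry the abstract refers to is the standard one between the two directions of the KL divergence: reverse KL, KL(policy || target), is mode-seeking and tends to concentrate the policy on a single high-value mode, while forward KL, KL(target || policy), is mass-covering and pushes the policy to keep probability on every promising mode, which can be read as optimism. The following is a minimal sketch of that asymmetry on discrete distributions, not the paper's implementation; the names `policy` and `target` and the toy numbers are illustrative assumptions.

```python
import numpy as np

def kl(p, q):
    """KL divergence KL(p || q) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Toy bimodal target over 4 actions, standing in for an
# optimality-weighted distribution (illustrative values).
target = np.array([0.45, 0.05, 0.05, 0.45])
# A unimodal policy locked onto only one of the two modes.
policy = np.array([0.85, 0.05, 0.05, 0.05])

# Reverse KL (conventional derivation): penalizes the policy for
# placing mass where the target has little, so a single-mode policy
# scores relatively well (mode-seeking behavior).
print("reverse KL(policy || target):", kl(policy, target))

# Forward KL (the paper's direction): penalizes the policy for
# missing mass anywhere the target is high, so ignoring the second
# mode is heavily punished (mass-covering, "optimistic" behavior).
print("forward KL(target || policy):", kl(target, policy))
```

Running this shows the forward KL assigning a much larger penalty to the single-mode policy than the reverse KL does, which is the intuition behind treating the forward direction as optimistic.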


Related research

01/27/2023
Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence
Many policy optimization approaches in reinforcement learning incorporat...

08/17/2020
Imitation learning based on entropy-regularized forward and inverse reinforcement learning
This paper proposes Entropy-Regularized Imitation Learning (ERIL), which...

02/16/2023
Aligning Language Models with Preferences through f-divergence Minimization
Aligning language models with preferences can be posed as approximating ...

07/17/2021
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Approximate Policy Iteration (API) algorithms alternate between (approxi...

10/20/2021
CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric
As an algorithm based on deep reinforcement learning, Proximal Policy Op...

05/16/2022
q-Munchausen Reinforcement Learning
The recently successful Munchausen Reinforcement Learning (M-RL) feature...

05/01/2017
Nonlinear Kalman Filtering with Divergence Minimization
We consider the nonlinear Kalman filtering problem using Kullback-Leible...
