Proximal Policy Optimization and its Dynamic Version for Sequence Generation

08/24/2018
by Yi-Lin Tuan, et al.

In sequence generation tasks, many works use policy gradient for model optimization to work around the intractable backpropagation that arises when maximizing non-differentiable evaluation metrics or fooling the discriminator in adversarial learning. In this paper, we replace policy gradient with proximal policy optimization (PPO), which has been shown to be a more efficient reinforcement learning algorithm, and propose a dynamic approach for PPO (PPO-dynamic). We demonstrate the efficacy of PPO and PPO-dynamic on conditional sequence generation tasks, including a synthetic experiment and a chit-chat chatbot. The results show that PPO and PPO-dynamic outperform policy gradient in both stability and performance.
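To make the contrast between the two objectives concrete, the following is a minimal sketch of a policy-gradient surrogate loss next to the standard PPO clipped surrogate. It is not the paper's implementation and does not cover PPO-dynamic, whose details are not given in this abstract. The function names, the per-token log-probability inputs, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math

def policy_gradient_loss(logp_new, advantages):
    """Policy-gradient surrogate: minimize -mean(log pi(a) * advantage).

    Hypothetical helper; inputs are per-token log-probabilities and advantages.
    """
    terms = [lp * adv for lp, adv in zip(logp_new, advantages)]
    return -sum(terms) / len(terms)

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, negated so it can be minimized.

    clip_eps=0.2 is an assumed default, not a value taken from the paper.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated policy and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two surrogate terms.
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)
```

In this sketch, the clipping removes the incentive to move the new policy far from the one that generated the samples, which is the usual explanation for PPO's more stable updates.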

