Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

12/09/2018
by Ziming Li, et al.

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that provides a more accurate and precise reward signal for generator training. We evaluate the resulting model with automatic metrics and with human evaluations in two annotation settings. Our experimental results demonstrate that our model generates more high-quality responses and achieves higher overall performance than state-of-the-art methods.


