Learning to Dialogue via Complex Hindsight Experience Replay

08/20/2018
by Keting Lu, et al.

Reinforcement learning methods have been used to learn dialogue policies from the experience of conversations. However, learning an effective dialogue policy frequently requires prohibitively many conversations, partly because rewards in dialogues are sparse and successful dialogues are relatively rare in the early learning phase. Hindsight experience replay (HER) enables an agent to learn from failure, but vanilla HER is inapplicable to dialogue domains because dialogue goals are implicit (cf. the explicit goals in manipulation tasks). In this work, we develop two complex HER methods that provide different trade-offs between complexity and performance. Experiments were conducted using a realistic user simulator. Results suggest that our HER methods learn faster than standard and prioritized experience replay methods (as applied to deep Q-networks), and that our two complex HER methods can be combined to produce the best performance.
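For readers unfamiliar with the baseline, the following is a minimal sketch of vanilla HER relabeling in a setting with explicit goals, which is the case the abstract contrasts with dialogue. It is not the complex HER methods developed in this work. The `Transition` layout, the `sparse_reward` function, the integer states, and the choice to relabel with the final reached state are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    # Hypothetical transition layout with an explicit goal field.
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def sparse_reward(next_state: int, goal: int) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if next_state == goal else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Return hindsight copies of an episode in which the original goal is
    replaced by the state the agent actually reached at the end."""
    if not episode:
        return []
    achieved = episode[-1].next_state
    return [
        t._replace(goal=achieved, reward=sparse_reward(t.next_state, achieved))
        for t in episode
    ]


# Toy failed episode: the agent wanted to reach state 5 but ended in state 2.
failed = [
    Transition(0, 1, 1, goal=5, reward=-1.0),
    Transition(1, 1, 2, goal=5, reward=-1.0),
]

# Store both the original and the relabeled transitions for replay.
replay_buffer = failed + relabel_with_final_state(failed)
```

The relabeled copy of the last transition receives the success reward, so a failed episode still yields a positive learning signal. This step relies on being able to read the achieved goal directly from the state, which is what implicit dialogue goals do not allow.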


research
08/28/2020

Sample Efficiency in Sparse Reinforcement Learning: Or Your Money Back

Sparse rewards present a difficult problem in reinforcement learning and...
research
10/19/2019

Reverse Experience Replay

This paper describes an improvement in Deep Q-learning called Reverse Ex...
research
01/12/2020

Deep Reinforcement Learning for Complex Manipulation Tasks with Sparse Feedback

Learning optimal policies from sparse feedback is a known challenge in r...
research
02/06/2020

Soft Hindsight Experience Replay

Efficient learning in the environment with sparse rewards is one of the ...
research
01/31/2019

Visual Hindsight Experience Replay

Reinforcement Learning algorithms typically require millions of environm...
research
06/24/2019

Optimal Use of Experience in First Person Shooter Environments

Although reinforcement learning has made great strides recently, a conti...
research
08/31/2022

Cluster-based Sampling in Hindsight Experience Replay for Robot Control

In multi-goal reinforcement learning in an environment, agents learn pol...
