
Modeling ASR Ambiguity for Dialogue State Tracking Using Word Confusion Networks

by Vaishali Pal, et al.

Spoken dialogue systems typically use a list of top-N ASR hypotheses to infer semantic meaning and track the state of the dialogue. However, ASR graphs such as confusion networks (confnets) provide a compact representation of a richer hypothesis space than a top-N ASR list. In this paper, we study the benefits of using confusion networks with a state-of-the-art neural dialogue state tracker (DST). We encode the 2-dimensional confnet into a 1-dimensional sequence of embeddings using an attentional confusion network encoder, which can be used with any DST system. Our confnet encoder is plugged into the state-of-the-art 'Global-Locally Self-Attentive Dialogue State Tracker' (GLAD) model for DST and obtains significant improvements in both accuracy and inference time compared to using top-N ASR hypotheses.
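To make the idea of collapsing a 2-dimensional confnet into a 1-dimensional sequence concrete, the following is a minimal sketch and not the encoder from the paper. It assumes a confnet is a list of positions, each holding parallel (word, posterior) arcs, and that each position is pooled by attention over its arcs. The function `encode_confnet`, the toy `embeddings` table, the `query` vector, and the choice to scale embeddings by the ASR posterior before scoring are all hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_confnet(confnet, embeddings, query):
    """Collapse each set of parallel arcs into one vector.

    confnet:    list of positions; each position is a list of (word, posterior)
    embeddings: dict mapping word -> list of floats (toy lookup table)
    query:      stand-in for a learned attention vector (an assumption)
    """
    dim = len(query)
    sequence = []
    for arcs in confnet:
        # Scale each word embedding by its ASR posterior (illustrative choice).
        scaled = [[p * x for x in embeddings[w]] for w, p in arcs]
        # Score each arc against the query vector with a dot product.
        scores = [sum(q * x for q, x in zip(query, vec)) for vec in scaled]
        weights = softmax(scores)
        # Attention-weighted sum over the arcs at this position.
        pooled = [
            sum(wt * vec[d] for wt, vec in zip(weights, scaled))
            for d in range(dim)
        ]
        sequence.append(pooled)
    return sequence


# Toy data, invented for this example.
embeddings = {
    "cheap": [1.0, 0.0],
    "chip": [0.0, 1.0],
    "food": [1.0, 1.0],
    "<eps>": [0.0, 0.0],
}
confnet = [
    [("cheap", 0.7), ("chip", 0.3)],
    [("food", 0.9), ("<eps>", 0.1)],
]
query = [0.5, -0.5]

encoded = encode_confnet(confnet, embeddings, query)
```

Here `encoded` holds one vector per confnet position, so a downstream tracker that expects a plain sequence of token embeddings can consume it without handling alternative arcs itself.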



