Confusion2vec 2.0: Enriching Ambiguous Spoken Language Representations with Subwords

Word vector representations enable machines to encode human language for spoken language understanding and processing. Confusion2vec, motivated by human speech production and perception, is a word vector representation that encodes the ambiguities present in human spoken language in addition to semantic and syntactic information. By accounting for these inherent ambiguities, Confusion2vec provides a robust spoken language representation. In this paper, we propose a novel word vector space estimation method based on unsupervised learning on lattices output by an automatic speech recognition (ASR) system. We encode each word in the Confusion2vec vector space by its constituent subword character n-grams. We show that the subword encoding better represents the acoustic perceptual ambiguities in human spoken language via information modeled on lattice-structured ASR output. The usefulness of the proposed Confusion2vec representation is evaluated using semantic, syntactic, and acoustic analogy and word similarity tasks. We also show the benefits of subword modeling for acoustic ambiguity representation on the task of spoken language intent detection. The results significantly outperform existing word vector representations when evaluated on erroneous ASR outputs. We demonstrate that Confusion2vec subword modeling eliminates the need for retraining or adapting natural language understanding models on ASR transcripts.
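
To make the subword idea concrete, the following is a minimal sketch of representing a word by its character n-grams and composing a word vector from them. It is an illustration under stated assumptions, not the paper's implementation: the boundary markers `<` and `>`, the n-gram length range, the summation rule, and the names `char_ngrams` and `word_vector` are all hypothetical choices made here, and the toy vectors are invented for the example.

```python
from typing import Dict, List


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Return the character n-grams of a word, for n_min <= n <= n_max.

    Assumption: the word is wrapped in '<' and '>' boundary markers so that
    prefixes and suffixes are distinguishable from word-internal n-grams.
    """
    token = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


def word_vector(
    word: str,
    subword_table: Dict[str, List[float]],
    dim: int,
    n_min: int = 3,
    n_max: int = 6,
) -> List[float]:
    """Compose a word vector by summing the vectors of its known n-grams.

    Assumption: summation is used as the composition rule; n-grams missing
    from subword_table are skipped.
    """
    vec = [0.0] * dim
    for gram in char_ngrams(word, n_min, n_max):
        if gram in subword_table:
            for k, value in enumerate(subword_table[gram]):
                vec[k] += value
    return vec


# Toy subword table with made-up 2-dimensional vectors (illustrative only).
toy_table = {"<se": [1.0, 0.0], "see": [0.0, 1.0]}

see_grams = char_ngrams("see", 3, 3)
sea_grams = char_ngrams("sea", 3, 3)
shared = sorted(set(see_grams) & set(sea_grams))
```

In this toy setting, the acoustically confusable words "see" and "sea" share the n-gram `<se`, so their composed vectors overlap even though the words are distinct vocabulary entries. A word absent from the training vocabulary can still receive a vector as long as some of its n-grams are known.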


