Spoken Language Intent Detection using Confusion2Vec

Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). Noise and errors in text transcriptions make the task more challenging in real-life scenarios. In this paper, we address spoken language intent detection under the noisy conditions imposed by automatic speech recognition (ASR) systems. We propose to employ the confusion2vec word feature representation to compensate for the errors made by ASR and to increase the robustness of the SLU system. Confusion2vec, motivated by human speech production and perception, models acoustic relationships between words in addition to the semantic and syntactic relations of words in human language. We hypothesize that ASR often makes errors involving acoustically similar words, and that confusion2vec, with its inherent model of acoustic relationships between words, is able to compensate for these errors. Through experiments on the ATIS benchmark dataset, we demonstrate the robustness of the proposed model, which achieves state-of-the-art results under noisy ASR conditions. Our system reduces classification error rate (CER) by 20.84% (lower CER degradation) relative to the previous state of the art when going from clean to noisy transcripts. Improvements are also demonstrated when training the intent detection models on noisy transcripts.
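The hypothesis above can be illustrated with a toy sketch. It is not the model from the paper: the 2-D vectors, the two intent labels, and the nearest-prototype classifier are all hypothetical and chosen by hand. The only assumption taken from the abstract is that an acoustically confusable pair (here "flights" and "fights") lies close together in a confusion-aware embedding space, whereas a purely semantic space places the two words far apart.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def embed(tokens, table):
    """Average the vectors of all tokens found in the embedding table."""
    vecs = [table[t] for t in tokens if t in table]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def classify(tokens, table, prototypes):
    """Return the intent whose prototype is most similar to the utterance."""
    utterance = embed(tokens, table)
    return max(prototypes, key=lambda intent: cosine(utterance, prototypes[intent]))

# Hypothetical hand-made vectors: axis 0 ~ "travel", axis 1 ~ "conflict".
# In the semantic-only table, "fights" is far from "flights".
SEMANTIC_ONLY = {"show": [0.1, 0.1], "flights": [1.0, 0.0], "fights": [0.0, 1.0]}
# In the confusion-aware table, the confusable pair is placed close together.
CONFUSION_AWARE = {"show": [0.1, 0.1], "flights": [1.0, 0.0], "fights": [0.9, 0.3]}

# Hypothetical intent prototypes (not real ATIS labels).
PROTOTYPES = {"flight": [1.0, 0.0], "other": [0.0, 1.0]}

clean = ["show", "flights"]   # reference transcript
noisy = ["show", "fights"]    # simulated ASR substitution error

if __name__ == "__main__":
    for name, table in [("semantic-only", SEMANTIC_ONLY),
                        ("confusion-aware", CONFUSION_AWARE)]:
        print(name, classify(clean, table, PROTOTYPES),
              classify(noisy, table, PROTOTYPES))
```

In this toy setting, both tables classify the clean transcript the same way, but only the confusion-aware table keeps the prediction stable when the ASR error is introduced. A real system would learn the embeddings and the classifier from data rather than use hand-set values.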

