Disambiguating Speech Intention via Audio-Text Co-attention Framework: A Case of Prosody-semantics Interface

10/21/2019
by Won Ik Cho, et al.

Understanding the intention of an utterance is challenging in some prosody-sensitive cases, especially when the utterance is given in written form. The main concern is to detect whether an utterance is directive or rhetorical and to distinguish the type of question. Since the task inevitably involves both prosody and semantics, the identification is expected to benefit from observations of the human language processing mechanism. In this paper, we tackle the task with attentive recurrent neural networks that exploit acoustic and textual features, using a manually created speech corpus that contains only syntactically ambiguous utterances that require prosody for disambiguation. We found that co-attention frameworks on audio-text data, namely multi-hop attention and cross-attention, can perform better than previously suggested speech-based/text-aided networks. From this, we infer that understanding the genuine intention of ambiguous utterances involves recognizing the interaction between auditory and linguistic processes.
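
To make the idea of cross-attention over audio-text data more concrete, the following is a minimal sketch, not the authors' architecture. It assumes that each modality has already been encoded into a sequence of hidden vectors (the abstract mentions recurrent networks; here the vectors are simply given), that a mean-pooled summary of one modality serves as the query over the other, and that attention is plain unparameterized dot-product attention. The names `attend`, `mean_pool`, and `cross_attention` are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, states):
    """Dot-product attention: weight each state by its similarity to the
    query and return the weighted sum together with the weights."""
    weights = softmax([dot(query, s) for s in states])
    dim = len(states[0])
    context = [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]
    return context, weights


def mean_pool(states):
    """Average a sequence of vectors into one summary vector (an assumption;
    a final recurrent hidden state could play the same role)."""
    n = len(states)
    return [sum(s[i] for s in states) / n for i in range(len(states[0]))]


def cross_attention(audio_states, text_states):
    """Each modality's summary attends over the other modality's states.
    The two context vectors are concatenated into one fused representation."""
    audio_ctx, audio_w = attend(mean_pool(text_states), audio_states)
    text_ctx, text_w = attend(mean_pool(audio_states), text_states)
    return audio_ctx + text_ctx, audio_w, text_w


# Toy inputs: three audio frames and two text tokens, all 2-dimensional.
audio_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_states = [[0.5, 0.5], [1.0, 0.0]]
fused, audio_w, text_w = cross_attention(audio_states, text_states)
print(len(fused), audio_w, text_w)
```

In a trained model the fused vector would typically be passed to a classifier over intention labels, and the queries, keys, and values would go through learned projections; both are omitted here to keep the sketch short.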

research
11/10/2018

Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency

For a large portion of real-life utterances, the intention cannot be sol...
research
04/23/2019

Speech Emotion Recognition Using Multi-Hop Attention Mechanism

In this paper, we are interested in exploiting textual and acoustic data...
research
06/24/2019

A computational model of early language acquisition from audiovisual experiences of young infants

Earlier research has suggested that human infants might use statistical ...
research
12/07/2020

Using previous acoustic context to improve Text-to-Speech synthesis

Many speech synthesis datasets, especially those derived from audiobooks...
research
12/13/2022

InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation

Current approaches to empathetic response generation typically encode th...
research
04/06/2021

An Initial Investigation for Detecting Partially Spoofed Audio

All existing databases of spoofed speech contain attack data that is spo...
research
10/10/2018

Structured Argument Extraction of Korean Question and Command

Intention identification and slot filling is a core issue in dialog mana...
