Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice

06/28/2019
by Vikramjit Mitra, et al.

Millions of people reach out to digital assistants such as Siri every day, asking for information, making phone calls, seeking assistance, and much more. Such assistants are expected to understand the intent behind a user's query. Detecting the intent of a query from a short, isolated utterance is difficult, and intent cannot always be obtained from speech-recognized transcriptions. A transcription-driven approach can interpret what has been said but fails to capture how it has been said, and as a consequence may ignore the expression present in the voice. Our work investigates whether a system can reliably detect vocal expression in queries using acoustic and paralinguistic embeddings. Results show that the proposed method offers a relative equal error rate (EER) decrease of 60% compared to a … system, corroborating that expression is significantly represented by vocal attributes rather than being purely lexical. Adding emotion embeddings helped reduce the EER by 30% …, demonstrating the relevance of emotion in expressive voice.
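
The results above are stated as relative reductions in equal error rate (EER), the operating point at which a detector's false-acceptance rate equals its false-rejection rate. As an illustration only, the following sketch computes an approximate EER from detector scores by sweeping every observed score as a threshold, along with the relative-reduction arithmetic. The function names, the threshold-sweep approximation, and the score convention (higher score means "more likely expressive") are assumptions made here; this is not the authors' evaluation code.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER of a binary detector (hypothetical helper).

    scores: higher means "more likely expressive" (assumed convention).
    labels: 1 for target (expressive) samples, 0 for non-target samples.
    Each observed score is tried as a threshold; the EER is taken as the
    mean of the false-acceptance and false-rejection rates at the threshold
    where the two rates are closest.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one target and one non-target score")

    best_gap, best_eer = float("inf"), 1.0
    for threshold in sorted(set(scores)):
        # A sample is accepted as expressive when its score >= threshold.
        far = sum(s >= threshold for s in negatives) / len(negatives)
        frr = sum(s < threshold for s in positives) / len(positives)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_reduction(baseline_eer: float, proposed_eer: float) -> float:
    """Relative EER decrease of a proposed system w.r.t. a baseline, as a fraction."""
    return (baseline_eer - proposed_eer) / baseline_eer
```

For example, a hypothetical drop from an EER of 0.20 to 0.08 is a relative decrease of (0.20 - 0.08) / 0.20 = 60%. Sweeping only observed scores keeps the sketch dependency-free; with few samples the FAR and FRR curves may never cross exactly, which is why the midpoint at the closest threshold is reported.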
