Attention-based Region of Interest (ROI) Detection for Speech Emotion Recognition

03/03/2022
by Jay Desai, et al.

Automatic emotion recognition for real-life applications is a challenging task. Human emotion expressions are subtle, and can be conveyed by a combination of several emotions. In most existing emotion recognition studies, each audio utterance/video clip is labelled/classified in its entirety. However, utterance/clip-level labelling and classification can be too coarse to capture the subtle intra-utterance/clip temporal dynamics. For example, an utterance/video clip usually contains only a few emotion-salient regions and many emotionless regions. In this study, we propose to use an attention mechanism in deep recurrent neural networks to detect the Regions-of-Interest (ROI) that are more emotionally salient in human emotional speech/video, and further estimate the temporal emotion dynamics by aggregating those emotionally salient regions-of-interest. We compare the ROI from audio and video and analyse them. We compare the performance of the proposed attention networks with the state-of-the-art LSTM models on the multi-class classification task of recognizing six basic human emotions, and the proposed attention models exhibit significantly better performance. Furthermore, the attention weight distribution can be used to interpret how an utterance can be expressed as a mixture of possible emotions.
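
The idea of weighting frames by attention and aggregating the salient ones can be illustrated with a small sketch. This is not the authors' model: the abstract does not specify the scoring function, so the dot-product score, the `query` vector, the toy `frames` values, and the "above uniform weight" ROI threshold are all assumptions made for illustration. In a real system the frame vectors would be the hidden states of a recurrent network and the query would be learned.

```python
import math
from typing import List, Optional, Tuple


def attention_pool(frames: List[List[float]],
                   query: List[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights over time, weighted-sum utterance vector)."""
    # Score each frame by its dot product with the (hypothetical) query vector.
    scores = [sum(h * q for h, q in zip(frame, query)) for frame in frames]
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Aggregate the frames into a single utterance-level vector.
    dim = len(frames[0])
    pooled = [sum(w * frame[i] for w, frame in zip(weights, frames))
              for i in range(dim)]
    return weights, pooled


def regions_of_interest(weights: List[float],
                        threshold: Optional[float] = None) -> List[int]:
    """Return indices of frames whose weight exceeds the threshold.

    The default threshold (uniform weight 1/T) is an illustrative assumption.
    """
    if threshold is None:
        threshold = 1.0 / len(weights)
    return [t for t, w in enumerate(weights) if w > threshold]


# Toy stand-ins for frame-level hidden states (4 frames, 2 dimensions).
frames = [[0.1, 0.0], [2.0, 1.5], [0.0, 0.2], [1.8, 1.9]]
query = [1.0, 1.0]

weights, pooled = attention_pool(frames, query)
roi = regions_of_interest(weights)
print(roi)  # frames 1 and 3 carry most of the attention mass: [1, 3]
```

Here the attention weights sum to one, so they can be read as a distribution over time, and the frames that receive more than a uniform share are flagged as candidate regions of interest.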

research
07/01/2016

Fractal Dimension Pattern Based Multiresolution Analysis for Rough Estimator of Person-Dependent Audio Emotion Recognition

As a general means of expression, audio analysis and recognition has att...
research
01/15/2019

Deep Fusion: An Attention Guided Factorized Bilinear Pooling for Audio-video Emotion Recognition

Automatic emotion recognition (AER) is a challenging task due to the abs...
research
01/19/2016

Sparsity in Dynamics of Spontaneous Subtle Emotions: Analysis & Application

Spontaneous subtle emotions are expressed through micro-expressions, whi...
research
06/16/2021

Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI

Speech sounds of spoken language are obtained by varying configuration o...
research
06/22/2023

Speech Emotion Diarization: Which Emotion Appears When?

Speech Emotion Recognition (SER) typically relies on utterance-level sol...
research
04/24/2019

A Self-Attentive Emotion Recognition Network

Modern deep learning approaches have achieved groundbreaking performance...
research
06/05/2018

Attention Based Fully Convolutional Network for Speech Emotion Recognition

Speech emotion recognition is a challenging task for three main reasons:...
