Attention Based Fully Convolutional Network for Speech Emotion Recognition

06/05/2018
by   Yuanyuan Zhang, et al.

Speech emotion recognition is a challenging task for three main reasons: 1) human emotion is abstract, which makes it hard to distinguish; 2) in general, human emotion can only be detected at specific moments during a long utterance; 3) speech data with emotion labels is usually limited. In this paper, we present a novel attention-based fully convolutional network for speech emotion recognition. We employ a fully convolutional network because it can handle variable-length speech without segmentation, so critical information is not lost. The proposed attention mechanism makes our model aware of which time-frequency regions of the speech spectrogram are more emotion-relevant. Given the limited data, transfer learning is also adopted to improve accuracy. Notably, we observe a clear improvement with a model pre-trained on natural scene images. Validated on the publicly available IEMOCAP corpus, the proposed model outperformed the state-of-the-art methods with a weighted accuracy of 70.4% and an unweighted accuracy of 63.9%.
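
To make the idea of attending over time-frequency regions more concrete, here is a minimal sketch of generic soft attention pooling. It is not taken from the paper: the function name `attention_pool`, the parameters `W`, `b`, `u`, the tanh scoring function, and the dimensions are all assumptions chosen for illustration. Each region vector stands for one time-frequency position of a convolutional feature map, so the number of regions grows with utterance length while the pooled output keeps a fixed size.

```python
import math
import random

def attention_pool(regions, W, b, u):
    """Soft attention pooling over a list of region feature vectors.

    regions: list of L vectors (each of length C), one per time-frequency
             position of a convolutional feature map; L varies with
             utterance length.
    W (D x C), b (D), u (D): hypothetical learned parameters.
    Returns (pooled vector of length C, list of L attention weights).
    """
    scores = []
    for a in regions:
        # Project the region vector and squash it with tanh.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, a)) + bias)
                  for row, bias in zip(W, b)]
        # Scalar relevance score for this region.
        scores.append(sum(ui * hi for ui, hi in zip(u, hidden)))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over all regions
    C = len(regions[0])
    # Weighted sum of region vectors: fixed length regardless of L.
    pooled = [sum(wt * a[c] for wt, a in zip(weights, regions))
              for c in range(C)]
    return pooled, weights

random.seed(0)
C, D = 8, 4  # illustrative sizes, not from the paper
W = [[random.gauss(0, 1) for _ in range(C)] for _ in range(D)]
b = [0.0] * D
u = [random.gauss(0, 1) for _ in range(D)]

# Two "utterances" with different numbers of regions give same-size outputs.
for num_regions in (50, 185):
    regions = [[random.gauss(0, 1) for _ in range(C)]
               for _ in range(num_regions)]
    pooled, weights = attention_pool(regions, W, b, u)
    print(num_regions, len(pooled), round(sum(weights), 6))
```

Because the weights form a softmax over every region, they can also be read back as a map of which positions contributed most, which is one way a model of this kind might be inspected.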

research
03/21/2018

Speech Emotion Recognition Considering Local Dynamic Features

Recently, increasing attention has been directed to the study of the spe...
research
02/06/2019

Transfer Learning From Sound Representations For Anger Detection in Speech

In this work, we train fully convolutional networks to detect anger in s...
research
11/17/2021

Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Multimodal emotion recognition is a challenging task in emotion computin...
research
03/03/2022

Attention-based Region of Interest (ROI) Detection for Speech Emotion Recognition

Automatic emotion recognition for real-life applications is a challengi...
research
10/20/2019

Speech Emotion Recognition with Dual-Sequence LSTM Architecture

Speech Emotion Recognition (SER) has emerged as a critical component of ...
research
03/09/2023

Hierarchical Network with Decoupled Knowledge Distillation for Speech Emotion Recognition

The goal of Speech Emotion Recognition (SER) is to enable computers to r...
research
07/03/2022

A Graph Isomorphism Network with Weighted Multiple Aggregators for Speech Emotion Recognition

Speech emotion recognition (SER) is an essential part of human-computer ...
