Capturing Long-term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition

08/23/2017
by Soheil Khorram, et al.

The goal of continuous emotion recognition is to assign an emotion value to every frame in a sequence of acoustic features. We show that incorporating long-term temporal dependencies is critical for continuous emotion recognition tasks. To this end, we first investigate architectures that use dilated convolutions. We show that even though such architectures outperform previously reported systems, their output signals change erratically between consecutive time steps. This is inconsistent with the slow-moving ground-truth emotion labels obtained from human annotators. To deal with this problem, we model a downsampled version of the input signal and then generate the output signal through upsampling. The resulting downsampling/upsampling network not only achieves good performance but also generates smooth output trajectories. Our method yields the best known audio-only performance on the RECOLA dataset.
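
The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the function names, kernel size, dilation rates, and downsampling factor below are assumptions chosen for illustration, and fixed average pooling with linear interpolation stands in for the learned downsampling and upsampling layers. The sketch shows how stacking dilated convolutions widens the temporal context seen by each output frame, and why predicting at a lower frame rate and then upsampling gives a smoother trajectory.

```python
import numpy as np


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output frame of a stack of
    dilated convolutions (stride 1), one layer per entry in `dilations`."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x, w, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form).
    Kernel taps are spaced `dilation` frames apart."""
    k = len(w)
    span = (k - 1) * dilation
    out_len = len(x) - span
    return np.array(
        [sum(w[j] * x[i + j * dilation] for j in range(k)) for i in range(out_len)]
    )


def down_up(x, factor):
    """Illustrative stand-in for a downsampling/upsampling network:
    average-pool by `factor`, then linearly interpolate back to len(x)."""
    n = len(x) - len(x) % factor          # drop a ragged tail, if any
    pooled = x[:n].reshape(-1, factor).mean(axis=1)
    centers = np.arange(len(pooled)) * factor + (factor - 1) / 2
    return np.interp(np.arange(len(x)), centers, pooled)


def roughness(y):
    """Mean absolute change between consecutive frames."""
    return float(np.mean(np.abs(np.diff(y))))


if __name__ == "__main__":
    # Hypothetical stack: kernel size 3, dilation doubling per layer.
    print(receptive_field(3, [1, 2, 4, 8, 16]))  # 63 frames

    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(400)      # stand-in for an erratic frame-level output
    smooth = down_up(noisy, factor=8)
    print(roughness(noisy), roughness(smooth))
```

With dilation rates that double per layer, the receptive field grows exponentially with depth while the number of weights grows only linearly. The pooled-and-interpolated signal changes far less between consecutive frames than the frame-level signal, which is the property the slow-moving annotations call for.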


