Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals

03/06/2020
by   Eric Guizzo, et al.

Robustness against temporal variations is important for emotion recognition from speech audio, since emotion is expressed through complex spectral patterns that can exhibit significant local dilation and compression on the time axis, depending on speaker and context. To address this and potentially other tasks, we introduce the multi-time-scale (MTS) method, which adds flexibility towards temporal variations when analyzing time-frequency representations of audio data. MTS extends convolutional neural networks with convolution kernels that are scaled and re-sampled along the time axis, increasing temporal flexibility without increasing the number of trainable parameters compared to standard convolutional layers. We evaluate MTS and standard convolutional layers in different architectures for emotion recognition from speech audio, using four datasets of different sizes. The results show that MTS layers consistently improve the generalization of networks of different capacity and depth compared to standard convolution, especially on smaller datasets.
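The idea of re-sampling one kernel along the time axis can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`resample_time`, `correlate_same`, `mts_conv`), the scale factors, the use of linear interpolation, and the choice to merge the parallel outputs with an element-wise maximum are all hypothetical, since the abstract does not specify them. The sketch only shows that several time-scaled versions of a kernel can be derived from a single set of weights.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # indexed as m[frequency][time]


def resample_time(kernel: Matrix, scale: float) -> Matrix:
    """Stretch or compress a kernel along its time axis (linear interpolation)."""
    old_len = len(kernel[0])
    new_len = max(1, int(round(old_len * scale)))
    out = []
    for row in kernel:
        new_row = []
        for j in range(new_len):
            # Map output index j onto the original time axis [0, old_len - 1].
            pos = j * (old_len - 1) / (new_len - 1) if new_len > 1 else 0.0
            lo = int(pos)
            hi = min(lo + 1, old_len - 1)
            frac = pos - lo
            new_row.append((1 - frac) * row[lo] + frac * row[hi])
        out.append(new_row)
    return out


def correlate_same(x: Matrix, k: Matrix) -> Matrix:
    """Zero-padded 2D cross-correlation; the output has the same shape as x."""
    kf, kt = len(k), len(k[0])
    xf, xt = len(x), len(x[0])

    def px(i: int, j: int) -> float:
        # Positions outside the input are treated as zeros.
        return x[i][j] if 0 <= i < xf and 0 <= j < xt else 0.0

    return [[sum(k[a][b] * px(i + a - kf // 2, j + b - kt // 2)
                 for a in range(kf) for b in range(kt))
             for j in range(xt)]
            for i in range(xf)]


def mts_conv(x: Matrix, kernel: Matrix,
             scales: Sequence[float] = (0.5, 1.0, 2.0)) -> Matrix:
    """Apply time-scaled copies of ONE kernel in parallel and merge the results.

    Assumption: the branches are merged with an element-wise maximum.
    The scaled kernels are derived from `kernel`, so the number of
    trainable weights stays len(kernel) * len(kernel[0]) for any `scales`.
    """
    maps = [correlate_same(x, resample_time(kernel, s)) for s in scales]
    return [[max(m[i][j] for m in maps) for j in range(len(x[0]))]
            for i in range(len(x))]


# Toy spectrogram (2 frequency bins x 5 time frames) and a 1 x 3 kernel.
spectrogram = [[0.0, 1.0, 0.0, 2.0, 1.0],
               [1.0, 0.0, 3.0, 0.0, 1.0]]
kernel = [[0.0, 1.0, 2.0]]
output = mts_conv(spectrogram, kernel)
```

Because scale 1.0 reproduces the original kernel, the merged output in this sketch is never smaller than the output of a standard convolution with the same weights. A practical layer would also need to handle amplitude normalization of the re-sampled kernels and gradient flow through the interpolation, which are left out here.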


research
06/23/2018

Evaluating Gammatone Frequency Cepstral Coefficients with Neural Networks for Emotion Recognition from Speech

Current approaches to speech emotion recognition focus on speech feature...
research
02/29/2020

Emotion Recognition System from Speech and Visual Information based on Convolutional Neural Networks

Emotion recognition has become an important field of research in the hum...
research
06/07/2017

Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition

Deep convolutional neural networks are being actively investigated in a ...
research
08/05/2020

Compact Graph Architecture for Speech Emotion Recognition

We propose a deep graph approach to address the task of speech emotion r...
research
08/23/2017

Capturing Long-term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition

The goal of continuous emotion recognition is to assign an emotion value...
research
10/13/2021

Multistage linguistic conditioning of convolutional layers for speech emotion recognition

In this contribution, we investigate the effectiveness of deep fusion of...
research
06/09/2020

audino: A Modern Annotation Tool for Audio and Speech

In this paper, we introduce a collaborative and modern annotation tool f...
