Emotional Speech Recognition with Pre-trained Deep Visual Models

04/06/2022
by Waleed Ragheb, et al.

In this paper, we propose a new methodology for emotional speech recognition using deep neural network models originally built for vision. We exploit the transfer learning capabilities of pre-trained deep computer vision models and apply them to the task of emotion recognition in speech. To achieve this, we propose a composite set of acoustic features and a procedure for converting them into images. We also present a training paradigm for these models that takes into account the differences between acoustic-based images and regular ones. In our experiments, we use the pre-trained VGG-16 model and test the overall methodology on the Berlin EMO-DB dataset for speaker-independent emotion recognition. We evaluate the proposed model on the full set of seven emotions, and the results set a new state of the art.
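The abstract does not say which acoustic features are used or how they are turned into images, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a generic 2-D feature matrix (features by frames), min-max scaling, nearest-neighbour resizing, and a grayscale image replicated across three channels to match the 224 x 224 x 3 input that ImageNet-trained VGG-16 models conventionally expect. The function name `features_to_image` and all parameters are hypothetical.

```python
import numpy as np


def features_to_image(features: np.ndarray, size: int = 224) -> np.ndarray:
    """Map a 2-D acoustic feature matrix to a (size, size, 3) uint8 image.

    Assumption: `features` has shape (n_features, n_frames), e.g. a
    spectrogram-like matrix. This is an illustrative conversion only.
    """
    if features.ndim != 2:
        raise ValueError("expected a 2-D feature matrix")

    lo, hi = features.min(), features.max()
    # Min-max scale to [0, 1]; a constant matrix maps to all zeros.
    if hi > lo:
        scaled = (features - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(features, dtype=float)

    # Nearest-neighbour resize: pick a source row/column for each output pixel.
    rows = np.arange(size) * features.shape[0] // size
    cols = np.arange(size) * features.shape[1] // size
    resized = scaled[np.ix_(rows, cols)]

    # Quantize to 8 bits and replicate the single channel three times.
    gray = np.round(resized * 255).astype(np.uint8)
    return np.stack([gray, gray, gray], axis=-1)


# Usage with a random stand-in for real acoustic features.
rng = np.random.default_rng(0)
image = features_to_image(rng.normal(size=(40, 300)))
print(image.shape, image.dtype)
```

An image produced this way could then be passed to a pre-trained vision model for fine-tuning. Because such images differ from natural photographs, the choice of which layers to retrain matters, and that is the concern the training paradigm mentioned in the abstract addresses.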

research
04/08/2021

Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings

Emotion recognition datasets are relatively small, making the use of the...
research
11/07/2022

Hi,KIA: A Speech Emotion Recognition Dataset for Wake-Up Words

Wake-up words (WUW) is a short sentence used to activate a speech recogn...
research
11/11/2020

Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning

We propose a novel transfer learning method for speech emotion recogniti...
research
01/19/2022

Unsupervised Personalization of an Emotion Recognition System: The Unique Properties of the Externalization of Valence in Speech

The prediction of valence from speech is an important, but challenging p...
research
10/22/2019

Composite Neural Network: Theory and Application to PM2.5 Prediction

This work investigates the framework and performance issues of the compo...
research
02/24/2023

Pre-Finetuning for Few-Shot Emotional Speech Recognition

Speech models have long been known to overfit individual speakers for ma...
research
10/26/2022

Fast Yet Effective Speech Emotion Recognition with Self-distillation

Speech emotion recognition (SER) is the task of recognising human's emot...
