Pretrained audio neural networks for Speech emotion recognition in Portuguese

10/26/2022
by   Marcelo Matheus Gauy, et al.

The goal of speech emotion recognition (SER) is to identify the emotional aspects of speech. The SER challenge for Brazilian Portuguese speech provides short snippets of Portuguese speech, each classified as neutral, non-neutral female, or non-neutral male according to paralinguistic elements (laughing, crying, etc.). The dataset contains about 50 minutes of Brazilian Portuguese speech. Because the dataset is small, we investigate whether combining transfer learning with data augmentation can produce positive results. Specifically, we combine a data augmentation technique called SpecAugment with Pretrained Audio Neural Networks (PANNs) used for transfer learning. The PANNs (CNN6, CNN10 and CNN14) are pretrained on AudioSet, a large dataset containing more than 5000 hours of audio. We fine-tuned them on the SER dataset and submitted the model that performed best on the validation set (CNN10) to the challenge, where it achieved an F1 score of 0.73, up from 0.54 for the baselines provided by the challenge. We also tested a Transformer architecture pretrained on about 600 hours of Brazilian Portuguese audio data. Transformers, as well as the more complex PANN model (CNN14), fail to generalize to the test set of the SER dataset and do not beat the baseline. Given the limited size of the dataset, our results suggest that PANNs (specifically, CNN6 and CNN10) are currently the best approach for SER on this data.
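To make the augmentation step concrete, the following is a minimal sketch of SpecAugment-style frequency and time masking on a spectrogram. It is not the authors' implementation: the function name `spec_augment`, the parameter names, the default mask widths, and the choice of zero as the fill value are all assumptions made for illustration, and the time-warping step of the original SpecAugment method is omitted.

```python
import random


def spec_augment(spec, max_freq_width=8, max_time_width=20,
                 num_freq_masks=1, num_time_masks=1, rng=None):
    """Return a masked copy of `spec`.

    `spec` is a list of frequency rows, each a list of time-frame values.
    All names and default values here are illustrative assumptions.
    """
    rng = rng or random.Random()
    n_freq = len(spec)
    n_time = len(spec[0]) if n_freq else 0
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency masking: zero out a random band of consecutive rows.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_freq))
        start = rng.randint(0, n_freq - width)
        for f in range(start, start + width):
            out[f] = [0.0] * n_time

    # Time masking: zero out a random band of consecutive frames.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_time))
        start = rng.randint(0, n_time - width)
        for row in out:
            for t in range(start, start + width):
                row[t] = 0.0

    return out
```

Applying a fresh random mask each time a training example is drawn gives the network a slightly different view of the same clip on every epoch, which is one way to compensate for a dataset of only about 50 minutes.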
