A New Amharic Speech Emotion Dataset and Classification Benchmark

01/07/2022
by Ephrem A. Retta, et al.

In this paper we present the Amharic Speech Emotion Dataset (ASED), which covers four dialects (Gojjam, Wollo, Shewa and Gonder) and five emotions (neutral, fearful, happy, sad and angry). We believe it is the first Speech Emotion Recognition (SER) dataset for the Amharic language. 65 volunteer participants, all native speakers, recorded 2,474 sound samples, two to four seconds in length. Eight judges assigned emotions to the samples with a high level of agreement (Fleiss' kappa = 0.8). The resulting dataset is freely available for download. Next, we developed a four-layer variant of the well-known VGG model, which we call VGGb. Three experiments were then carried out using VGGb for SER on ASED. First, we investigated whether Mel-spectrogram features or Mel-Frequency Cepstral Coefficient (MFCC) features work best for Amharic. This was done by training two VGGb SER models on ASED, one using Mel-spectrograms and the other using MFCCs. Four forms of training were tried: standard cross-validation, and three variants based on sentences, dialects and speaker groups, in which a sentence used for training would not be used for testing, and likewise for a dialect or speaker group. The conclusion was that MFCC features are superior under all four training schemes. MFCC was therefore adopted for Experiment 2, in which VGGb was compared on ASED with three existing models: ResNet50, AlexNet and LSTM. VGGb was found to have very good accuracy (90.73%) as well as the fastest training time. In Experiment 3, the performance of VGGb was compared when trained on two existing SER datasets, RAVDESS (English) and EMO-DB (German), as well as on ASED (Amharic). Results were comparable across these languages, with ASED being the highest. This suggests that VGGb can be successfully applied to other languages. We hope that ASED will encourage researchers to experiment with other models for Amharic SER.
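The two feature types compared in Experiment 1 are closely related: a Mel-spectrogram applies a mel-scale filterbank to the short-time power spectrum, and MFCCs are obtained by taking a DCT of the log-Mel-spectrogram and keeping the low-order coefficients. The following is a minimal numpy-only sketch of that pipeline; the parameter values (512-point FFT, 256-sample hop, 40 mel bands, 13 coefficients) are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centres spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, window, take the magnitude FFT, apply the filterbank.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames).T                     # (n_fft//2+1, n_frames)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power
    return np.log(mel + 1e-10)

def mfcc(signal, sr, n_mfcc=13, **kw):
    # MFCC = DCT-II of the log-Mel-spectrogram, low-order coefficients only.
    logmel = log_mel_spectrogram(signal, sr, **kw)
    n_mels = logmel.shape[0]
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2.0 * n_mels)))
    return dct @ logmel

# A 2 s synthetic tone stands in for an ASED recording.
sr = 16000
t = np.linspace(0, 2, 2 * sr, endpoint=False)
sig = np.sin(2 * np.pi * 440.0 * t)
print(log_mel_spectrogram(sig, sr).shape)   # (n_mels, n_frames)
print(mfcc(sig, sr).shape)                  # (n_mfcc, n_frames)
```

In practice a library such as librosa would be used for feature extraction; the sketch only makes explicit why MFCCs are a compact transform of the same mel-scale information.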
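The sentence-, dialect- and speaker-group-wise training schemes all share one mechanism: the attribute used for grouping must never be shared between the training and test partitions. A minimal group-aware split can be sketched as below; the metadata records and field names are hypothetical, not ASED's actual file format.

```python
import random
from collections import defaultdict

def group_split(samples, group_key, test_fraction=0.2, seed=0):
    """Split samples so that no group (e.g. a speaker, dialect or
    sentence) appears in both the train and test partitions."""
    groups = defaultdict(list)
    for s in samples:
        groups[s[group_key]].append(s)
    names = sorted(groups)
    random.Random(seed).shuffle(names)
    n_test = max(1, round(test_fraction * len(names)))
    test_names = set(names[:n_test])
    train = [s for g in names[n_test:] for s in groups[g]]
    test = [s for g in test_names for s in groups[g]]
    return train, test

# Hypothetical metadata records for illustration only.
data = [
    {"file": "a1.wav", "speaker": "spk01", "dialect": "Gojjam", "emotion": "happy"},
    {"file": "a2.wav", "speaker": "spk01", "dialect": "Gojjam", "emotion": "sad"},
    {"file": "b1.wav", "speaker": "spk02", "dialect": "Wollo",  "emotion": "angry"},
    {"file": "c1.wav", "speaker": "spk03", "dialect": "Shewa",  "emotion": "neutral"},
    {"file": "d1.wav", "speaker": "spk04", "dialect": "Gonder", "emotion": "fearful"},
]

train, test = group_split(data, "speaker", test_fraction=0.25)
# No speaker may appear on both sides of the split.
assert not ({s["speaker"] for s in train} & {s["speaker"] for s in test})
```

Passing "dialect" (or a sentence identifier) as group_key yields the dialect-wise and sentence-wise schemes; scikit-learn's GroupKFold implements the same idea for full cross-validation.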

Related research:

- Cross-Corpus Multilingual Speech Emotion Recognition: Amharic vs. Other Languages (07/20/2023)
- DNN-HMM based Speaker Adaptive Emotion Recognition using Proposed Epoch and MFCC Features (06/04/2018)
- Decoding Emotions: A comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition (08/17/2023)
- SEMOUR: A Scripted Emotional Speech Repository for Urdu (05/19/2021)
- Kinit Classification in Ethiopian Chants, Azmaris and Modern Music: A New Dataset and CNN Benchmark (01/20/2022)
- Deep Net Features for Complex Emotion Recognition (10/31/2018)
- Sentiment recognition of Italian elderly through domain adaptation on cross-corpus speech dataset (11/14/2022)
