Is Style All You Need? Dependencies Between Emotion and GST-based Speaker Recognition

11/15/2022
by   Morgan Sandler, et al.

In this work, we study the hypothesis that speaker identity embeddings extracted from speech samples can be used for the detection and classification of emotion. In particular, we show that emotions can be effectively identified by learning speaker identities with a 1-D Triplet Convolutional Neural Network (CNN) Global Style Token (GST) scheme (e.g., the DeepTalk network) and reusing the trained speaker recognition model weights to generate features in the emotion classification domain. The automatic speaker recognition (ASR) network is trained on the VoxCeleb1, VoxCeleb2, and LibriSpeech datasets with a triplet training loss function using speaker identity labels. Using a Support Vector Machine (SVM) classifier, we map speaker identity embeddings into discrete emotion categories from the CREMA-D, IEMOCAP, and MSP-Podcast datasets. On the task of speech emotion detection, we obtain 80.8% accuracy on acted emotion samples from CREMA-D, 81.2% on IEMOCAP, and 66.9% on MSP-Podcast. We also propose a novel two-stage hierarchical classifier (HC) approach that demonstrates a +2% accuracy improvement on CREMA-D emotion samples. Through this work, we seek to convey the importance of holistically modeling intra-user variation within audio samples.
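The pipeline the abstract describes — speaker-identity embeddings reused as features for an SVM emotion classifier — can be sketched as follows. This is a minimal illustration, not the authors' code: the embedding extractor is mocked with random vectors standing in for DeepTalk/GST speaker embeddings, and the four emotion labels are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for speaker-identity embeddings. In the paper these come from a
# 1-D triplet CNN GST network (DeepTalk) trained on VoxCeleb1/2 and
# LibriSpeech; here we just draw class-shifted Gaussians for illustration.
def extract_embeddings(num_samples, dim=256, shift=0.0):
    return rng.normal(loc=shift, scale=1.0, size=(num_samples, dim))

# Hypothetical 4-class emotion setup (e.g., angry / happy / neutral / sad).
emotions = ["angry", "happy", "neutral", "sad"]
X = np.vstack([extract_embeddings(50, shift=i) for i in range(len(emotions))])
y = np.repeat(np.arange(len(emotions)), 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Second stage of the pipeline: an SVM maps the fixed speaker-identity
# embeddings to discrete emotion categories.
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

With real embeddings the features would be extracted from audio by the frozen speaker recognition model; only the SVM stage is trained on the emotion labels.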

