Learning Representations of Emotional Speech with Deep Convolutional Generative Adversarial Networks

04/22/2017
by Jonathan Chang, et al.

Automatically assessing emotional valence in human speech has historically been a difficult task for machine learning algorithms. The subtle changes in a speaker's voice that indicate positive or negative emotional states are often "overshadowed" by voice characteristics relating to emotional intensity or activation. In this work we explore a representation learning approach that automatically derives discriminative representations of emotional speech. In particular, we investigate two machine learning strategies to improve classifier performance: (1) utilization of unlabeled data using a deep convolutional generative adversarial network (DCGAN), and (2) multitask learning. In our experiments we leverage a multitask-annotated emotional corpus as well as a large unlabeled meeting corpus (around 100 hours). Our speaker-independent classification experiments show that the use of unlabeled data in particular improves classifier performance, considerably outperforming both fully supervised baseline approaches. We improve the classification accuracy of emotional valence on a discrete 5-point scale to 43.88%, which is competitive with state-of-the-art performance.
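
As a concrete illustration of the two strategies described above, the following is a minimal sketch (not the authors' implementation), assuming PyTorch and log-mel spectrogram inputs: a DCGAN-style convolutional discriminator trunk is reused as a feature extractor, and a multitask head predicts valence and activation from the shared representation. The 64x64 input shape, layer sizes, and five-class outputs are illustrative assumptions, not values taken from the paper.

# Sketch only: DCGAN-style encoder + multitask head for emotional speech.
import torch
import torch.nn as nn

class DCGANEncoder(nn.Module):
    """DCGAN discriminator trunk: strided convolutions, BatchNorm, LeakyReLU."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1),    # 64x64 -> 32x32
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), # 16x16 -> 8x8
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
        )
        self.proj = nn.Linear(256 * 8 * 8, feat_dim)

    def forward(self, x):
        h = self.net(x).flatten(1)          # flatten conv features per example
        return self.proj(h)                 # shared representation

class MultitaskHead(nn.Module):
    """Separate valence and activation classifiers on the shared representation."""
    def __init__(self, feat_dim=256, n_valence=5, n_activation=5):
        super().__init__()
        self.valence = nn.Linear(feat_dim, n_valence)
        self.activation = nn.Linear(feat_dim, n_activation)

    def forward(self, z):
        return self.valence(z), self.activation(z)

encoder, head = DCGANEncoder(), MultitaskHead()
spec = torch.randn(8, 1, 64, 64)            # dummy batch of spectrogram patches
val_logits, act_logits = head(encoder(spec))
# Multitask objective: sum of per-task cross-entropy losses (dummy labels here).
loss = (nn.functional.cross_entropy(val_logits, torch.randint(0, 5, (8,)))
        + nn.functional.cross_entropy(act_logits, torch.randint(0, 5, (8,))))
loss.backward()

In a semi-supervised setup of this kind, the encoder would first be trained adversarially as the DCGAN discriminator on the unlabeled meeting recordings, and then fine-tuned jointly with the multitask head on the labeled emotional corpus.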

