Learning Representations of Affect from Speech

11/15/2015
by   Eugene Laksana, et al.
0

There has been a lot of prior work on representation learning for speech recognition applications, but not much emphasis has been given to an investigation of effective representations of affect from speech, where the paralinguistic elements of speech are separated out from the verbal content. In this paper, we explore denoising autoencoders for learning paralinguistic attributes i.e. categorical and dimensional affective traits from speech. We show that the representations learnt by the bottleneck layer of the autoencoder are highly discriminative of activation intensity and at separating out negative valence (sadness and anger) from positive valence (happiness). We experiment with different input speech features (such as FFT and log-mel spectrograms with temporal context windows), and different autoencoder architectures (such as stacked and deep autoencoders). We also learn utterance specific representations by a combination of denoising autoencoders and BLSTM based recurrent autoencoders. Emotion classification is performed with the learnt temporal/dynamic representations to evaluate the quality of the representations. Experiments on a well-established real-life speech dataset (IEMOCAP) show that the learnt representations are comparable to state of the art feature extractors (such as voice quality features and MFCCs) and are competitive with state-of-the-art approaches at emotion and dimensional affect recognition.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/16/2021

Training Stacked Denoising Autoencoders for Representation Learning

We implement stacked denoising autoencoders, a class of neural networks ...
research
05/15/2020

A Novel Fusion of Attention and Sequence to Sequence Autoencoders to Predict Sleepiness From Speech

Motivated by the attention mechanism of the human visual system and rece...
research
12/23/2017

Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study

Learning the latent representation of data in unsupervised fashion is a ...
research
08/15/2020

EigenEmo: Spectral Utterance Representation Using Dynamic Mode Decomposition for Speech Emotion Classification

Human emotional speech is, by its very nature, a variant signal. This re...
research
03/13/2020

End-to-end Recurrent Denoising Autoencoder Embeddings for Speaker Identification

Speech 'in-the-wild' is a handicap for speaker recognition systems due t...
research
02/12/2018

Recovering Loss to Followup Information Using Denoising Autoencoders

Loss to followup is a significant issue in healthcare and has serious co...
research
02/08/2011

On Nonparametric Guidance for Learning Autoencoder Representations

Unsupervised discovery of latent representations, in addition to being u...

Please sign up or login with your details

Forgot password? Click here to reset