Emotion Recognition in Speech using Cross-Modal Transfer in the Wild

08/16/2018
by Samuel Albanie, et al.

Obtaining large, human-labelled speech datasets to train models for emotion recognition is a notoriously challenging task, hindered by annotation cost and label ambiguity. In this work, we consider the task of learning embeddings for speech classification without access to any form of labelled audio. We base our approach on a simple hypothesis: that the emotional content of speech correlates with the facial expression of the speaker. By exploiting this relationship, we show that annotations of expression can be transferred from the visual domain (faces) to the speech domain (voices) through cross-modal distillation. We make the following contributions: (i) we develop a strong teacher network for facial emotion recognition that achieves the state of the art on a standard benchmark; (ii) we use the teacher to train a student, tabula rasa, to learn representations (embeddings) for speech emotion recognition without access to labelled audio data; and (iii) we show that the speech emotion embedding can be used for speech emotion recognition on external benchmark datasets. Code, models and data are available.
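
As a rough illustration of the cross-modal distillation idea described above, the sketch below computes a distillation loss between a face teacher's predictions and a voice student's predictions for the same video segment. It is only a sketch under stated assumptions: the temperature-softened cross-entropy is a common distillation objective and is not confirmed by the abstract as the authors' exact loss, and the function names, the three-class logits and the temperature value are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    Assumption: this is a generic distillation objective, not necessarily
    the loss used in the paper.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))


# Hypothetical logits over three emotion classes for one video segment.
face_teacher_logits = [2.0, 0.5, -1.0]   # teacher sees the speaker's face
voice_student_logits = [0.1, 0.2, 0.0]   # untrained student hears the voice

loss_before = distillation_loss(face_teacher_logits, voice_student_logits)
loss_matched = distillation_loss(face_teacher_logits, face_teacher_logits)
print(loss_before, loss_matched)
```

No audio labels appear anywhere in this computation: the only supervision is the teacher's output on the face track, which is the sense in which the student can be trained without labelled audio. The loss is smallest when the student reproduces the teacher's distribution.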


research
12/27/2020

Exploring Emotion Features and Fusion Strategies for Audio-Video Emotion Recognition

The audio-video based emotion recognition aims to classify a given video...
research
09/09/2023

Speech Emotion Recognition with Distilled Prosodic and Linguistic Affect Representations

We propose EmoDistill, a novel speech emotion recognition (SER) framewor...
research
04/05/2022

Learning Speech Emotion Representations in the Quaternion Domain

The modeling of human emotion expression in speech signals is an importa...
research
01/04/2018

A pairwise discriminative task for speech emotion recognition

Speech emotion recognition is an important task in human-machine interac...
research
10/28/2020

Generative Adversarial Networks in Human Emotion Synthesis: A Review

Synthesizing realistic data samples is of great value for both academic ...
research
08/29/2023

AI-Based Facial Emotion Recognition Solutions for Education: A Study of Teacher-User and Other Categories

Existing information on AI-based facial emotion recognition (FER) is not...
research
05/09/2023

An Exploration into the Performance of Unsupervised Cross-Task Speech Representations for "In the Wild" Edge Applications

Unsupervised speech models are becoming ubiquitous in the speech and mac...
