Versatile Audio-Visual Learning for Handling Single and Multi Modalities in Emotion Regression and Classification Tasks

05/12/2023
by Lucas Goncalves, et al.

Most current audio-visual emotion recognition models lack the flexibility needed for deployment in practical applications. We envision a multimodal system that works even when only one modality is available and that can be used interchangeably for either predicting emotional attributes or recognizing categorical emotions. Achieving such flexibility is difficult because a multimodal emotion recognition system must accurately interpret and integrate varied data sources. It must also robustly handle missing or partial information while allowing a direct switch between regression and classification tasks. This study proposes a versatile audio-visual learning (VAVL) framework for handling unimodal and multimodal systems for emotion regression and emotion classification tasks. We implement an audio-visual framework that can be trained even when paired audio and visual data are not available for part of the training set (i.e., only audio or only video is present). We achieve effective representation learning under these conditions with audio-visual shared layers, residual connections over the shared layers, and a unimodal reconstruction task. Our experimental results reveal that our architecture significantly outperforms strong baselines on both the CREMA-D and MSP-IMPROV corpora. Notably, VAVL attains new state-of-the-art performance in the emotional attribute prediction task on the MSP-IMPROV corpus. Code available at: https://github.com/ilucasgoncalves/VAVL
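
The three ingredients named in the abstract (shared layers, residual connections over the shared layers, and a unimodal reconstruction task) can be illustrated with a small toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `ToyVAVL`, the single dense encoder per modality, the mean fusion of available modalities, and the choice to reconstruct each modality's own input are all hypothetical simplifications chosen for illustration. The linked repository is the reference for the actual architecture.

```python
import math
import random


def linear(x, layer):
    """Dense layer y = W x + b; layer is a (rows of W, bias) pair."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class ToyVAVL:
    """Hypothetical toy model: not the architecture from the paper."""

    def __init__(self, audio_dim, visual_dim, hidden=8, n_outputs=3,
                 task="regression", seed=0):
        rng = random.Random(seed)
        self.task = task
        # Modality-specific encoders and reconstruction decoders.
        self.enc = {"audio": init_layer(audio_dim, hidden, rng),
                    "visual": init_layer(visual_dim, hidden, rng)}
        self.dec = {"audio": init_layer(hidden, audio_dim, rng),
                    "visual": init_layer(hidden, visual_dim, rng)}
        # One layer whose weights are shared by both modalities.
        self.shared = init_layer(hidden, hidden, rng)
        # Task head: attribute scores (regression) or class logits.
        self.head = init_layer(hidden, n_outputs, rng)

    def _branch(self, name, x):
        h = relu(linear(x, self.enc[name]))
        s = relu(linear(h, self.shared))
        # Residual connection over the shared layer.
        z = [hi + si for hi, si in zip(h, s)]
        # Unimodal reconstruction (assumed target: the branch's own input).
        return z, linear(z, self.dec[name])

    def forward(self, audio=None, visual=None):
        inputs = {"audio": audio, "visual": visual}
        present = {k: v for k, v in inputs.items() if v is not None}
        if not present:
            raise ValueError("at least one modality is required")
        reps, recons = [], {}
        for name, x in present.items():
            z, recons[name] = self._branch(name, x)
            reps.append(z)
        # Assumed fusion: average the representations that are available.
        fused = [sum(vals) / len(reps) for vals in zip(*reps)]
        out = linear(fused, self.head)
        if self.task == "classification":
            out = softmax(out)
        return out, recons
```

With this sketch, `ToyVAVL(4, 6).forward(audio=[0.1, 0.2, 0.3, 0.4])` runs with audio alone, and constructing the model with `task="classification"` swaps the output from raw attribute scores to class probabilities without touching the rest of the network. In training, the returned reconstructions would feed an auxiliary loss alongside the task loss; how those terms are weighted is not specified in the abstract.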

