COLD Fusion: Calibrated and Ordinal Latent Distribution Fusion for Uncertainty-Aware Multimodal Emotion Recognition

06/12/2022
by   Mani Kumar Tellamekala, et al.

Automatically recognising apparent emotions from face and voice is hard, in part because of various sources of uncertainty, including in the input data and the labels used in a machine learning framework. This paper introduces an uncertainty-aware audiovisual fusion approach that quantifies modality-wise uncertainty towards emotion prediction. To this end, we propose a novel fusion framework in which we first learn latent distributions over audiovisual temporal context vectors separately, and then constrain the variance vectors of unimodal latent distributions so that they represent the amount of information each modality provides w.r.t. emotion recognition. In particular, we impose Calibration and Ordinal Ranking constraints on the variance vectors of audiovisual latent distributions. When well-calibrated, modality-wise uncertainty scores indicate how much their corresponding predictions may differ from the ground truth labels. Well-ranked uncertainty scores allow the ordinal ranking of different frames across the modalities. To jointly impose both these constraints, we propose a softmax distributional matching loss. In both classification and regression settings, we compare our uncertainty-aware fusion model with standard model-agnostic fusion baselines. Our evaluation on two emotion recognition corpora, AVEC 2019 CES and IEMOCAP, shows that audiovisual emotion recognition can considerably benefit from well-calibrated and well-ranked latent uncertainty measures.
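The abstract does not spell out the model equations, so the sketch below is an illustrative assumption rather than the paper's implementation. It shows two ideas the abstract describes in prose: fusing two modality-wise latent Gaussians so that the lower-variance (more confident) modality dominates, and a softmax distribution-matching loss that pushes per-frame uncertainty scores to track per-frame prediction errors, which jointly encourages calibration and ordinal ranking. The function names and the precision-weighted fusion rule are hypothetical choices for illustration.

```python
import numpy as np

def precision_weighted_fusion(mu_a, var_a, mu_v, var_v):
    """Fuse audio and visual latent Gaussians by inverse-variance
    (precision) weighting: the modality with the smaller variance,
    i.e. the higher estimated confidence, dominates the fused mean.
    NOTE: an illustrative fusion rule, not necessarily the paper's."""
    prec_a = 1.0 / var_a
    prec_v = 1.0 / var_v
    fused_var = 1.0 / (prec_a + prec_v)
    fused_mu = fused_var * (prec_a * mu_a + prec_v * mu_v)
    return fused_mu, fused_var

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def distribution_matching_loss(uncertainties, errors):
    """Cross-entropy between the softmax of per-frame prediction errors
    (target distribution) and the softmax of per-frame uncertainty
    scores (predicted distribution). It is smallest when the most
    uncertain frames are exactly the highest-error frames, so
    minimising it encourages both calibration and ordinal ranking.
    NOTE: a plausible reading of the abstract's loss, not a verified one."""
    target = softmax(np.asarray(errors, dtype=float))
    pred = softmax(np.asarray(uncertainties, dtype=float))
    return float(-np.sum(target * np.log(pred + 1e-12)))
```

For example, fusing a confident audio estimate (mean 1.0, variance 0.1) with an uncertain visual one (mean 0.0, variance 10.0) yields a fused mean close to the audio mean, and a ranking of uncertainties that agrees with the errors gives a lower matching loss than the reversed ranking.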


Related research

- 11/09/2022, Distribution-based Emotion Recognition in Conversation: Automatic emotion recognition in conversation (ERC) is crucial for emoti...
- 08/12/2018, Multimodal Local-Global Ranking Fusion for Emotion Recognition: Emotion recognition is a core research area at the intersection of artif...
- 08/25/2022, Interpretable Multimodal Emotion Recognition using Hybrid Fusion of Speech and Image Data: This paper proposes a multimodal emotion recognition system based on hyb...
- 08/13/2019, Multimodal Emotion Recognition Using Deep Canonical Correlation Analysis: Multimodal signals are more powerful than unimodal data for emotion reco...
- 01/26/2022, Self-attention fusion for audiovisual emotion recognition with incomplete data: In this paper, we consider the problem of multimodal data analysis with ...
- 03/08/2022, Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors: Emotion recognition is a key attribute for artificial intelligence syste...
- 06/11/2023, Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression: In automatic emotion recognition (AER), labels assigned by different hum...
