Training Strategies to Handle Missing Modalities for Audio-Visual Expression Recognition

10/02/2020
by Srinivas Parthasarathy, et al.

Automatic audio-visual expression recognition can play an important role in communication services such as tele-health, VoIP calls, and human-machine interaction. The accuracy of audio-visual expression recognition could benefit from the interplay between the two modalities. However, most audio-visual expression recognition systems, trained in ideal conditions, fail to generalize to real-world scenarios where either the audio or the visual modality may be missing for a number of reasons, such as limited bandwidth, the interactors' orientation, or caller-initiated muting. This paper studies the performance of a state-of-the-art transformer when one of the modalities is missing. We conduct ablation studies to evaluate the model in the absence of either modality. Further, we propose a strategy to randomly ablate visual inputs during training at the clip or frame level to mimic real-world scenarios. Results on in-the-wild data indicate significantly improved generalization in the proposed models trained on missing cues, with gains of up to 17%, showing that the proposed training strategies cope better with the loss of input modalities.
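The clip- and frame-level ablation described above can be pictured as a form of modality dropout applied to the visual stream during training. The sketch below is a minimal illustration of that idea, not the authors' implementation; the function name, drop probabilities, and tensor shapes are assumptions for the example.

```python
import torch

def ablate_visual(visual_feats, p_clip=0.5, p_frame=0.5, level="frame"):
    """Randomly zero out visual features to mimic missing-modality conditions.

    visual_feats: (batch, time, feat_dim) tensor of per-frame visual embeddings.
    level: "clip" drops the entire visual stream for a sampled subset of clips;
           "frame" drops individual frames independently.
    The probabilities p_clip and p_frame are illustrative, not from the paper.
    """
    if level == "clip":
        # One Bernoulli mask per clip, broadcast over time and feature dims.
        keep = (torch.rand(visual_feats.size(0), 1, 1,
                           device=visual_feats.device) > p_clip).float()
    else:
        # One Bernoulli mask per frame.
        keep = (torch.rand(visual_feats.size(0), visual_feats.size(1), 1,
                           device=visual_feats.device) > p_frame).float()
    return visual_feats * keep

# Inside a training loop, the audio stream would be left intact while the
# visual stream is randomly ablated before the audio-visual transformer:
#   audio_feats, visual_feats, labels = batch
#   visual_feats = ablate_visual(visual_feats, level="frame")
#   logits = model(audio_feats, visual_feats)
```

Applying the mask only to the visual inputs keeps the audio pathway fully trained while exposing the fusion layers to the missing-visual conditions the paper targets.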

Related research

02/11/2023 - Flexible-modal Deception Detection with Audio-Visual Adapter
Detecting deception by human behaviors is vital in many fields such as c...

03/06/2023 - Multimodal Prompting with Missing Modalities for Visual Recognition
In this paper, we tackle two challenges in multimodal learning for visua...

11/22/2017 - CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation
Visual and audio modalities are two symbiotic modalities underlying vide...

07/31/2022 - Towards Intercultural Affect Recognition: Audio-Visual Affect Recognition in the Wild Across Six Cultures
In our multicultural world, affect-aware AI systems that support humans ...

11/30/2020 - Detecting expressions with multimodal transformers
Developing machine learning algorithms to understand person-to-person en...

11/13/2015 - Symbol Grounding Association in Multimodal Sequences with Missing Elements
In this paper, we extend a symbolic association framework for being able...

11/02/2022 - Impact of annotation modality on label quality and model performance in the automatic assessment of laughter in-the-wild
Laughter is considered one of the most overt signals of joy. Laughter is...
