Multimodal End-to-End Sparse Model for Emotion Recognition

03/17/2021
by Wenliang Dai, et al.

Existing works on multimodal affective computing tasks, such as emotion recognition, generally adopt a two-phase pipeline: first extracting feature representations for each modality with hand-crafted algorithms, and then performing end-to-end learning on the extracted features. However, the extracted features are fixed and cannot be further fine-tuned for different target tasks, and manually designing feature extraction algorithms does not generalize or scale well across tasks, which can lead to sub-optimal performance. In this paper, we develop a fully end-to-end model that connects the two phases and optimizes them jointly. In addition, we restructure the current datasets to enable fully end-to-end training. Furthermore, to reduce the computational overhead introduced by the end-to-end model, we propose a sparse cross-modal attention mechanism for feature extraction. Experimental results show that our fully end-to-end model significantly surpasses the current state-of-the-art models based on the two-phase pipeline. Moreover, with the sparse cross-modal attention, our model maintains performance while using roughly half the computation in the feature extraction part.
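The abstract does not spell out how the cross-modal attention is sparsified, so the following is only a minimal illustrative sketch, assuming a hypothetical top-k variant in which each query position from one modality attends to its k highest-scoring positions in another modality; the function name, tensor shapes, and the value of k are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def sparse_cross_modal_attention(query, key, value, k=16):
    """Cross-modal attention where each query attends only to its top-k keys.

    query: (B, Lq, D) features from one modality (e.g., text)
    key, value: (B, Lk, D) features from another modality (e.g., audio or visual)
    k: number of keys kept per query; all other positions are masked out.
    """
    d = query.size(-1)
    # Scaled dot-product scores between the two modalities: (B, Lq, Lk)
    scores = torch.matmul(query, key.transpose(-2, -1)) / d ** 0.5

    # Keep only the k highest-scoring keys per query; mask the rest to -inf
    k = min(k, scores.size(-1))
    topk_scores, topk_idx = scores.topk(k, dim=-1)
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, topk_idx, topk_scores)

    attn = F.softmax(masked, dim=-1)   # sparse attention weights
    return torch.matmul(attn, value)   # (B, Lq, D) fused representation

# Toy usage: text queries attending to visual keys/values
q = torch.randn(2, 20, 64)
kv = torch.randn(2, 50, 64)
out = sparse_cross_modal_attention(q, kv, kv, k=8)
```

Because the softmax is restricted to k positions per query, the attention output depends on only a small subset of cross-modal pairs, which is the general idea behind trading a limited amount of attention coverage for lower feature-extraction cost.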


