Noise-Resistant Multimodal Transformer for Emotion Recognition

05/04/2023
by Yuanyuan Liu, et al.

Multimodal emotion recognition identifies human emotions from data modalities such as video, text, and audio. However, we find that this task can be easily affected by noisy information that does not contain useful semantics. To this end, we present a novel paradigm that extracts noise-resistant features in its pipeline and introduces a noise-aware learning scheme to effectively improve the robustness of multimodal emotion understanding. Our new pipeline, the Noise-Resistant Multimodal Transformer (NORM-TR), mainly introduces a Noise-Resistant Generic Feature (NRGF) extractor and a Transformer for the multimodal emotion recognition task. In particular, the NRGF extractor learns a generic, disturbance-insensitive representation so that consistent and meaningful semantics can be obtained. Furthermore, we apply a Transformer to incorporate Multimodal Features (MFs) of the multimodal inputs based on their relations to the NRGF. As a result, useful details to which the NRGF is insensitive can be complemented by the MFs, which retain finer-grained information. To train NORM-TR properly, our noise-aware learning scheme complements the normal emotion recognition losses by strengthening learning against noise. The scheme explicitly adds noise to either all modalities or one specific modality at random locations of a multimodal input sequence. We correspondingly introduce two adversarial losses that encourage the NRGF extractor to produce NRGFs invariant to the added noise, thereby helping NORM-TR achieve more favorable multimodal emotion recognition performance. On several popular multimodal datasets, NORM-TR achieves state-of-the-art performance and outperforms existing methods by a large margin, demonstrating that the ability to resist noisy information is important for effective emotion recognition.
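
The abstract does not come with an implementation, so the following is a minimal sketch of how the described pipeline could look in a PyTorch setting: a hypothetical NRGF extractor that pools per-modality features into one shared, disturbance-insensitive token, and a Transformer decoder in which that NRGF token acts as the query attending over the Multimodal Features (MFs). All module names, dimensions, and fusion details below are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the NORM-TR pipeline described above (PyTorch).
# All module names, dimensions, and fusion details are illustrative
# assumptions, not the authors' released implementation.
import torch
import torch.nn as nn


class NoiseResistantExtractor(nn.Module):
    """Hypothetical NRGF extractor: maps per-modality features to one
    shared, disturbance-insensitive token per sample."""

    def __init__(self, in_dims, d_model=256):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, d_model) for d in in_dims)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, feats):                       # feats: list of (B, T_m, D_m)
        projected = [p(x).mean(dim=1) for p, x in zip(self.proj, feats)]
        return self.norm(torch.stack(projected).mean(dim=0))   # (B, d_model)


class NormTR(nn.Module):
    """NRGF extractor + Transformer fusion: the NRGF token queries the
    Multimodal Features (MFs), then a linear head predicts the emotion."""

    def __init__(self, in_dims, d_model=256, n_heads=4, n_layers=2, n_classes=7):
        super().__init__()
        self.nrgf_extractor = NoiseResistantExtractor(in_dims, d_model)
        self.mf_proj = nn.ModuleList(nn.Linear(d, d_model) for d in in_dims)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.fusion = nn.TransformerDecoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, feats):
        nrgf = self.nrgf_extractor(feats).unsqueeze(1)          # (B, 1, d_model) query
        mfs = torch.cat([p(x) for p, x in zip(self.mf_proj, feats)], dim=1)
        fused = self.fusion(tgt=nrgf, memory=mfs)               # NRGF attends to MFs
        return self.classifier(fused.squeeze(1))                # (B, n_classes)


# Dummy video/audio/text features for a batch of 2 samples.
feats = [torch.randn(2, 16, 512), torch.randn(2, 50, 128), torch.randn(2, 20, 300)]
print(NormTR(in_dims=[512, 128, 300])(feats).shape)            # torch.Size([2, 7])
```

Here the NRGF is used as the decoder query so that fusion is driven by its relations to the MFs; other fusion directions (for example, MFs querying the NRGF) would be equally plausible readings of the abstract.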

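The noise-aware learning scheme can likewise be sketched only loosely from the abstract. The snippet below, under the same assumptions as the sketch above, injects Gaussian noise at random temporal locations of either all modalities or one randomly chosen modality, and uses a simple invariance penalty between clean and noisy NRGFs as a stand-in for the paper's two adversarial losses, which the abstract does not specify in detail.

```python
# Hedged sketch of noise-aware training; the invariance penalty is a
# stand-in for the paper's adversarial losses, whose exact form is not
# given in the abstract.
import random
import torch
import torch.nn.functional as F


def inject_noise(feats, ratio=0.2, all_modalities=True):
    """Replace a random subset of time steps with Gaussian noise, in
    every modality or in one randomly chosen modality."""
    noisy = [x.clone() for x in feats]
    targets = range(len(noisy)) if all_modalities else [random.randrange(len(noisy))]
    for m in targets:
        b, t, _ = noisy[m].shape
        mask = torch.rand(b, t, device=noisy[m].device) < ratio   # random locations
        noisy[m][mask] = torch.randn_like(noisy[m])[mask]
    return noisy


def noise_aware_step(model, feats, labels):
    """One training step: emotion loss on the noisy input plus a penalty
    that keeps the noisy NRGF close to the clean NRGF."""
    noisy_feats = inject_noise(feats, all_modalities=random.random() < 0.5)
    emo_loss = F.cross_entropy(model(noisy_feats), labels)

    nrgf_clean = model.nrgf_extractor(feats).detach()   # clean target, no gradient
    nrgf_noisy = model.nrgf_extractor(noisy_feats)
    invariance_loss = F.mse_loss(nrgf_noisy, nrgf_clean)

    return emo_loss + invariance_loss
```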


Related research

12/22/2022 - Emotion Recognition with Pre-Trained Transformers Using Multimodal Signals
In this paper, we address the problem of multimodal emotion recognition ...

12/03/2021 - Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts
Emotion Recognition in Conversations (ERC) is an important and active re...

08/13/2019 - Multimodal Emotion Recognition Using Deep Canonical Correlation Analysis
Multimodal signals are more powerful than unimodal data for emotion reco...

06/22/2021 - Key-Sparse Transformer with Cascaded Cross-Attention Block for Multimodal Speech Emotion Recognition
Speech emotion recognition is a challenging and important research topic...

10/29/2019 - Privacy Enhanced Multimodal Neural Representations for Emotion Recognition
Many mobile applications and virtual conversational agents now aim to re...

04/28/2020 - Deep Auto-Encoders with Sequential Learning for Multimodal Dimensional Emotion Recognition
Multimodal dimensional emotion recognition has drawn a great attention f...

07/06/2023 - SeLiNet: Sentiment enriched Lightweight Network for Emotion Recognition in Images
In this paper, we propose a sentiment-enriched lightweight network SeLiN...
