Using Auxiliary Tasks In Multimodal Fusion Of Wav2vec 2.0 And BERT For Multimodal Emotion Recognition

02/27/2023
by   Dekai Sun, et al.

The lack of data and the difficulty of multimodal fusion have long been challenges for multimodal emotion recognition (MER). In this paper, we propose to use pretrained models as upstream networks, wav2vec 2.0 for the audio modality and BERT for the text modality, and fine-tune them on the downstream MER task to cope with the lack of data. To address the difficulty of multimodal fusion, we use a K-layer multi-head attention mechanism as the downstream fusion module. Starting from the MER task itself, we design two auxiliary tasks to alleviate the insufficient fusion between modalities and to guide the network to capture and align emotion-related features. Compared with previous state-of-the-art models, we achieve better performance, with 78.42% weighted accuracy (WA) and 79.71% unweighted accuracy (UA) on the IEMOCAP dataset.
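The abstract only sketches the pipeline at a high level, so the following is a minimal PyTorch sketch of that kind of architecture, not the authors' implementation. The checkpoint names, the choice of K=4 fusion layers, the cross-attention direction (text queries attending to audio features), the mean pooling, and the linear classifier are illustrative assumptions, and the two auxiliary-task heads are omitted.

# Minimal sketch: wav2vec 2.0 + BERT upstream encoders with a K-layer
# multi-head cross-attention fusion block and an emotion classifier.
# Checkpoints, K, pooling, and the classifier head are assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, BertModel, BertTokenizerFast

class CrossAttentionFusion(nn.Module):
    """K stacked multi-head attention layers: text queries attend to audio."""
    def __init__(self, dim=768, heads=8, k_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(k_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(k_layers))

    def forward(self, text_feats, audio_feats):
        x = text_feats
        for attn, norm in zip(self.layers, self.norms):
            fused, _ = attn(query=x, key=audio_feats, value=audio_feats)
            x = norm(x + fused)          # residual connection per fusion layer
        return x.mean(dim=1)             # temporal average pooling

class MERModel(nn.Module):
    def __init__(self, num_emotions=4, k_layers=4):
        super().__init__()
        self.audio_encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.fusion = CrossAttentionFusion(dim=768, k_layers=k_layers)
        self.classifier = nn.Linear(768, num_emotions)

    def forward(self, waveform, input_ids, attention_mask):
        # (B, T_audio, 768) frame-level features from raw 16 kHz audio
        audio = self.audio_encoder(waveform).last_hidden_state
        # (B, T_text, 768) token-level features from the transcript
        text = self.text_encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.classifier(self.fusion(text, audio))  # emotion logits

if __name__ == "__main__":
    tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
    enc = tok(["I am so happy today"], return_tensors="pt")
    wav = torch.randn(1, 16000)          # one second of placeholder 16 kHz audio
    model = MERModel()
    print(model(wav, enc["input_ids"], enc["attention_mask"]).shape)  # torch.Size([1, 4])

In a setup like this, the auxiliary tasks described in the paper would be added as extra heads trained jointly with the emotion classifier; their exact form is not specified in the abstract, so they are left out of the sketch.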

Related research

Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition (08/15/2020)
Multimodal emotion recognition from speech is an important area in affec...

Multi-level Fusion of Wav2vec 2.0 and BERT for Multimodal Emotion Recognition (07/11/2022)
The research and applications of multimodal emotion recognition have bec...

Self-attention fusion for audiovisual emotion recognition with incomplete data (01/26/2022)
In this paper, we consider the problem of multimodal data analysis with ...

Multimodal Deep Learning for Mental Disorders Prediction from Audio Speech Samples (09/03/2019)
Key features of mental illnesses are reflected in speech. Our research f...

Hybrid Fusion Based Interpretable Multimodal Emotion Recognition with Insufficient Labelled Data (08/24/2022)
This paper proposes a multimodal emotion recognition system, VIsual Spok...

Fusion with Hierarchical Graphs for Multimodal Emotion Recognition (09/15/2021)
Automatic emotion recognition (AER) based on enriched multimodal inputs,...

MSAF: Multimodal Split Attention Fusion (12/13/2020)
Multimodal learning mimics the reasoning process of the human multi-sens...
