Multimodal Continuous Emotion Recognition: A Technical Report for ABAW5

03/18/2023
by Su Zhang, et al.

We use two multimodal models for continuous valence-arousal recognition from visual, audio, and linguistic information. The first model is the same as the one we used in ABAW2 and ABAW3, which employs leader-follower attentive fusion. The second model shares the same architecture for spatial and temporal encoding; for the fusion block, it employs a compact and straightforward channel attention module borrowed from the End2You toolkit. Unlike our previous attempts, which used pre-computed VGGish features directly as the audio representation, this time we feed log-mel spectrograms into a pre-trained VGG model and fine-tune it during training. To make full use of the data and alleviate overfitting, cross-validation is carried out. The code is available at https://github.com/sucv/ABAW3.
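To illustrate the second model's fusion stage, below is a minimal PyTorch sketch of a channel-attention fusion block applied to frame-aligned visual, audio, and linguistic features. It follows the squeeze-and-excitation pattern suggested by the abstract; the class name, feature sizes, and layer choices are our own illustrative assumptions, not the authors' exact implementation or the End2You code.

```python
import torch
import torch.nn as nn


class ChannelAttentionFusion(nn.Module):
    """Illustrative channel-attention fusion over concatenated modality features.

    Assumes visual, audio, and linguistic features are already aligned per frame;
    sizes and names are hypothetical, not the authors' implementation.
    """

    def __init__(self, in_channels: int, reduction: int = 4):
        super().__init__()
        # Squeeze-and-excitation style gating over the fused channel axis.
        self.gate = nn.Sequential(
            nn.Linear(in_channels, in_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(in_channels // reduction, in_channels),
            nn.Sigmoid(),
        )
        # Project the re-weighted features to per-frame valence and arousal.
        self.head = nn.Linear(in_channels, 2)

    def forward(self, visual, audio, text):
        # Each input: (batch, time, channels_of_that_modality).
        fused = torch.cat([visual, audio, text], dim=-1)   # (B, T, C)
        weights = self.gate(fused.mean(dim=1))             # (B, C), pooled over time
        fused = fused * weights.unsqueeze(1)               # channel re-weighting
        return self.head(fused)                            # (B, T, 2)


if __name__ == "__main__":
    # Hypothetical sizes: 512-d visual, 512-d audio (fine-tuned VGG on log-mel), 768-d linguistic.
    model = ChannelAttentionFusion(512 + 512 + 768)
    v, a, t = (torch.randn(2, 300, d) for d in (512, 512, 768))
    print(model(v, a, t).shape)  # torch.Size([2, 300, 2])
```

The key design point is that the gate learns a single weight per fused channel, so the network can emphasize or suppress whole modalities (or sub-bands of their features) per clip before regressing valence and arousal.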
