Facial Affective Analysis based on MAE and Multi-modal Information for 5th ABAW Competition
Human affective behavior analysis studies human expressions and other behaviors, and helps improve our understanding of human psychology. The CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW) provides diverse data for recognizing the commonly used emotion representations, including Action Units (AU), basic expression categories and Valence-Arousal (VA). In this paper, we introduce our submission to the CVPR 2023 ABAW5 competition for AU detection, expression classification, VA estimation and emotional reaction intensity (ERI) estimation. First, we extract visual information with an MAE model that has been pre-trained on a large-scale face image dataset in a self-supervised manner. The MAE encoder is then fine-tuned for the ABAW challenges on single frames of the Aff-Wild2 dataset. We also exploit the multi-modal and temporal information in the videos and design a transformer-based framework to fuse the multi-modal features. Moreover, we construct a novel two-branch collaboration training strategy that further enhances model generalization by randomly interpolating in the logit space. Extensive quantitative experiments and ablation studies on the Aff-Wild2 and Hume-Reaction datasets demonstrate the effectiveness of our proposed method.
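The abstract does not specify how the logits of the two branches are interpolated. As a minimal sketch of one possible reading, the snippet below mixes the two branches' logits with a randomly drawn convex weight. The function names, the uniform sampling of the weight, and the use of a single scalar weight per sample are all assumptions made for illustration, not details taken from the paper.

```python
import random


def interpolate_logits(logits_a, logits_b, lam):
    """Convex combination of two branches' logits with weight lam in [0, 1].

    Hypothetical helper: the paper's actual interpolation rule may differ.
    """
    if len(logits_a) != len(logits_b):
        raise ValueError("both branches must emit the same number of logits")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(logits_a, logits_b)]


def random_interpolation(logits_a, logits_b, rng=random):
    """Draw lam uniformly from [0, 1) (an assumption) and mix the logits."""
    lam = rng.random()
    return interpolate_logits(logits_a, logits_b, lam), lam


if __name__ == "__main__":
    branch_a = [2.0, -1.0, 0.5]   # made-up logits from the first branch
    branch_b = [1.0, 0.0, -0.5]   # made-up logits from the second branch
    mixed, lam = random_interpolation(branch_a, branch_b)
    print(f"lam={lam:.3f}, mixed={mixed}")
```

In a training loop, the mixed logits would presumably be passed to the task loss alongside or instead of each branch's own logits; that choice is likewise not stated in the abstract.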