Prompting Visual-Language Models for Dynamic Facial Expression Recognition

08/25/2023
by   Zengqun Zhao, et al.

This paper presents a novel visual-language model called DFER-CLIP, which is based on the CLIP model and designed for in-the-wild Dynamic Facial Expression Recognition (DFER). Specifically, the proposed DFER-CLIP consists of a visual part and a textual part. For the visual part, based on the CLIP image encoder, a temporal model consisting of several Transformer encoders is introduced for extracting temporal facial expression features, and the final feature embedding is obtained as a learnable "class" token. For the textual part, we use as inputs textual descriptions of the facial behaviour that is related to the classes (facial expressions) that we are interested in recognising; those descriptions are generated using large language models, such as ChatGPT. In contrast to works that use only the class names, this more accurately captures the relationship between them. Alongside the textual description, we introduce a learnable token which helps the model learn relevant context information for each expression during training. Extensive experiments demonstrate the effectiveness of the proposed method and show that our DFER-CLIP also achieves state-of-the-art results compared with the current supervised DFER methods on the DFEW, FERV39k, and MAFW benchmarks. Code is publicly available at https://github.com/zengqunzhao/DFER-CLIP.
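
As a rough illustration of the CLIP-style matching that such a model builds on, the sketch below scores one video-level embedding against one text embedding per expression class using cosine similarity followed by a softmax. It is not the authors' implementation: the vectors are hand-made stand-ins for encoder outputs, the function names and the `temperature` value are assumptions, and the temporal Transformer and the learnable tokens described above are not modelled.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(video_embedding, text_embeddings, temperature=0.01):
    """Return class probabilities from temperature-scaled cosine similarities.

    `temperature` is an assumed value chosen for illustration only.
    """
    v = l2_normalize(video_embedding)
    sims = []
    for text_embedding in text_embeddings:
        t = l2_normalize(text_embedding)
        sims.append(sum(a * b for a, b in zip(v, t)))
    return softmax([s / temperature for s in sims])


# Hypothetical stand-ins: in a real system these would come from the
# visual branch (one embedding per video) and the textual branch
# (one embedding per expression description).
class_names = ["happiness", "sadness", "surprise"]
text_embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
video_embedding = [0.1, 0.9, 0.2, 0.0]

probs = classify(video_embedding, text_embeddings)
predicted = class_names[probs.index(max(probs))]
print(predicted)
```

In this toy setup the video embedding points mostly along the second text embedding, so the second class receives the highest probability. Replacing a bare class name with a fuller description of the facial behaviour changes only how each text embedding is produced, not the matching step itself.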

research
03/25/2022

Facial Expression Recognition with Swin Transformer

The task of recognizing human facial expressions plays a vital role in v...
research
03/01/2023

CLIPER: A Unified Vision-Language Framework for In-the-Wild Facial Expression Recognition

Facial expression recognition (FER) is an essential task for understandi...
research
10/30/2022

ViTASD: Robust Vision Transformer Baselines for Autism Spectrum Disorder Facial Diagnosis

Autism spectrum disorder (ASD) is a lifelong neurodevelopmental disorder...
research
08/07/2023

GaFET: Learning Geometry-aware Facial Expression Translation from In-The-Wild Images

While current face animation methods can manipulate expressions individu...
research
06/10/2022

NR-DFERNet: Noise-Robust Network for Dynamic Facial Expression Recognition

Dynamic facial expression recognition (DFER) in the wild is an extremely...
research
11/24/2022

More comprehensive facial inversion for more effective expression recognition

Facial expression recognition (FER) plays a significant role in the ubiq...
research
07/05/2023

MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition

Dynamic facial expression recognition (DFER) is essential to the develop...
