Visual Prompt Flexible-Modal Face Anti-Spoofing

07/26/2023
by Zitong Yu, et al.

Vision-transformer-based multimodal learning methods have recently been proposed to improve the robustness of face anti-spoofing (FAS) systems. However, multimodal face data collected in the real world is often imperfect because modalities from various imaging sensors may be missing. Flexible-modal FAS <cit.>, which aims to develop a unified multimodal FAS model that is trained on complete multimodal face data yet is insensitive to modalities missing at test time, has therefore attracted growing attention. In this paper, we tackle one main challenge in flexible-modal FAS: modalities may be missing during either training or testing in real-world situations. Inspired by the recent success of prompt learning in language models, we propose Visual Prompt flexible-modal FAS (VP-FAS), which learns modal-relevant prompts to adapt a frozen pre-trained foundation model to the downstream flexible-modal FAS task. Specifically, both vanilla visual prompts and residual contextual prompts are plugged into multimodal transformers to handle general missing-modality cases, while requiring fewer than 4% of the learnable parameters needed to train the entire model. Furthermore, a missing-modality regularization is proposed to force models to learn consistent multimodal feature embeddings when some modalities are missing. Extensive experiments on two multimodal FAS benchmark datasets demonstrate the effectiveness of the VP-FAS framework, which improves performance under various missing-modality cases while alleviating the need for heavy model re-training.
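
To make the parameter-budget claim and the regularization idea more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that prompts are learnable token vectors added at each layer of a frozen transformer, and that the consistency regularizer is a mean squared difference between the full-modality and missing-modality embeddings. The abstract specifies neither the prompt layout nor the loss form, so both are assumptions. The names `PromptBudget` and `consistency_penalty`, the prompt count, and the ViT-Base-like sizes (12 layers, width 768, roughly 86M backbone parameters) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PromptBudget:
    """Hypothetical bookkeeping for prompt tuning on a frozen backbone."""
    hidden_dim: int          # width of each transformer token
    num_layers: int          # layers that receive prompt tokens
    prompts_per_layer: int   # learnable prompt tokens per layer (assumed)
    backbone_params: int     # parameter count of the frozen backbone

    def prompt_params(self) -> int:
        # Each prompt token is one learnable vector of size hidden_dim.
        return self.num_layers * self.prompts_per_layer * self.hidden_dim

    def learnable_fraction(self) -> float:
        # Learnable prompt parameters relative to training the whole backbone.
        return self.prompt_params() / self.backbone_params


def consistency_penalty(full: Sequence[float], partial: Sequence[float]) -> float:
    """Mean squared difference between two embeddings (assumed loss form)."""
    if len(full) != len(partial):
        raise ValueError("embeddings must have the same length")
    return sum((a - b) ** 2 for a, b in zip(full, partial)) / len(full)


# Illustrative, assumed sizes; not taken from the paper.
budget = PromptBudget(hidden_dim=768, num_layers=12,
                      prompts_per_layer=16, backbone_params=86_000_000)
print(budget.prompt_params())
print(f"{budget.learnable_fraction():.4%}")

# Toy embeddings: with all modalities present vs. with one modality missing.
print(consistency_penalty([1.0, 2.0], [1.0, 4.0]))
```

Under these assumed sizes the prompts account for well under 4% of the backbone's parameters. The actual fraction depends on the prompt design and on any additional learnable layers, which the abstract does not detail.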

