S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens

by Rizhao Cai, et al.

Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face recognition system by presenting spoofed faces. State-of-the-art FAS techniques predominantly rely on deep learning models, but their cross-domain generalization is often hindered by the domain shift problem, which arises when training and testing data follow different distributions. In this study, we develop a generalized FAS method under the Efficient Parameter Transfer Learning (EPTL) paradigm, in which we adapt pre-trained Vision Transformer (ViT) models to the FAS task. During training, adapter modules are inserted into the pre-trained ViT model, and only the adapters are updated while the other pre-trained parameters remain fixed. We find that previous vanilla adapters are limited because they are built from linear layers, which lack a spoofing-aware inductive bias and thus restrict cross-domain generalization. To address this limitation and achieve cross-domain generalized FAS, we propose a novel Statistical Adapter (S-Adapter) that gathers local discriminative and statistical information from localized token histograms. To further improve the generalization of the statistical tokens, we propose a novel Token Style Regularization (TSR), which aims to reduce domain style variance by regularizing Gram matrices extracted from tokens across different domains. Our experimental results demonstrate that the proposed S-Adapter and TSR provide significant benefits in both zero-shot and few-shot cross-domain testing, outperforming state-of-the-art methods on several benchmark tests. We will release the source code upon acceptance.
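
To make the Gram-matrix idea behind TSR more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that a token map is an array of shape (num_tokens, dim), that "style" is summarized by the normalized Gram matrix of the channels, and that the regularizer is the mean squared difference between the Gram matrices of two domains. The function names `gram_matrix` and `token_style_penalty`, the normalization, and the choice of squared error are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def gram_matrix(tokens):
    """Channel-by-channel Gram matrix of a token map.

    tokens: array of shape (num_tokens, dim).
    Dividing by num_tokens is an assumed normalization.
    """
    num_tokens, _ = tokens.shape
    return tokens.T @ tokens / num_tokens  # shape (dim, dim)


def token_style_penalty(tokens_a, tokens_b):
    """Hypothetical style penalty: mean squared Gram-matrix difference."""
    diff = gram_matrix(tokens_a) - gram_matrix(tokens_b)
    return float(np.mean(diff ** 2))


# Toy tokens standing in for two domains (random data, for illustration only).
rng = np.random.default_rng(0)
tokens_domain_a = rng.normal(size=(196, 8))
tokens_domain_b = 2.0 * rng.normal(size=(196, 8))  # different feature scale

print(token_style_penalty(tokens_domain_a, tokens_domain_a))  # 0.0 for identical inputs
print(token_style_penalty(tokens_domain_a, tokens_domain_b))  # positive for differing styles
```

In a training loop, a penalty of this kind would be added to the classification loss so that tokens from different domains are pushed toward similar second-order statistics; how the pairs are selected and how the term is weighted are left open here.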




Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing

Audio anti-spoofing for automatic speaker verification aims to safeguard...

Cross-Domain Style Mixing for Face Cartoonization

Cartoon domain has recently gained increasing popularity. Previous studi...

MCAE: Masked Contrastive Autoencoder for Face Anti-Spoofing

Face anti-spoofing (FAS) method performs well under the intra-domain set...

Learning Meta Pattern for Face Anti-Spoofing

Face Anti-Spoofing (FAS) is essential to secure face recognition systems...

Forgery-aware Adaptive Vision Transformer for Face Forgery Detection

With the advancement in face manipulation technologies, the importance o...

TIER: Text-Image Entropy Regularization for CLIP-style models

In this paper, we study the effect of a novel regularization scheme on c...

StyLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization

Large-scale foundation models (e.g., CLIP) have shown promising zero-sho...
