Cross Architecture Distillation for Face Recognition

06/26/2023
by Weisong Zhao, et al.

Transformers have emerged as the superior choice for face recognition tasks, but their insufficient platform acceleration hinders deployment on mobile devices. In contrast, Convolutional Neural Networks (CNNs) capitalize on hardware-friendly acceleration libraries. Consequently, it becomes essential to maintain distillation efficacy when transferring knowledge from a Transformer-based teacher to a CNN-based student, a setting known as Cross-Architecture Knowledge Distillation (CAKD). Despite its potential, applying CAKD to face recognition faces two challenges: 1) the teacher and student encode disparate spatial information for each pixel, obstructing the alignment of their feature spaces, and 2) the teacher network is not trained in the role of a teacher and thus lacks proficiency in handling distillation-specific knowledge. To address these challenges, 1) we first introduce a Unified Receptive Fields Mapping module (URFM) that maps pixel features of the teacher and student into local features with unified receptive fields, thereby synchronizing the pixel-wise spatial information of teacher and student. Subsequently, 2) we develop an Adaptable Prompting Teacher network (APT) that integrates prompts into the teacher, enabling it to manage distillation-specific knowledge while preserving the model's discriminative capacity. Extensive experiments on popular face benchmarks and two large-scale verification sets demonstrate the superiority of our method.
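To make the two components concrete, the following is a minimal PyTorch sketch of how they could be wired together, assuming an attention-based mapping with learnable local queries for the URFM, learnable prompt tokens prepended to a frozen Transformer teacher for the APT, and a simple normalized-MSE alignment loss. The class interfaces, dimensions, and loss choice are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class URFM(nn.Module):
    """Unified Receptive Fields Mapping (illustrative sketch).

    Maps pixel features from either network into a fixed set of local
    features sharing a common receptive field, here via learnable local
    queries attending over the pixel features (hypothetical design).
    """

    def __init__(self, in_dim, embed_dim=512, num_local=49, num_heads=8):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)
        self.local_queries = nn.Parameter(torch.randn(num_local, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, pixel_feats):              # (B, N_pixels, in_dim)
        x = self.proj(pixel_feats)               # (B, N_pixels, embed_dim)
        q = self.local_queries.unsqueeze(0).expand(x.size(0), -1, -1)
        local_feats, _ = self.attn(q, x, x)      # (B, num_local, embed_dim)
        return local_feats


class APT(nn.Module):
    """Adaptable Prompting Teacher (illustrative sketch).

    Wraps a frozen Transformer teacher and prepends learnable prompt
    tokens so the teacher can adapt to distillation while its original
    weights, and hence its discriminative capacity, remain unchanged.
    Assumes the teacher's forward accepts a token sequence.
    """

    def __init__(self, teacher, embed_dim=512, num_prompts=8):
        super().__init__()
        self.teacher = teacher
        for p in self.teacher.parameters():
            p.requires_grad = False              # keep the teacher frozen
        self.prompts = nn.Parameter(torch.zeros(num_prompts, embed_dim))

    def forward(self, tokens):                   # (B, N_tokens, embed_dim)
        prompts = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
        return self.teacher(torch.cat([prompts, tokens], dim=1))


def cakd_feature_loss(student_pixels, teacher_tokens, urfm_s, urfm_t):
    """Align CNN-student and Transformer-teacher features after both are
    mapped into the shared local-feature space by their URFM heads."""
    f_s = urfm_s(student_pixels)
    f_t = urfm_t(teacher_tokens)
    return F.mse_loss(F.normalize(f_s, dim=-1), F.normalize(f_t, dim=-1))
```

In this sketch only the student, the two URFM heads, and the prompt tokens receive gradients; the teacher backbone stays fixed, which mirrors the abstract's goal of adding distillation-specific adaptability without sacrificing the teacher's discriminative power.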


Related research:

10/31/2020 · ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for Face Recognition
Knowledge Distillation (KD) refers to transferring knowledge from a larg...

09/09/2017 · Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
Knowledge distillation is a potential solution for model compression. Th...

06/06/2022 · Evaluation-oriented Knowledge Distillation for Deep Face Recognition
Knowledge distillation (KD) is a widely-used technique that utilizes lar...

04/10/2023 · Grouped Knowledge Distillation for Deep Face Recognition
Compared with the feature-based distillation methods, logits distillatio...

03/05/2020 · MarginDistillation: distillation for margin-based softmax
The usage of convolutional neural networks (CNNs) in conjunction with a ...

06/03/2019 · Deep Face Recognition Model Compression via Knowledge Transfer and Distillation
Fully convolutional networks (FCNs) have become de facto tool to achieve...

02/10/2020 · Distribution Distillation Loss: Generic Approach for Improving Face Recognition from Hard Samples
Large facial variations are the main challenge in face recognition. To t...
