Grouped Knowledge Distillation for Deep Face Recognition

04/10/2023
by   Weisong Zhao, et al.

Compared with feature-based distillation methods, logits distillation relaxes the requirement that teacher and student networks share a consistent feature dimension, yet its performance is considered inferior in face recognition. One major challenge is that the lightweight student network struggles to fit the target logits because of its low model capacity, a difficulty compounded by the large number of identities in face recognition. We therefore probe the target logits to extract the primary knowledge related to face identity and discard the rest, making distillation more tractable for the student network. Specifically, the prediction contains a tail group of near-zero values that carries little knowledge for distillation. To examine its impact, we first partition the logits into two groups, i.e., a Primary Group and a Secondary Group, according to the cumulative probability of the softened prediction. We then reorganize the Knowledge Distillation (KD) loss over the grouped logits into three parts, i.e., Primary-KD, Secondary-KD, and Binary-KD. Primary-KD distills the primary knowledge from the teacher, Secondary-KD refines the minor knowledge but increases the difficulty of distillation, and Binary-KD enforces consistency of the knowledge distribution between teacher and student. We find experimentally that (1) Primary-KD and Binary-KD are indispensable for KD, and (2) Secondary-KD is the culprit that bottlenecks KD. We therefore propose Grouped Knowledge Distillation (GKD), which retains Primary-KD and Binary-KD but omits Secondary-KD from the final KD loss. Extensive experiments on popular face recognition benchmarks demonstrate the superiority of the proposed GKD over state-of-the-art methods.
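To make the grouping idea concrete, here is a minimal PyTorch sketch based only on the description above. The function name `grouped_kd_loss`, the temperature, the cumulative-probability threshold `cum_prob`, and the exact form of each term (a KL divergence over the renormalized primary group, and a two-bin KL for Binary-KD) are assumptions for illustration, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F


def grouped_kd_loss(student_logits, teacher_logits, temperature=4.0, cum_prob=0.9):
    """Grouped KD sketch: Primary-KD + Binary-KD, with Secondary-KD omitted.

    The primary group is, per sample, the smallest set of classes whose
    softened teacher probabilities reach `cum_prob`; the remaining classes
    form the secondary group.
    """
    t_prob = F.softmax(teacher_logits / temperature, dim=1)
    s_prob = F.softmax(student_logits / temperature, dim=1)

    # Per-sample primary-group mask from the cumulative teacher probability.
    sorted_p, idx = t_prob.sort(dim=1, descending=True)
    cum = sorted_p.cumsum(dim=1)
    keep_sorted = (cum - sorted_p) < cum_prob        # classes needed to reach the threshold
    pm = torch.zeros_like(t_prob).scatter(1, idx, keep_sorted.float())  # 1.0 = primary class

    eps = 1e-12
    # Binary-KD: match the total probability mass assigned to the two groups.
    t_bin = torch.stack([(t_prob * pm).sum(1), (t_prob * (1 - pm)).sum(1)], dim=1)
    s_bin = torch.stack([(s_prob * pm).sum(1), (s_prob * (1 - pm)).sum(1)], dim=1)
    binary_kd = F.kl_div((s_bin + eps).log(), t_bin, reduction="batchmean")

    # Primary-KD: match the distributions renormalized within the primary group.
    t_pri = (t_prob * pm) / (t_bin[:, :1] + eps)
    s_pri = (s_prob * pm) / (s_bin[:, :1] + eps)
    primary_kd = F.kl_div((s_pri + eps).log(), t_pri, reduction="batchmean")

    # Secondary-KD (the analogous term on the secondary group) is deliberately dropped.
    return (primary_kd + binary_kd) * temperature ** 2
```

Dropping the secondary-group term mirrors the finding above that Secondary-KD bottlenecks distillation, while the Primary-KD and Binary-KD terms are retained.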
