Improving Face Recognition from Caption Supervision with Multi-Granular Contextual Feature Aggregation

08/13/2023
by Md Mahedi Hasan, et al.

We introduce caption-guided face recognition (CGFR), a new framework for improving the performance of commercial off-the-shelf (COTS) face recognition (FR) systems. Instead of combining soft biometrics (e.g., facial marks, gender, and age) with face images, we use facial descriptions provided by face examiners as auxiliary information. Because the textual and facial features lie in different embedding spaces, directly fusing them to improve performance is very challenging. We propose a contextual feature aggregation module (CFAM) that addresses this issue by exploiting both fine-grained word-region interaction and global image-caption association. Specifically, CFAM adopts a self-attention scheme and a cross-attention scheme to strengthen the intra-modality and inter-modality relationships, respectively, between the image and textual features. Additionally, we design a textual feature refinement module (TFRM) that refines the textual features of the pre-trained BERT encoder by updating the contextual embeddings. This module enhances the discriminative power of the textual features with a cross-modal projection loss and realigns the word and caption embeddings with the visual features by incorporating a visual-semantic alignment loss. We implemented the proposed CGFR framework on two face recognition models (ArcFace and AdaFace) and evaluated its performance on the Multi-Modal CelebA-HQ dataset. Our framework significantly improves the performance of ArcFace in both the 1:1 verification and 1:N identification protocols.
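To make the word-region interaction concrete, the following is a minimal sketch of scaled dot-product cross-attention in plain Python, in which each word feature attends over a set of image region features. It is not the authors' CFAM implementation: the function names, the toy two-dimensional features, and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    Returns one attended vector per query plus the attention weights.
    Learned projection matrices are omitted (an assumption of this sketch).
    """
    scale = math.sqrt(len(keys[0]))
    attended, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        weights.append(w)
        attended.append(
            [sum(wi * v[d] for wi, v in zip(w, values))
             for d in range(len(values[0]))]
        )
    return attended, weights


# Hypothetical toy features: 2 word embeddings and 3 image-region embeddings.
word_feats = [[1.0, 0.0], [0.0, 1.0]]
region_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

# Inter-modality: words (queries) attend over image regions (keys and values).
attended, weights = cross_attention(word_feats, region_feats, region_feats)
```

Calling the same function with a single modality as queries, keys, and values (for example, `cross_attention(region_feats, region_feats, region_feats)`) gives the self-attention case, which corresponds to the intra-modality relationship described above.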

