Learning Audio-Visual embedding for Person Verification in the Wild

09/09/2022
by Peiwen Sun, et al.

It has already been observed that audio-visual embedding is more robust than uni-modality embedding for person verification. Here, we propose a novel audio-visual strategy that considers aggregators from a fusion perspective. First, we introduce weight-enhanced attentive statistics pooling, for the first time in face verification. Second, we find that a strong correlation exists between modalities during pooling, so we propose joint attentive pooling, which contains cycle consistency to learn the implicit inter-frame weight. Finally, the modalities are fused with a gated attention mechanism to obtain a robust audio-visual embedding. All the proposed models are trained on the VoxCeleb2 dev dataset, and the best system obtains 0.18 ... on the official trial lists of VoxCeleb1, respectively, which are, to our knowledge, the best published results for person verification.
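As a rough illustration of two building blocks named in the abstract, the sketch below shows plain attentive statistics pooling (an attention-weighted mean and standard deviation over frames) and a simple element-wise gated fusion of two embeddings. It is a minimal sketch under stated assumptions, not the authors' implementation: the weight-enhanced variant and the joint attentive pooling with cycle consistency are not reproduced, and the function names, the `scores` input, and the `gate_logits` input are hypothetical stand-ins for values that a trained network would normally produce.

```python
import math


def softmax(scores):
    """Normalise raw attention scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pool(frames, scores):
    """Pool T frame-level features (T x D) into one 2*D vector.

    `scores` holds one unnormalised attention score per frame. In a real
    model these come from a learned attention layer; here they are passed
    in directly (an assumption made for illustration).
    """
    weights = softmax(scores)
    dim = len(frames[0])
    # Attention-weighted mean of each feature dimension.
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    # Attention-weighted variance around that mean.
    var = [
        sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames))
        for d in range(dim)
    ]
    std = [math.sqrt(max(v, 1e-12)) for v in var]  # clamp to avoid sqrt(0) issues
    return mean + std  # concatenation of mean and standard deviation


def gated_fusion(audio, visual, gate_logits):
    """Blend two same-sized embeddings with a per-dimension sigmoid gate.

    `gate_logits` is a hypothetical input; a trained model would compute it
    from the audio and visual embeddings.
    """
    gates = [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]
    return [g * a + (1.0 - g) * v for g, a, v in zip(gates, audio, visual)]
```

With equal scores the pooling reduces to the ordinary mean and standard deviation over frames, and with zero gate logits the fusion reduces to a plain average of the two embeddings, which gives two easy sanity checks for the sketch.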


