Speaker recognition with two-step multi-modal deep cleansing

10/28/2022
by Ruijie Tao, et al.

Neural network-based speaker recognition has improved significantly in recent years. A robust speaker representation learns meaningful knowledge from both hard and easy samples in the training set to achieve good performance. However, noisy samples (i.e., samples with wrong labels) in the training set cause confusion and lead the network to learn an incorrect representation. In this paper, we propose a two-step audio-visual deep cleansing framework to eliminate the effect of noisy labels in speaker representation learning. The framework consists of a coarse-grained cleansing step that searches for peculiar samples, followed by a fine-grained cleansing step that filters out the noisy labels. Our study starts from an efficient audio-visual speaker recognition system, which achieves near-perfect equal error rates (EERs) of 0.01%, 0.07% and 0.13% on the Vox-O, E and H test sets, respectively. With the proposed multi-modal cleansing mechanism, four different speaker recognition networks achieve an average improvement of 5.9%. The code is available at: <https://github.com/TaoRuijie/AVCleanse>.
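The abstract does not say how either cleansing step scores a sample. The sketch below is a hypothetical illustration of the coarse-then-fine idea, not the AVCleanse implementation. It assumes each training sample has an audio and a visual embedding, fuses them by concatenating the L2-normalized vectors (`fuse`), and uses cosine similarity to per-speaker centroids for both steps: the coarse step flags samples that lie far from their labelled speaker's centroid as peculiar, and the fine step marks a peculiar sample as noisy only if another speaker's centroid is closer than a leave-one-out centroid of its own label. The function names, thresholds, and centroid-based scoring are all assumptions.

```python
import math
from collections import defaultdict


def _normalize(v):
    """Return v scaled to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def _cosine(a, b):
    """Dot product; equals cosine similarity when a and b are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse(audio_emb, visual_emb):
    """Hypothetical fusion: concatenate the normalized audio and visual embeddings."""
    return _normalize(_normalize(audio_emb) + _normalize(visual_emb))


def two_step_cleanse(embs, labels, coarse_thresh=0.5, fine_margin=0.0):
    """Return (peculiar, noisy) sample indices. Thresholds are illustrative assumptions."""
    embs = [_normalize(e) for e in embs]
    dim = len(embs[0])
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for e, y in zip(embs, labels):
        sums[y] = [s + x for s, x in zip(sums[y], e)]
        counts[y] += 1
    centroids = {y: _normalize(s) for y, s in sums.items()}

    # Step 1 (coarse-grained): flag samples far from their labelled speaker's centroid.
    peculiar = [i for i, (e, y) in enumerate(zip(embs, labels))
                if _cosine(e, centroids[y]) < coarse_thresh]

    # Step 2 (fine-grained): re-examine only the peculiar samples. The labelled
    # speaker's centroid is recomputed without the sample itself (leave-one-out),
    # and the sample is marked noisy if another speaker's centroid is closer.
    noisy = []
    for i in peculiar:
        e, y = embs[i], labels[i]
        if counts[y] < 2:
            continue  # no leave-one-out centroid can be formed
        loo = _normalize([s - x for s, x in zip(sums[y], e)])
        own = _cosine(e, loo)
        best_other = max((_cosine(e, c) for z, c in centroids.items() if z != y),
                         default=-1.0)
        if best_other - own > fine_margin:
            noisy.append(i)
    return peculiar, noisy


# Toy 2-D embeddings: index 3 looks like speaker "B" but is labelled "A".
EXAMPLE_EMBS = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.05, 1.0],
                [0.0, 1.0], [0.1, 0.9]]
EXAMPLE_LABELS = ["A", "A", "A", "A", "B", "B"]

if __name__ == "__main__":
    print(two_step_cleanse(EXAMPLE_EMBS, EXAMPLE_LABELS))
```

On the toy data, the mislabelled sample at index 3 is the only one flagged by both steps with the default thresholds. Raising `coarse_thresh` to 0.96 also marks indices 0 and 2 as peculiar, but the fine-grained step keeps them because their leave-one-out centroid is still closer than speaker B's. This illustrates why a cheap, permissive first pass followed by a stricter second check can avoid discarding hard but correctly labelled samples.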
