The JHU submission to VoxSRC-21: Track 3

09/28/2021
by Jaejin Cho, et al.

This technical report describes the Johns Hopkins University (JHU) speaker recognition system submitted to the VoxCeleb Speaker Recognition Challenge 2021 Track 3: Self-supervised speaker verification (closed). Our overall training process is similar to the one proposed by the first-place team in last year's VoxSRC 2020 challenge. The main difference is that a recently proposed non-contrastive self-supervised method from computer vision (CV), distillation with no labels (DINO), is used to train our initial model; it outperformed last year's contrastive learning approach based on momentum contrast (MoCo). In addition, this approach requires only a few iterations in the iterative clustering stage, where pseudo labels for supervised embedding learning are updated based on clusters of the embeddings generated by a model that is continually fine-tuned over the iterations. In the final stage, a Res2Net50 is trained on the final pseudo labels from the iterative clustering stage. This is our best model submitted to the challenge, showing EERs (%) of 1.89, 6.50, and 6.89 on the VoxCeleb1 test-O, VoxSRC-21 validation, and test trials, respectively.
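The abstract outlines a three-stage pipeline: DINO-based self-supervised pretraining, iterative clustering to refine pseudo labels, and supervised training of a Res2Net50 on the final pseudo labels. The sketch below illustrates the first two ingredients only: a DINO-style non-contrastive loss with an exponential-moving-average (EMA) teacher, and one pseudo-label update via clustering. It is a minimal PyTorch/scikit-learn illustration; the temperatures, EMA momentum, clustering algorithm (k-means here), and function names are assumptions for exposition and not necessarily those of the actual JHU system.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def dino_loss(student_out, teacher_out, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the sharpened, centered teacher distribution and
    the student distribution over projection-head outputs; no speaker labels
    are used (non-contrastive self-supervision)."""
    teacher_probs = F.softmax((teacher_out - center) / teacher_temp, dim=-1).detach()
    student_logprobs = F.log_softmax(student_out / student_temp, dim=-1)
    return -(teacher_probs * student_logprobs).sum(dim=-1).mean()


@torch.no_grad()
def ema_update(teacher, student, momentum=0.996):
    """The teacher network tracks the student via an exponential moving average
    instead of receiving gradients directly."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)


def update_pseudo_labels(embeddings, num_clusters):
    """One round of the iterative clustering stage: cluster the current speaker
    embeddings and treat the cluster indices as pseudo labels for the next round
    of supervised embedding training (k-means shown purely for illustration)."""
    kmeans = KMeans(n_clusters=num_clusters, n_init=10)
    return kmeans.fit_predict(np.asarray(embeddings))
```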
