Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

10/12/2021
by   Zhengyang Chen, et al.

Speech representations learned from large-scale unlabeled data have shown better generalizability than those from supervised learning and have therefore attracted considerable interest for various downstream tasks. In this paper, we explore the limits of speech representations learned with different self-supervised objectives and datasets for automatic speaker verification (ASV), using a well-recognized state-of-the-art ASV model, ECAPA-TDNN [1], as the downstream model. The representations from all hidden layers of the pre-trained model are first averaged with learnable weights and then fed into the ECAPA-TDNN as input features. Experimental results on the VoxCeleb dataset show that the weighted-average representation is significantly superior to FBank, a conventional handcrafted feature for ASV. Our best single system achieves an equal error rate (EER) of 0.564% on VoxCeleb1, and the ensemble system with three pre-trained models further improves the EER to 0.431%. Among the three evaluation trials, our best system outperforms the winner system [2] of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC2021) on the VoxCeleb1-E trial.
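The layer-combination step described above (averaging all hidden-layer outputs with learnable weights before the ECAPA-TDNN) can be illustrated with a minimal sketch. This is not the authors' released code: the layer count, feature dimension, and the dummy hidden states below are illustrative assumptions, and in practice the list of tensors would come from a pre-trained self-supervised model.

import torch
import torch.nn as nn


class WeightedLayerSum(nn.Module):
    """Combine per-layer representations with softmax-normalized learnable weights."""

    def __init__(self, num_layers: int):
        super().__init__()
        # One learnable scalar per hidden layer, initialized uniformly.
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):
        # hidden_states: list of (batch, frames, dim) tensors, one per layer.
        stacked = torch.stack(hidden_states, dim=0)          # (layers, batch, frames, dim)
        norm_w = torch.softmax(self.weights, dim=0)          # weights sum to 1
        return (norm_w.view(-1, 1, 1, 1) * stacked).sum(0)   # (batch, frames, dim)


# Hypothetical usage: 13 layers of 768-dim frame-level features stand in for the
# pre-trained model's hidden states; the result replaces FBank as ECAPA-TDNN input.
layers = [torch.randn(2, 200, 768) for _ in range(13)]
features = WeightedLayerSum(num_layers=len(layers))(layers)
print(features.shape)  # torch.Size([2, 200, 768])

The softmax normalization keeps the combination a convex weighting of the layers, so the downstream model can learn which depths of the pre-trained network carry the most speaker-discriminative information.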

