Fine-tuning wav2vec2 for speaker recognition

09/30/2021
by   Nik Vaessen, et al.

This paper explores applying the wav2vec2 framework to speaker recognition instead of speech recognition. We study the effectiveness of the pre-trained weights on the speaker recognition task, and how to pool the wav2vec2 output sequence into a fixed-length speaker embedding. To adapt the framework to speaker recognition, we propose a single-utterance classification variant with cross-entropy (CE) or additive angular margin (AAM) softmax loss, and an utterance-pair classification variant with binary cross-entropy (BCE) loss. Our best performing variant, w2v2-aam, achieves a 1.88% EER on the extended VoxCeleb1 test set, compared to 1.69% EER with an ECAPA-TDNN baseline. Code is available at https://github.com/nikvaessen/w2v2-speaker.
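To make the pooling and loss ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see their repository for that). It assumes mean pooling as the way to turn frame outputs into one embedding, and it uses a simplified AAM softmax that ignores the edge case where the angle plus the margin exceeds pi. The function names and the `margin` and `scale` defaults are hypothetical, chosen only for illustration.

```python
import math


def mean_pool(frames):
    """Average a sequence of frame vectors (T x D) into one D-dim embedding."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def aam_softmax_loss(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Cross-entropy over scaled cosine logits, with an angular margin
    added to the target-class angle (simplified: no handling of
    theta + margin > pi). margin and scale are illustrative values."""
    logits = []
    for idx, weight in enumerate(class_weights):
        cos_theta = max(-1.0, min(1.0, cosine(embedding, weight)))
        if idx == target:
            # Penalize the true class: cos(theta + m) <= cos(theta) here.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    # Numerically stable log-sum-exp for the cross-entropy.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target]


# Two toy "frames" of dimension 2, pooled into one embedding.
embedding = mean_pool([[1.0, 0.0], [0.0, 1.0]])
speakers = [[1.0, 0.0], [0.0, 1.0]]  # one weight vector per speaker
loss_plain = aam_softmax_loss(embedding, speakers, target=0, margin=0.0)
loss_margin = aam_softmax_loss(embedding, speakers, target=0, margin=0.2)
```

With `margin=0.0` the loss reduces to ordinary cross-entropy on scaled cosine similarities. A positive margin raises the loss for the same embedding, which pushes training toward embeddings that sit closer to their own speaker's weight vector.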


