Large-scale learning of generalised representations for speaker recognition

10/20/2022
by   Jee-weon Jung, et al.
The objective of this work is to develop a speaker recognition model that can be used in diverse scenarios. We hypothesise that two components should be adequately configured to build such a model. First, an adequate architecture is required. We explore several recent state-of-the-art models, including ECAPA-TDNN and MFA-Conformer, as well as other baselines. Second, a massive amount of data is required. We investigate several new training data configurations that combine a few existing datasets. The most extensive configuration includes 10.22k hours of speech from over 87k speakers. Four evaluation protocols are adopted to measure how the trained model performs in diverse scenarios. Through experiments, we find that MFA-Conformer, which has the least inductive bias, generalises the best. We also show that training with the proposed large data configurations gives better performance. A boost in generalisation is observed, where the average performance on the four evaluation protocols improves by more than 20%. These models' performance can improve even further when capacity is increased.

research
12/08/2019

A Multi Purpose and Large Scale Speech Corpus in Persian and English for Speaker and Speech Recognition: the DeepMine Database

DeepMine is a speech database in Persian and English designed to build a...
research
11/15/2019

Independent and automatic evaluation of acoustic-to-articulatory inversion models

Reconstruction of articulatory trajectories from the acoustic speech sig...
research
08/16/2021

NIST SRE CTS Superset: A large-scale dataset for telephony speaker recognition

This document provides a brief description of the National Institute of ...
research
06/19/2019

Large-Scale Speaker Diarization of Radio Broadcast Archives

This paper describes our initial efforts to build a large-scale speaker ...
research
04/30/2022

Baselines and Protocols for Household Speaker Recognition

Speaker recognition on household devices, such as smart speakers, featur...
research
08/07/2020

SplitNN-driven Vertical Partitioning

In this work, we introduce SplitNN-driven Vertical Partitioning, a confi...
