VoxCeleb2: Deep Speaker Recognition

06/14/2018
by Joon Son Chung, et al.

The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source media. Using a fully automated pipeline, we curate VoxCeleb2, which contains over a million utterances from over 6,000 speakers. This is several times larger than any publicly available speaker recognition dataset. Second, we develop and compare Convolutional Neural Network (CNN) models and training strategies that can effectively recognise identities from voice under various conditions. The models trained on the VoxCeleb2 dataset surpass the performance of previous works on a benchmark dataset by a significant margin.

research
06/26/2017

VoxCeleb: a large-scale speaker identification dataset

Most existing datasets for speaker identification contain samples obtain...
research
10/31/2019

CN-CELEB: a challenging Chinese speaker recognition dataset

Recently, researchers set an ambitious goal of conducting speaker recogn...
research
08/14/2023

VoxBlink: A Large Scale Speaker Verification Dataset on Camera

In this paper, we introduce a large-scale and high-quality audio-visual ...
research
02/25/2023

Speaker Recognition in Realistic Scenario Using Multimodal Data

In recent years, an association is established between faces and voices ...
research
04/30/2022

Baselines and Protocols for Household Speaker Recognition

Speaker recognition on household devices, such as smart speakers, featur...
research
04/29/2020

VGGSound: A Large-scale Audio-Visual Dataset

Our goal is to collect a large-scale audio-visual dataset with low label...
research
01/24/2022

Bias in Automated Speaker Recognition

Automated speaker recognition uses data processing to identify speakers ...
