VoxCeleb: a large-scale speaker identification dataset

06/26/2017
by Arsha Nagrani, et al.

Most existing datasets for speaker identification contain samples obtained under quite constrained conditions, and are usually hand-annotated, hence limited in size. The goal of this paper is to generate a large-scale, text-independent speaker identification dataset collected 'in the wild'. We make two contributions. First, we propose a fully automated pipeline based on computer vision techniques to create the dataset from open-source media. Our pipeline involves obtaining videos from YouTube, performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN), and confirming the identity of the speaker using CNN-based facial recognition. We use this pipeline to curate VoxCeleb, which contains hundreds of thousands of 'real world' utterances for over 1,000 celebrities. Our second contribution is to apply and compare various state-of-the-art speaker identification techniques on our dataset to establish baseline performance. We show that a CNN-based architecture obtains the best performance for both identification and verification.
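
The two checks named in the abstract (active speaker verification, then identity confirmation) can be pictured as successive threshold tests applied to each candidate video segment. The following is a minimal sketch under stated assumptions, not the authors' implementation: `Segment`, `filter_segments`, the two scorer callables, and the threshold values are all hypothetical placeholders standing in for the synchronization CNN and the face recognition CNN, whose actual interfaces are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Segment:
    """Hypothetical candidate clip: a time span within one video."""
    video_id: str
    start: float  # seconds
    end: float    # seconds


def filter_segments(
    segments: Iterable[Segment],
    sync_score: Callable[[Segment], float],      # assumed: higher = audio and lips better synchronized
    identity_score: Callable[[Segment], float],  # assumed: higher = face more likely the target person
    sync_threshold: float,
    identity_threshold: float,
) -> List[Segment]:
    """Keep only segments that pass both checks, in input order."""
    kept = []
    for seg in segments:
        # Stage 1: discard clips where the visible face is not the one speaking.
        if sync_score(seg) < sync_threshold:
            continue
        # Stage 2: discard clips where the face is not confirmed as the target identity.
        if identity_score(seg) < identity_threshold:
            continue
        kept.append(seg)
    return kept
```

In this sketch a segment is kept only if both scores clear their thresholds, so the kept set trades recall for label precision; the thresholds here are illustrative parameters, not values from the paper.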

research
06/14/2018

VoxCeleb2: Deep Speaker Recognition

The objective of this paper is speaker recognition under noisy and uncon...
research
04/29/2020

VGGSound: A Large-scale Audio-Visual Dataset

Our goal is to collect a large-scale audio-visual dataset with low label...
research
11/18/2022

SeaTurtleID: A novel long-span dataset highlighting the importance of timestamps in wildlife re-identification

This paper introduces SeaTurtleID, the first public large-scale, long-sp...
research
10/31/2019

CN-CELEB: a challenging Chinese speaker recognition dataset

Recently, researchers set an ambitious goal of conducting speaker recogn...
research
05/14/2020

Large Scale Font Independent Urdu Text Recognition System

OCR algorithms have received a significant improvement in performance re...
research
06/25/2019

Naver at ActivityNet Challenge 2019 -- Task B Active Speaker Detection (AVA)

This report describes our submission to the ActivityNet Challenge at CVP...
research
08/14/2023

VoxBlink: A Large Scale Speaker Verification Dataset on Camera

In this paper, we introduce a large-scale and high-quality audio-visual ...
