Large-Scale Speaker Diarization of Radio Broadcast Archives

06/19/2019
by Emre Yılmaz, et al.

This paper describes our initial efforts to build a large-scale speaker diarization (SD) and identification system for a recently digitized radio broadcast archive from the Netherlands, which contains more than 6500 audio tapes with 3000 hours of Frisian-Dutch speech recorded between 1950 and 2016. The large-scale diarization scheme has two stages: (1) tape-level speaker diarization, which provides pseudo-speaker identities, and (2) speaker linking, which relates pseudo-speakers appearing in multiple tapes. Because we have access to speaker models of several frequently appearing speakers from the previously collected FAME! speech corpus, we further perform speaker identification by linking these known speakers to the pseudo-speakers identified in the first stage. We present a recently created longitudinal and multilingual SD corpus designed for large-scale SD research, and we evaluate on this corpus a new speaker linking system that uses x-vectors with PLDA to quantify cross-tape speaker similarity. The system is evaluated on a small subset of the archive that is manually annotated with speaker information. We compare the speaker linking performance on this subset (53 hours) with that on the whole archive (3000 hours) to quantify the impact of scaling up the amount of speech data.
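To make the second stage concrete, the following is a minimal sketch of speaker linking across tapes, not the system evaluated in the paper. It assumes each tape-level pseudo-speaker is already represented by one fixed-length embedding (standing in for an x-vector), and it uses cosine similarity where the paper uses PLDA scoring. The function name `link_speakers`, the `threshold` value, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_speakers(embeddings, threshold=0.8):
    """Greedy average-linkage clustering of pseudo-speakers.

    embeddings: dict mapping (tape_id, pseudo_speaker_id) -> vector.
    Returns a list of sets; each set groups pseudo-speakers that are
    linked to the same hypothesized speaker.
    Assumption: cosine similarity replaces the PLDA score of the paper.
    """
    clusters = [{key} for key in embeddings]
    while len(clusters) > 1:
        best_score, best_pair = -math.inf, None
        for i, j in combinations(range(len(clusters)), 2):
            # Average similarity over all cross-cluster member pairs.
            scores = [
                cosine(embeddings[a], embeddings[b])
                for a in clusters[i]
                for b in clusters[j]
            ]
            score = sum(scores) / len(scores)
            if score > best_score:
                best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break  # No remaining pair is similar enough to merge.
        i, j = best_pair  # combinations() guarantees i < j.
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Toy input: four pseudo-speakers from three tapes (made-up vectors).
toy_embeddings = {
    ("tape_A", "spk0"): [1.0, 0.0, 0.1],
    ("tape_A", "spk1"): [0.0, 1.0, 0.0],
    ("tape_B", "spk0"): [0.9, 0.1, 0.0],
    ("tape_C", "spk0"): [0.1, 0.9, 0.1],
}
linked = link_speakers(toy_embeddings, threshold=0.8)
```

In this toy input, the pseudo-speakers from tapes A and B that point in a similar direction end up in one group, and likewise for the pair from tapes A and C. Speaker identification against known speakers could be sketched the same way, by adding the known speakers' embeddings as extra entries; a real system would also need a tuned threshold and a scoring backend such as PLDA.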


research
06/27/2023

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

Disentangling uncorrelated information in speech utterances is a crucial...
research
03/26/2014

Constrained speaker linking

In this paper we study speaker linking (a.k.a. partitioning) given const...
research
10/23/2018

Semi-supervised acoustic model training for speech with code-switching

In the FAME! project, we aim to develop an automatic speech recognition ...
research
07/16/2019

RadioTalk: a large-scale corpus of talk radio transcripts

We introduce RadioTalk, a corpus of speech recognition transcripts sampl...
research
10/20/2022

Large-scale learning of generalised representations for speaker recognition

The objective of this work is to develop a speaker recognition model to ...
research
05/22/2020

Identify Speakers in Cocktail Parties with End-to-End Attention

In scenarios where multiple speakers talk at the same time, it is import...
research
05/18/2020

Design Choices for X-vector Based Speaker Anonymization

The recently proposed x-vector based anonymization scheme converts any i...
