Weakly Supervised Training of Speaker Identification Models

06/22/2018
by Martin Karu, et al.

We propose an approach for training speaker identification models in a weakly supervised manner. We concentrate on the setting where the training data consists of a set of audio recordings and the speaker annotation is provided only at the recording level. The method uses speaker diarization to find the unique speakers in each recording and i-vectors to project the speech of each speaker to a fixed-dimensional vector. A neural network is then trained to map i-vectors to speakers, using a special objective function that allows the model to be optimized with recording-level speaker labels. We report experiments on two different real-world datasets. On the VoxCeleb dataset, the method provides 94.6% … the baseline performance by a large margin. On an Estonian broadcast news dataset, the method provides 66% … 93% …
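The abstract does not spell out the objective function, so the following is only a minimal sketch of one plausible way to learn from recording-level labels, assuming a multiple-instance-learning formulation: each diarized speaker cluster's i-vector is scored against every known speaker, a speaker is considered present in the recording if any cluster assigns it high probability (max over clusters), and the result is compared with the recording's label vector using binary cross-entropy. The names `cluster_logits` and `recording_level_loss` are hypothetical, the linear layer stands in for the paper's neural network, and this is not claimed to be the authors' exact objective.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cluster_logits(ivectors, W, b):
    # Hypothetical linear stand-in for the network:
    # ivectors (n_clusters, dim) -> logits (n_clusters, n_speakers).
    return ivectors @ W + b

def recording_level_loss(logits, labels, eps=1e-9):
    """Assumed MIL-style loss for ONE recording.

    logits: (n_clusters, n_speakers) scores for each diarized cluster.
    labels: (n_speakers,) binary vector, 1 if the speaker is annotated
            as appearing somewhere in the recording.
    """
    probs = softmax(logits, axis=1)      # per-cluster speaker posteriors
    present = probs.max(axis=0)          # P(speaker in recording) = max over clusters
    present = np.clip(present, eps, 1 - eps)
    bce = -(labels * np.log(present) + (1 - labels) * np.log(1 - present))
    return bce.mean()

# Usage: two diarized clusters, four known speakers.
logits = np.array([[0.0, 10.0, 0.0, 0.0],    # cluster 0 looks like speaker 1
                   [0.0, 0.0, 0.0, 10.0]])   # cluster 1 looks like speaker 3
print(recording_level_loss(logits, np.array([0, 1, 0, 1])))  # small: labels agree
print(recording_level_loss(logits, np.array([1, 0, 1, 0])))  # large: labels disagree
```

Taking the max over clusters means no cluster ever needs its own label: the loss only asks that each annotated speaker be claimed by at least one cluster and that unannotated speakers be claimed by none, which is the kind of constraint recording-level labels can support.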

07/17/2018

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

The Multitarget Challenge aims to assess how well current speech technol...
07/01/2022

Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones

Speaker identification in noisy audio recordings, specifically those fro...
01/11/2019

Advanced Rich Transcription System for Estonian Speech

This paper describes the current TTÜ speech transcription system for Est...
09/27/2016

Weakly Supervised PLDA Training

PLDA is a popular normalization approach for the i-vector model, and it ...
08/28/2022

Computing with Hypervectors for Efficient Speaker Identification

We introduce a method to identify speakers by computing with high-dimens...
05/15/2020

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Identifying multiple speakers without knowing where a speaker's voice is...
10/18/2022

Risk of re-identification for shared clinical speech recordings

Large, curated datasets are required to leverage speech-based tools in h...
