Weakly Supervised Training of Speaker Identification Models

06/22/2018
by Martin Karu, et al.
Tallinn University of Technology

We propose an approach for training speaker identification models in a weakly supervised manner. We concentrate on the setting where the training data consists of a set of audio recordings and the speaker annotation is provided only at the recording level. The method uses speaker diarization to find the unique speakers in each recording, and i-vectors to project the speech of each speaker to a fixed-dimensional vector. A neural network is then trained to map i-vectors to speakers, using a special objective function that allows the model to be optimized using recording-level speaker labels. We report experiments on two different real-world datasets. On the VoxCeleb dataset, the method provides 94.6% accuracy on a closed-set speaker identification task, surpassing the baseline performance by a large margin. On an Estonian broadcast news dataset, the method provides 66% time-weighted speaker identification recall at 93% precision.
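
The abstract does not spell out the objective function, so the following is only an illustrative sketch of how a recording-level loss of this kind might look, not the authors' formulation. It assumes that diarization has already produced one i-vector per speaker cluster, that a network has turned each i-vector into a vector of logits over the known speakers, and that the only supervision is the set of speaker indices listed for the recording. The function names `softmax` and `recording_level_loss` and the "best-matching cluster" rule are hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recording_level_loss(cluster_logits, recording_speakers):
    """Hypothetical weakly supervised loss for one recording.

    cluster_logits: one list of logits per diarized speaker cluster
                    (assumed to come from a network applied to i-vectors).
    recording_speakers: indices of speakers known to appear in the
                        recording; which cluster is which is unknown.

    For every listed speaker, the loss is the negative log posterior of
    the cluster that currently explains that speaker best. This is an
    assumption of the sketch, not the objective from the paper.
    """
    posteriors = [softmax(logits) for logits in cluster_logits]
    loss = 0.0
    for spk in recording_speakers:
        best = max(p[spk] for p in posteriors)
        loss -= math.log(best)
    return loss


# Two diarized clusters, three known speakers; the recording is
# annotated as containing speakers 0 and 1.
consistent = recording_level_loss([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]], [0, 1])
inconsistent = recording_level_loss([[0.0, 0.0, 5.0], [0.0, 0.0, 5.0]], [0, 1])
print(consistent < inconsistent)
```

The loss is small when each annotated speaker is confidently matched by some cluster and large when no cluster supports an annotated speaker, which is the behaviour a recording-level objective needs. A practical version would also have to handle clusters that belong to speakers absent from the annotation.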

07/01/2022

Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones

Speaker identification in noisy audio recordings, specifically those fro...
01/11/2019

Advanced Rich Transcription System for Estonian Speech

This paper describes the current TTÜ speech transcription system for Est...
09/27/2016

Weakly Supervised PLDA Training

PLDA is a popular normalization approach for the i-vector model, and it ...
08/28/2022

Computing with Hypervectors for Efficient Speaker Identification

We introduce a method to identify speakers by computing with high-dimens...
05/15/2020

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Identifying multiple speakers without knowing where a speaker's voice is...
10/18/2022

Risk of re-identification for shared clinical speech recordings

Large, curated datasets are required to leverage speech-based tools in h...