Weakly Supervised Training of Speaker Identification Models

by Martin Karu, et al.
Tallinn University of Technology

We propose an approach for training speaker identification models in a weakly supervised manner. We concentrate on the setting where the training data consists of a set of audio recordings and the speaker annotation is provided only at the recording level. The method uses speaker diarization to find unique speakers in each recording, and i-vectors to project the speech of each speaker to a fixed-dimensional vector. A neural network is then trained to map i-vectors to speakers, using a special objective function that allows the model to be optimized using recording-level speaker labels. We report experiments on two different real-world datasets. On the VoxCeleb dataset, the method provides 94.6% accuracy on a closed set speaker identification task, surpassing the baseline performance by a large margin. On an Estonian broadcast news dataset, the method provides 66% time-weighted speaker identification recall at 93% precision.
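
The abstract does not spell out the objective function, so the following is only an illustrative sketch of how a loss could be driven by recording-level labels. It assumes a multiple-instance style formulation: each diarized speaker in a recording gets a row of network outputs, and for every speaker name attached to the recording, the best-matching diarized speaker is credited. The function names, array shapes, and the max-based formulation are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def recording_level_loss(logits, label_ids):
    """Hypothetical weakly supervised loss for one recording.

    logits:    array of shape (n_diarized_speakers, n_known_speakers),
               e.g. network outputs computed from one i-vector per diarized speaker.
    label_ids: indices of the known speakers annotated for this recording.
    """
    log_probs = log_softmax(logits)
    # For each annotated speaker, take the diarized speaker that matches best
    # (assumption: a max over instances, as in multiple-instance learning).
    best_per_label = log_probs[:, label_ids].max(axis=0)
    # Average negative log-likelihood over the annotated speakers.
    return -best_per_label.mean()

# Example: two diarized speakers, three known identities, labels {0, 1}.
example_logits = np.array([[5.0, 0.0, 0.0],
                           [0.0, 5.0, 0.0]])
example_loss = recording_level_loss(example_logits, [0, 1])
```

In this sketch the loss never needs to know which diarized speaker corresponds to which name; it only requires that every annotated name be explained by some speaker in the recording.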


Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones

Speaker identification in noisy audio recordings, specifically those fro...

Advanced Rich Transcription System for Estonian Speech

This paper describes the current TTÜ speech transcription system for Est...

Weakly Supervised PLDA Training

PLDA is a popular normalization approach for the i-vector model, and it ...

Computing with Hypervectors for Efficient Speaker Identification

We introduce a method to identify speakers by computing with high-dimens...

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Identifying multiple speakers without knowing where a speaker's voice is...

Risk of re-identification for shared clinical speech recordings

Large, curated datasets are required to leverage speech-based tools in h...