Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech

05/14/2022
by Joonas Kalda, et al.

In this paper, we present a novel training method for speaker change detection models. Speaker change detection is often viewed as a binary sequence labelling problem. The main challenges with this approach are the vagueness of annotated change points, caused by the silences between speaker turns, and class imbalance, since the majority of frames do not include a speaker change. Conventional training methods tackle these by artificially increasing the proportion of positive labels in the training data. Instead, the proposed method uses an objective function which encourages the model to predict a single positive label within a specified collar. This is done by marginalizing over all possible subsequences that have exactly one positive label within the collar. Experiments on English and Estonian datasets show large improvements over the conventional training method. Additionally, the model outputs have peaks concentrated in a single frame, removing the need for post-processing to find the exact predicted change point, which is particularly useful for streaming applications.
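The marginalization described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the model emits an independent per-frame probability of a positive label, and the function names `log_prob_exactly_one` and `collar_loss` are hypothetical. Under that assumption, the probability of exactly one positive label inside a collar is the sum, over each frame in the collar, of the probability that this frame is positive and all other collar frames are negative.

```python
import math

def log_prob_exactly_one(probs):
    """Log-probability that exactly one frame in the collar is positive.

    Assumption (not from the paper): frames are treated as independent
    Bernoulli variables with positive-label probabilities `probs`.
    """
    eps = 1e-12  # clamp to avoid log(0)
    log_p = [math.log(max(p, eps)) for p in probs]
    log_q = [math.log(max(1.0 - p, eps)) for p in probs]
    total_log_q = sum(log_q)
    # One term per frame i: frame i is positive, every other frame is negative.
    terms = [total_log_q - log_q[i] + log_p[i] for i in range(len(probs))]
    # Log-sum-exp over the terms for numerical stability.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def collar_loss(probs):
    """Negative log-likelihood of 'exactly one positive within the collar'."""
    return -log_prob_exactly_one(probs)

# A peaked output is penalized less than a spread-out one.
peaked = collar_loss([0.9, 0.05, 0.05])
spread = collar_loss([0.5, 0.5, 0.5])
```

In this sketch the loss is lowest when the probability mass sits on a single frame, which is consistent with the abstract's observation that model outputs peak at one frame. How frames outside the collar are scored is not covered here.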


research
06/27/2022

Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire

Speaker change detection is an important task in multi-party interaction...
research
09/23/2021

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

In this paper, we present a novel speaker diarization system for streami...
research
04/08/2021

End-to-end speaker segmentation for overlap-aware resegmentation

Speaker segmentation consists in partitioning a conversation between one...
research
05/08/2012

A Novel Method For Speech Segmentation Based On Speakers' Characteristics

Speech Segmentation is the process change point detection for partitioni...
research
03/05/2022

Language vs Speaker Change: A Comparative Study

Spoken language change detection (LCD) refers to detecting language swit...
research
11/14/2022

Multi-Label Training for Text-Independent Speaker Identification

In this paper, we propose a novel strategy for text-independent speaker ...
research
11/11/2022

Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

In this work we propose a novel token-based training strategy that impro...
