Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

11/11/2022
by   Guanlong Zhao, et al.
0

In this work we propose a novel token-based training strategy that improves Transformer-Transducer (T-T) based speaker change detection (SCD) performance. The conventional T-T based SCD model loss optimizes all output tokens equally. Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy. To mitigate this issue, we use a customized edit-distance algorithm to estimate the token-level SCD false accept (FA) and false reject (FR) rates during training and optimize model parameters to minimize a weighted combination of the FA and FR, focusing the model on accurately predicting speaker changes. We also propose a set of evaluation metrics that align better with commercial use cases. Experiments on a group of challenging real-world datasets show that the proposed training method can significantly improve the overall performance of the SCD model with the same number of parameters.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/23/2023

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

The recently proposed serialized output training (SOT) simplifies multi-...
research
09/11/2023

SparseSwin: Swin Transformer with Sparse Transformer Block

Advancements in computer vision research have put transformer architectu...
research
11/17/2022

Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

In multi-talker scenarios such as meetings and conversations, speech pro...
research
09/23/2021

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

In this paper, we present a novel speaker diarization system for streami...
research
05/28/2021

DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

We introduce DIVE, an end-to-end speaker diarization algorithm. Our neur...
research
05/14/2022

Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech

In this paper, we present a novel training method for speaker change det...
research
08/22/2020

UTMN at SemEval-2020 Task 11: A Kitchen Solution to Automatic Propaganda Detection

The article describes a fast solution to propaganda detection at SemEval...

Please sign up or login with your details

Forgot password? Click here to reset