Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection

09/23/2021
by Wei Xia, et al.

In this paper, we present a novel speaker diarization system for streaming on-device applications. The system uses a transformer transducer to detect speaker turns, represents each speaker turn by a speaker embedding, and then clusters these embeddings under constraints derived from the detected speaker turns. Because speaker turns are sparse, the system substantially reduces the computational cost of clustering compared with conventional clustering-based diarization systems. Unlike other supervised speaker diarization systems, which require time-stamped speaker labels for training, our system only requires that speaker turn tokens be included during transcription, which substantially reduces the human effort involved in data collection.
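
The clustering step described above can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used or how the turn constraints are encoded, so this sketch swaps in a simple greedy agglomerative scheme and assumes one plausible constraint: two segments separated by a single detected speaker turn must not share a label. The function names, the similarity threshold, and the toy 2-dimensional embeddings are all hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_turns(embeddings, threshold=0.7):
    """Greedy agglomerative clustering over one embedding per turn segment.

    Assumption (not from the source): segments i and i + 1 are separated by
    a detected speaker turn, so they form a cannot-link pair.
    """
    cannot_link = {(i, i + 1) for i in range(len(embeddings) - 1)}
    clusters = [[i] for i in range(len(embeddings))]

    def violates(c1, c2):
        # True if merging c1 and c2 would join a cannot-link pair.
        return any((min(i, j), max(i, j)) in cannot_link for i in c1 for j in c2)

    while True:
        best, best_sim = None, threshold
        for a, b in combinations(range(len(clusters)), 2):
            if violates(clusters[a], clusters[b]):
                continue
            sim = cosine(
                centroid([embeddings[i] for i in clusters[a]]),
                centroid([embeddings[i] for i in clusters[b]]),
            )
            if sim > best_sim:
                best, best_sim = (a, b), sim
        if best is None:
            break  # no allowed pair is similar enough to merge
        a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Assign one integer speaker label per segment.
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy example: four turn segments, two alternating speakers.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
toy_labels = cluster_turns(toy_embeddings)
```

Because there is only one embedding per speaker turn rather than one per short fixed-length window, the number of items to cluster stays small, which is the sparsity argument the abstract makes. In this sketch the cannot-link rule also keeps two adjacent segments apart even when their embeddings are nearly identical.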

research
10/11/2018

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

In this paper, we present a novel system that separates the voice of a t...
research
10/25/2022

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

While recent research advances in speaker diarization mostly focus on im...
research
05/14/2022

Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech

In this paper, we present a novel training method for speaker change det...
research
07/13/2022

Online Target Speaker Voice Activity Detection for Speaker Diarization

This paper proposes an online target speaker voice activity detection sy...
research
07/10/2017

Improving speaker turn embedding by crossmodal transfer learning from face embedding

Learning speaker turn embeddings has shown considerable improvement in s...
research
11/11/2022

Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

In this work we propose a novel token-based training strategy that impro...
research
04/01/2022

Multimodal Clustering with Role Induced Constraints for Speaker Diarization

Speaker clustering is an essential step in conventional speaker diarizat...
