Multimodal Clustering with Role Induced Constraints for Speaker Diarization

04/01/2022
by Nikolaos Flemotomos, et al.

Speaker clustering is an essential step in conventional speaker diarization systems and is typically addressed as an audio-only speech processing task. The language used by the participants in a conversation, however, carries additional information that can help improve clustering performance. This is especially true in conversational interactions, such as business meetings, interviews, and lectures, where the specific roles assumed by interlocutors (manager, client, teacher, etc.) are often associated with distinguishable linguistic patterns. In this paper, we propose to employ a supervised text-based model to extract speaker roles and then use this information to guide an audio-based spectral clustering step by imposing must-link and cannot-link constraints between segments. The proposed method is applied to two different domains, namely medical interactions and podcast episodes, and is shown to yield improved results compared to the audio-only approach.
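
To make the idea of role-induced constraints concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each role is held by exactly one speaker, that constraints are imposed by directly overwriting entries of the audio-based affinity matrix (1.0 for must-link, 0.0 for cannot-link), and that segments without a confident role prediction are left unconstrained. The function name `apply_role_constraints` and the use of `None` for an uncertain prediction are hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence


def apply_role_constraints(
    affinity: Sequence[Sequence[float]],
    roles: Sequence[Optional[str]],
) -> List[List[float]]:
    """Return a copy of `affinity` with role-induced constraints applied.

    affinity[i][j] is the audio-based similarity between segments i and j.
    roles[i] is the role predicted from the text of segment i, or None when
    the (hypothetical) text-based classifier is not confident enough.

    Assumption: one speaker per role, so segments with the same role are
    must-linked and segments with different roles are cannot-linked.
    """
    n = len(affinity)
    if len(roles) != n or any(len(row) != n for row in affinity):
        raise ValueError("affinity must be n x n and roles must have length n")

    # Work on a copy so the audio-only matrix stays available for comparison.
    constrained = [list(row) for row in affinity]
    for i in range(n):
        for j in range(i + 1, n):
            if roles[i] is None or roles[j] is None:
                continue  # no constraint: keep the audio-based affinity
            value = 1.0 if roles[i] == roles[j] else 0.0
            constrained[i][j] = value
            constrained[j][i] = value  # keep the matrix symmetric
    return constrained


# Three segments; the third has no confident role prediction.
affinity = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
roles = ["manager", "client", None]
constrained = apply_role_constraints(affinity, roles)
```

In this sketch, the constrained matrix would then be passed to a spectral clustering step in place of the audio-only affinities. How the constraints are actually integrated in the paper (for example, whether they are propagated to neighboring segments) is described in the full text, not in this abstract.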


