Content-Aware Speaker Embeddings for Speaker Diarisation

02/12/2021
by   G. Sun, et al.
0

Recent speaker diarisation systems often convert variable length speech segments into fixed-length vector representations for speaker clustering, which are known as speaker embeddings. In this paper, the content-aware speaker embeddings (CASE) approach is proposed, which extends the input of the speaker classifier to include not only acoustic features but also their corresponding speech content, via phone, character, and word embeddings. Compared to alternative methods that leverage similar information, such as multitask or adversarial training, CASE factorises automatic speech recognition (ASR) from speaker recognition to focus on modelling speaker characteristics and correlations with the corresponding content units to derive more expressive representations. CASE is evaluated for speaker re-clustering with a realistic speaker diarisation setup using the AMI meeting transcription dataset, where the content information is obtained by performing ASR based on an automatic segmentation. Experimental results showed that CASE achieved a 17.8 speaker error rate reduction over conventional methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/17/2019

Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models

This paper investigates the use of target-speaker automatic speech recog...
research
07/25/2023

On-Device Speaker Anonymization of Acoustic Embeddings for ASR based onFlexible Location Gradient Reversal Layer

Smart devices serviced by large-scale AI models necessitates user data t...
research
08/04/2020

Intra-class variation reduction of speaker representation in disentanglement framework

In this paper, we propose an effective training strategy to ex-tract rob...
research
07/24/2021

Significance of Speaker Embeddings and Temporal Context for Depression Detection

Depression detection from speech has attracted a lot of attention in rec...
research
10/14/2021

An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings

Many mispronunciation detection and diagnosis (MD D) research approach...
research
07/08/2022

Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription

Self-supervised-learning-based pre-trained models for speech data, such ...
research
01/11/2023

Improving And Analyzing Neural Speaker Embeddings for ASR

Neural speaker embeddings encode the speaker's speech characteristics th...

Please sign up or login with your details

Forgot password? Click here to reset