Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings

08/11/2020
by Naoyuki Kanda, et al.

Recently, an end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR) model was proposed as a joint model of speaker counting, speech recognition, and speaker identification for monaural overlapped speech. It showed promising results on simulated speech mixtures containing various numbers of speakers. However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited its applicability. In this paper, we extend the prior work to the case where no speaker profile is available. Specifically, we perform speaker counting and clustering using the internal speaker representations of the E2E SA-ASR model to diarize the utterances of speakers whose profiles are missing from the speaker inventory. We also propose a simple modification to the reference labels used in E2E SA-ASR training, which helps the model handle continuous multi-talker recordings. We conduct a comprehensive investigation of the original E2E SA-ASR and the proposed method on the monaural LibriCSS dataset. Without any prior speaker knowledge, the proposed method achieves performance close to that of the original E2E SA-ASR with relevant speaker profiles. We also show that the source-target attention in the E2E SA-ASR model provides information about the start and end times of the hypotheses.
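
As a rough illustration of the speaker counting and clustering step described above, the following minimal sketch groups per-utterance speaker representations by cosine similarity and reports the number of resulting clusters as the estimated speaker count. The abstract does not state which clustering algorithm or stopping criterion the authors use, so the average-linkage agglomerative procedure, the function names `cosine` and `cluster_speakers`, and the `threshold` value are all assumptions made for illustration. The toy two-dimensional vectors stand in for the model's internal speaker representations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_speakers(embeddings, threshold=0.7):
    """Hypothetical helper: average-linkage agglomerative clustering.

    Merging stops once the most similar pair of clusters falls below
    `threshold` (an assumed value, not taken from the paper).
    Returns (labels, estimated_speaker_count).
    """
    # Start with one cluster per utterance.
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        # Average pairwise cosine similarity between two clusters.
        sims = [cosine(embeddings[i], embeddings[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        if best[0] < threshold:
            break  # remaining clusters are treated as distinct speakers
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(embeddings)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels, len(clusters)


# Toy stand-ins for per-utterance speaker representations.
toy_embeddings = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
labels, num_speakers = cluster_speakers(toy_embeddings)
print(labels, num_speakers)
```

In this sketch the speaker count is a by-product of the stopping threshold, so counting and clustering happen in one pass. A real system would need to tune that threshold on held-out data.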


