Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

11/03/2020
by Desh Raj, et al.

Multi-speaker speech recognition of unsegmented recordings has diverse applications, such as meeting transcription and automatic subtitle generation. With technical advances in speech separation, speaker diarization, and automatic speech recognition (ASR) over the last decade, it has become possible to build pipelines that achieve reasonable error rates on this task. In this paper, we propose an end-to-end modular system for the LibriCSS meeting data that combines independently trained separation, diarization, and recognition components, in that order. We study the effect of different state-of-the-art methods at each stage of the pipeline and report results using task-specific metrics, such as signal-to-distortion ratio (SDR) and diarization error rate (DER), as well as downstream word error rate (WER). Experiments indicate that the problem of overlapping speech for diarization and ASR can be effectively mitigated by a well-trained separation module. Our best system achieves a speaker-attributed WER of 12.7%, close to that of non-overlapping ASR.
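The abstract reports downstream WER and a speaker-attributed WER. As an illustration only, the sketch below computes standard word-level WER and, as an assumption, one common way of making WER speaker-attributed: the concatenated minimum-permutation WER (cpWER) used in CHiME-6, which concatenates each speaker's words and scores the best mapping of hypothesis speakers to reference speakers. The abstract does not state which definition the authors use, and the function names `edit_distance`, `wer`, and `cp_wer` are hypothetical.

```python
from itertools import permutations


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word has no match
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance divided by the number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cp_wer(ref_by_spk: dict[str, str], hyp_by_spk: dict[str, str]) -> float:
    """Hypothetical cpWER sketch (assumed metric, not confirmed by the source):
    one word stream per speaker, best speaker mapping over all permutations."""
    refs = [s.split() for s in ref_by_spk.values()]
    hyps = [s.split() for s in hyp_by_spk.values()]
    # Pad with empty streams so both sides have the same number of speakers;
    # a missing hypothesis speaker then counts as deletions, an extra one as insertions.
    n = max(len(refs), len(hyps))
    refs += [[]] * (n - len(refs))
    hyps += [[]] * (n - len(hyps))
    total_ref_words = sum(len(r) for r in refs)
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words
```

For example, if the hypothesis speaker labels are swapped relative to the reference but every word is correct, `cp_wer` returns 0.0, whereas a hypothesis that drops one of two equally long speakers entirely scores 0.5. Trying every permutation is factorial in the number of speakers, which is fine for small meetings; for larger speaker counts an assignment solver would be the usual substitute.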



Related research

End-to-End Monaural Multi-speaker ASR System without Pretraining (11/05/2018)
Recently, end-to-end models have become a popular approach as an alterna...

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio (07/06/2021)
Speaker-attributed automatic speech recognition (SA-ASR) is a task to re...

Configurable Privacy-Preserving Automatic Speech Recognition (04/01/2021)
Voice assistive technologies have given rise to far-reaching privacy and...

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization (10/30/2020)
This paper proposes a new paradigm for handling far-field multi-speaker ...

Breaking trade-offs in speech separation with sparsely-gated mixture of experts (11/11/2022)
Several trade-offs need to be balanced when employing monaural speech se...

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition (09/15/2023)
Many real-life applications of automatic speech recognition (ASR) requir...

Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech (12/10/2021)
Many of the recent advances in speech separation are primarily aimed at ...
