Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

05/25/2023
by Lingwei Meng, et al.

Multi-talker overlapped speech poses a significant challenge for speech recognition and diarization. Recent research indicates that these two tasks are interdependent and complementary, which motivates us to explore a unified modeling method that addresses both in the context of overlapped speech. A recent study proposed a cost-effective method for converting a single-talker automatic speech recognition (ASR) system into a multi-talker one by inserting a Sidecar separator into a frozen, well-trained ASR model. Building on this, we incorporate a diarization branch into the Sidecar, allowing unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters. The proposed method yields better ASR results than the baseline on the LibriMix and LibriSpeechMix datasets. Moreover, without sophisticated customization for the diarization task, our method achieves acceptable diarization results on the two-speaker subset of CALLHOME with only a few adaptation steps.


