Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

06/04/2020
by Thilo von Neumann, et al.

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations it is typically unknown. To cope with this, we extend an iterative speech extraction system with mechanisms to count the number of sources and combine it with a single-talker speech recognizer to form the first end-to-end multi-talker automatic speech recognition system for an unknown number of active speakers. Our experiments show very promising performance in counting accuracy, source separation and speech recognition on simulated clean mixtures from WSJ0-2mix and WSJ0-3mix. Among other results, we set a new state-of-the-art word error rate on the WSJ0-2mix database. Furthermore, our system generalizes well to a larger number of speakers than it saw during training, as shown in experiments with the WSJ0-4mix database.
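
To make the control flow of the abstract concrete, here is a minimal Python sketch of an iterative extract-count-recognize loop. It is not the authors' implementation: the function names (`iterative_recognize`, `toy_extract`, `toy_recognize`), the residual-energy stopping rule, and the toy signals are all assumptions chosen for illustration. In the real system the extractor and the recognizer would be neural networks and the stopping decision would come from the counting mechanism described in the paper.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
BLOCK = 4  # toy assumption: each "speaker" occupies one block of 4 samples


def energy(x: Sequence[float]) -> float:
    """Mean squared amplitude of a signal (0.0 for an empty signal)."""
    return sum(v * v for v in x) / max(len(x), 1)


def iterative_recognize(
    mixture: Signal,
    extract_one: Callable[[Signal], Tuple[Signal, Signal]],
    recognize: Callable[[Signal], str],
    energy_threshold: float = 1e-3,  # assumed stopping rule, not from the paper
    max_speakers: int = 10,
) -> List[str]:
    """Peel off one source per iteration until the residual is (nearly) silent.

    The number of iterations doubles as the source count estimate.
    """
    residual = list(mixture)
    transcripts: List[str] = []
    while len(transcripts) < max_speakers and energy(residual) > energy_threshold:
        source, residual = extract_one(residual)
        transcripts.append(recognize(source))
    return transcripts


def toy_extract(residual: Signal) -> Tuple[Signal, Signal]:
    """Hypothetical stand-in for a learned extractor: take the first non-silent block."""
    for start in range(0, len(residual), BLOCK):
        block = residual[start:start + BLOCK]
        if any(v != 0.0 for v in block):
            source = [0.0] * len(residual)
            source[start:start + BLOCK] = block
            rest = list(residual)
            rest[start:start + BLOCK] = [0.0] * len(block)
            return source, rest
    return [0.0] * len(residual), list(residual)


def toy_recognize(source: Signal) -> str:
    """Hypothetical stand-in for a single-talker recognizer: label the active block."""
    idx = next(i for i, v in enumerate(source) if v != 0.0)
    return f"block-{idx // BLOCK}"


# Two toy "speakers" in blocks 0 and 2, silence in block 1.
mixture = [0.5] * 4 + [0.0] * 4 + [0.8] * 4
transcripts = iterative_recognize(mixture, toy_extract, toy_recognize)
estimated_count = len(transcripts)
```

Because the loop stops on its own, the same code handles mixtures with more sources than the toy example, up to `max_speakers`; this mirrors, in a very simplified way, why an iterative scheme can be applied to speaker counts not fixed in advance.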


research
04/01/2022

End-to-End Multi-speaker ASR with Independent Vector Analysis

We develop an end-to-end system for multi-channel, multi-speaker automat...
research
11/24/2020

Multi-Decoder DPRNN: High Accuracy Source Counting and Separation

We propose an end-to-end trainable approach to single-channel speech sep...
research
02/21/2019

All-neural online source separation, counting, and diarization for meeting analysis

Automatic meeting analysis comprises the tasks of speaker counting, spea...
research
10/12/2020

The Cone of Silence: Speech Separation by Localization

Given a multi-microphone recording of an unknown number of speakers talk...
research
10/13/2021

All-neural beamformer for continuous speech separation

Continuous speech separation (CSS) aims to separate overlapping voices f...
research
02/16/2021

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Multi-source localization is an important and challenging technique for ...
research
03/30/2022

Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers

The vast majority of speech separation methods assume that the number of...
