Automatic speech recognition (ASR) technology has made significant strides over the past decade, achieving human parity in some domains [1, 2]. However, when it comes to dealing with speech overlaps, machines still lag far behind humans. Our brains can attend to one speaker in a noisy multi-talker environment and recognize what that speaker has said even when their voice is overlapped by the utterances of other speakers, as demonstrated by the cocktail party effect. By contrast, current ASR systems fail miserably when the utterances of two or more speakers overlap. A computational implementation of this ability, i.e., transcribing individual utterances whether or not they overlap in multi-talker settings, would be a cornerstone of a range of far-field conversation transcription systems, e.g., for meetings [3, 4, 5, 6] and doctor-patient dialogs [7, 8].
In this paper, we develop a multi-microphone meeting transcription system that can recognize overlapped speech. While speech recognition in the meeting domain has a long research history, most systems developed in the past were unable to handle speech overlaps. Overlapped segments account for 10+% of the speaking time [9], which is too much to ignore.
Challenges that need to be overcome for ASR systems to recognize overlapped speech in practical far-field settings include an unknown and varying number of speakers, unknown speaker identities, unknown speech activity segments, the presence of background noise and reverberation, and online operation. Numerous approaches have been proposed for speech separation or overlapped speech recognition, such as independent component analysis, time-frequency bin clustering, and deep neural networks [10, 11, 12, 13, 14]. However, previous research in these areas was mostly conducted in in vitro settings, e.g., by assuming prior knowledge of the number of meeting attendees. While techniques like overlap detection or speaker counting may help close the gap between the laboratory and practical settings, orchestrating many different error-prone components is not as easy as it appears.
We show that these challenges can be addressed by a novel signal processing module, called an unmixing transducer, followed by an array of ASR back-ends. The unmixing transducer continuously receives the microphone signals and generates a fixed number of time-synchronous audio streams, as illustrated in Fig. 1. The acoustic signal of each utterance found in the input “spurts” from one of the output channels. When the number of active speakers is smaller than the number of outputs, the extra channels generate zero-valued signals. The signal from each output channel is segmented and transcribed by the back-end recognizer connected to that channel.
The unmixing transducer is implemented by extending our recently proposed method [15], which is based on acoustic beamformers driven by a multi-microphone speech separation neural network using permutation invariant training (PIT) [16]. Our extensions include the use of a windowed BLSTM for handling long audio streams, a new model architecture suitable for beamforming, improved feature normalization that takes account of phase wrapping, and the addition of spherically isotropic random noise to the training data. Dereverberation is also performed by using the weighted prediction error (WPE) method [17, 18] to further improve reverberation robustness.
Our proposed meeting transcription system is shown to work reasonably well for real meeting data that we collected at our speech group meetings. Compared with a state-of-the-art neural mask-based beamformer, the proposed unmixing transducer is demonstrated to be particularly effective in dealing with overlaps.
2 Unmixing Transducer
This section describes the functionality of the unmixing transducer and how it is realized in our proposed system. For simplicity and conciseness, we assume the maximum number of simultaneously overlapping speakers to be two at each time instant, which is true 98+% of the time according to [9]. Extension to more overlaps is straightforward. No assumption is made on the total number of meeting attendees.
As shown in Fig. 1, the unmixing transducer receives acoustic signals from a microphone array, in which utterances from different speakers are reverberated and mixed. It separates each utterance from coincident utterances, if any, and emits the separated signal from either of its two output channels. Each utterance should not be broken up and distributed to multiple output channels. For time segments where zero or one speaker is active, the extra output channel yields a zero-valued signal. In this way, it always produces two time-synchronous audio streams.
2.1 Problem formulation

While the above description may be sufficient, we provide a more formal definition of the problem in the following. We represent signals in the time-frequency domain, denoting time and frequency by $t$ and $f$, respectively. For each utterance in the meeting, we consider a padded utterance signal, $u_{i,tf}$, where $i$ is the utterance index. Each padded utterance signal is as long as the meeting and is created by taking the time-localized utterance signal as measured by a reference (e.g., the first) microphone and padding the inactive time segments with zeros. Now, we consider a mapping $\sigma: \{1, \ldots, I\} \rightarrow \{1, 2\}$, with $I$ being the total number of the utterances. This defines which output channel each utterance goes to. The inverse of the mapping can also be defined as $\sigma^{-1}$, which, for output channel $j$, returns the set of the utterances that are mapped to $j$ by $\sigma$. We call $\sigma$ a nonmixing mapping when it meets the following condition for all $(j, t)$ values¹:

$$\big| \{\, i \in \sigma^{-1}(j) : a_{i,t} = 1 \,\} \big| \le 1. \qquad (1)$$

¹Condition $a_{i,t} = 1$ is assumed to be equivalent to the $i$th utterance being active at frame $t$. The utterance is regarded as inactive at frame $t$ iff $\sum_f |u_{i,tf}|^2 = 0$.

Nonmixing mappings keep the individual utterances isolated from each other, ensuring that the following superimposed signal consists of at most one utterance at any time:

$$y_{j,tf} = \sum_{i \in \sigma^{-1}(j)} u_{i,tf}. \qquad (2)$$

We want to find such “unmixed” signals, $y_{1,tf}$ and $y_{2,tf}$, for some nonmixing mapping.
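The nonmixing condition can be illustrated with a short Python sketch that checks a candidate channel assignment against utterance activity intervals (representing each utterance by a single (start, end) interval is a simplification of the time-frequency definition above):

```python
def is_nonmixing(segments, mapping):
    """Check the nonmixing condition: no two utterances assigned to the
    same output channel may be active at the same time.

    segments: list of (start, end) activity intervals, one per utterance
    mapping:  list of output channel indices (0 or 1), one per utterance
    """
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if mapping[i] != mapping[j]:
                continue  # utterances on different channels never mix
            s_i, e_i = segments[i]
            s_j, e_j = segments[j]
            if s_i < e_j and s_j < e_i:  # the intervals overlap in time
                return False
    return True

# Two overlapping utterances must go to different channels;
# a later, non-overlapping utterance may reuse either channel.
segs = [(0.0, 5.0), (4.0, 9.0), (10.0, 12.0)]
print(is_nonmixing(segs, [0, 1, 0]))  # True
print(is_nonmixing(segs, [0, 0, 1]))  # False
```

With only two output channels, a valid assignment exists exactly when no more than two utterances are ever active simultaneously, which is the working assumption stated above.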
2.2 Masking approach
We start with a simple approach using spectral masking. While the system we eventually develop does not perform masking, the mask estimation processing constitutes an essential element of our system. Let $x_{m,tf}$ and $y_{j,tf}$ denote the $m$th input to and the $j$th output from the unmixing transducer, respectively. For each output channel $j$, a spectral mask, $g_{j,tf}$, whose value is bounded between 0 and 1, is estimated. The mask is applied to the reference microphone signal, with microphone index $R$, to obtain the $j$th output signal as $y_{j,tf} = g_{j,tf} x_{R,tf}$. We want $y_{j,tf}$ to be close to an unmixed signal that can be derived with some nonmixing mapping.
The incoming audio signals, which may last for tens of minutes to hours in typical office meetings, are broken up into overlapping time windows. Our system uses a 2.4-second sliding window with a 75% overlap. The feature vectors in each window are fed to a speech separation BLSTM that yields the spectral masks for that window. The spectral masks from adjacent windows are “stitched” together to form sequences of spectral masks in a way that does not split an utterance across different output channels (see Sec. 2.2.3 for details).
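For concreteness, the 2.4-second/75%-overlap window layout can be computed as follows; the 10-ms STFT frame shift implied by the defaults (240 frames per window, 60-frame hop) is our assumption, not a figure from the text:

```python
def sliding_windows(n_frames, win_frames=240, hop_frames=60):
    """Return (start, end) frame indices of 2.4-s windows with a 75%
    overlap, i.e., a 0.6-s hop, assuming a 10-ms frame shift."""
    starts = range(0, max(1, n_frames - win_frames + hop_frames), hop_frames)
    return [(s, min(s + win_frames, n_frames)) for s in starts]

wins = sliding_windows(600)  # 6 seconds of 10-ms frames
print(wins[0], wins[1], wins[-1])  # (0, 240) (60, 300) (360, 600)
```

Consecutive windows share 180 frames, which is the overlap region later used for stitching.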
The windowed BLSTM is chosen for two reasons. First, the model can be trained on a collection of short (i.e., much shorter than a typical meeting) speech mixtures, which can easily be created by simulation. Second, it can efficiently capture temporal feature dependencies, which is critical for preventing the separated speaker signals from being swapped within a window.
2.2.1 Input features
As input to the BLSTM, we make use of both spectral and spatial features. The magnitude spectrum of the reference microphone is used as the spectral features. As regards the spatial features, inter-microphone phase differences (IPDs) relative to the reference microphone are used. All the features are mean-normalized by using a rolling window of four seconds. Unlike in [15], to prevent aliasing at the $\pi/{-\pi}$ boundary, the argument operation is performed after the mean normalization. Thus, the IPD features are calculated as

$$\mathrm{IPD}_{m,tf} = \angle\!\left( \frac{x_{m,tf}\, x_{R,tf}^{*}}{\big\langle x_{m,tf}\, x_{R,tf}^{*} \big\rangle} \right), \qquad (3)$$

where the time averaging operator, $\langle \cdot \rangle$, is applied over the normalization window.
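A NumPy sketch of this feature computation is given below; for brevity it normalizes by the time average over all frames, whereas the system uses a 4-second rolling window:

```python
import numpy as np

def ipd_features(X, ref=0):
    """Mean-normalized inter-microphone phase difference (IPD) features.

    X: complex STFT tensor of shape (mics, frames, freqs). The
    cross-spectrum with the reference channel is divided by its time
    average *before* the argument is taken, so the normalization
    cannot wrap around the pi/-pi boundary.
    """
    cross = X * np.conj(X[ref])                # cross-spectrum w.r.t. reference
    mean = cross.mean(axis=1, keepdims=True)   # time average per mic and bin
    return np.angle(cross / mean)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 100, 257)) + 1j * rng.standard_normal((3, 100, 257))
feats = ipd_features(X)
print(feats.shape)  # (3, 100, 257)
```

The reference channel yields all-zero features (its cross-spectrum is real and positive), so in practice only the non-reference channels need to be kept.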
2.2.2 Separation network training

The BLSTM is trained with PIT so that the resultant model can consistently assign each separated utterance to either channel within a window. Our training set comprises simulated multi-channel signals of up to 10 seconds. Each signal can be a single utterance or a mixture of two utterances with different lengths, levels, and reverberations, corrupted by background noise. The PIT loss for the $n$th training sample is defined as

$$\mathcal{L}^{(n)} = \min_{\pi \in \{(1,2),(2,1)\}} \sum_{j=1}^{2} \sum_{t,f} \big| y^{(n)}_{\pi(j),tf} - s^{(n)}_{j,tf} \big|^{2}, \qquad (4)$$

where $y^{(n)}_{j,tf}$, $x^{(n)}_{R,tf}$, and $s^{(n)}_{j,tf}$ are the $j$th output from the model (i.e., $g^{(n)}_{j,tf} x^{(n)}_{R,tf}$), the reference microphone signal, and the $j$th source signal as measured at the reference microphone position, respectively. For the training samples involving only one utterance, $s^{(n)}_{j,tf} = 0$ for the extra source. Further details are described in Section 2.4.
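The permutation-invariant loss can be sketched in a few lines of NumPy (a sketch over magnitude spectra; the actual training domain and loss weighting may differ):

```python
import numpy as np
from itertools import permutations

def pit_loss(masks, x_ref, sources):
    """Permutation-invariant training loss for two output channels.

    masks:   (2, frames, freqs) estimated masks
    x_ref:   (frames, freqs) reference microphone magnitude spectrum
    sources: (2, frames, freqs) source magnitudes at the reference
             microphone (all zeros for the extra source when the
             sample contains a single talker)
    """
    return min(
        sum(np.sum((masks[p] * x_ref - sources[j]) ** 2)
            for j, p in enumerate(perm))
        for perm in permutations(range(2))
    )

x = np.ones((4, 5))
srcs = np.stack([np.ones((4, 5)), np.zeros((4, 5))])
msks = np.stack([np.zeros((4, 5)), np.ones((4, 5))])
print(pit_loss(msks, x, srcs))  # 0.0 (the swapped permutation matches)
```

The minimum over both channel orderings is what makes the loss invariant to which head models which talker.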
2.2.3 Stitching adjacent windows
Because the PIT-trained network has no specific preference as to the ordering of the separated signals, the permutations of the separated signals need to be aligned across the windows at test time. Suppose that the permutations have already been determined up to the previous window. To decide the output signal permutation for the current window, we calculate the cost of each possible permutation and pick the one that provides the lower cost. The cost is defined as the sum of the squared differences between the separated signals of the adjacent windows, where the sum is computed over the overlapping frames.
After the permutation alignment, the masks for the non-overlapping frames of the current window are output immediately. This minimizes the processing latency, though it is not optimal in terms of accuracy.
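The permutation decision for each new window can be sketched as follows, comparing both orderings over the overlap region:

```python
import numpy as np

def align_permutation(prev_tail, cur_head):
    """Decide the channel permutation of the current window.

    prev_tail and cur_head hold the separated magnitudes of the two
    channels over the frames where the adjacent windows overlap, each
    with shape (2, frames, freqs). Returns (0, 1) to keep the current
    window's channel order, or (1, 0) to swap it, whichever gives the
    smaller summed squared difference against the previous window.
    """
    keep = np.sum((cur_head - prev_tail) ** 2)
    swap = np.sum((cur_head[::-1] - prev_tail) ** 2)
    return (0, 1) if keep <= swap else (1, 0)

a = np.ones((1, 4, 5))   # channel carrying speech energy
b = np.zeros((1, 4, 5))  # silent channel
prev_tail = np.concatenate([a, b])
cur_head = np.concatenate([b, a])  # same content, channels swapped
print(align_permutation(prev_tail, prev_tail))  # (0, 1)
print(align_permutation(prev_tail, cur_head))   # (1, 0)
```

Because only two channels are used, only the identity and the swap need to be compared.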
2.3 Beamforming approach
While spectral masking provides perceptually enhanced sounds, it is widely believed that the processing artifacts created by masking are detrimental to current ASR technology. To overcome this drawback, a mask-based beamforming approach was proposed [20, 21] and has shown state-of-the-art results in far-field ASR tasks [22, 23].
With beamforming, the output signals are computed as

$$y_{j,tf} = \mathbf{w}_{j,f,l}^{\mathsf{H}} \mathbf{x}_{tf}, \qquad (5)$$

where $\mathbf{w}_{j,f,l}$ is a beamformer coefficient vector for output channel $j$, $\mathbf{x}_{tf}$ is a vector stacking the microphone signals, and $l$ is the index of the window to which $t$ belongs. By using the MVDR method [24, 25], the optimal beamformer is obtained as $\mathbf{w}_{j,f,l} = (\Phi^{\mathrm{I}}_{j,f,l})^{-1} \Phi^{\mathrm{S}}_{j,f,l} \mathbf{e}_R / \Lambda_{j,f,l}$, where the normalization term, $\Lambda_{j,f,l}$, is calculated as $\Lambda_{j,f,l} = \mathrm{tr}\big( (\Phi^{\mathrm{I}}_{j,f,l})^{-1} \Phi^{\mathrm{S}}_{j,f,l} \big)$. Here, $\mathbf{e}_R$ is the $M$-dimensional standard basis vector with 1 at the reference microphone position, $M$ being the number of microphones. The two matrices, $\Phi^{\mathrm{S}}_{j,f,l}$ and $\Phi^{\mathrm{I}}_{j,f,l}$, represent the spatial covariance matrix of the utterance to be output from the $j$th channel (which may be referred to as the target utterance) and that of the sounds overlapping the target. These matrices were previously estimated as weighted spatial covariance matrices of the microphone signals, where each microphone signal vector was weighted by $g_{j,tf}$ for the target or $1 - g_{j,tf}$ for the interference [15]. In the following, we propose an improved spatial covariance matrix estimator using a different model architecture.
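For a single frequency bin, the mask-driven MVDR computation can be sketched as below; the small diagonal loading is our addition for numerical stability, not part of the described method:

```python
import numpy as np

def mvdr_from_masks(X, target_mask, interf_mask, ref=0):
    """Mask-based MVDR beamformer for one frequency bin, following the
    Souden-style formulation w = Phi_I^{-1} Phi_S e_R / tr(Phi_I^{-1} Phi_S).

    X: (mics, frames) complex STFT of one bin; masks have shape (frames,).
    """
    # Mask-weighted spatial covariance matrices of the microphone signals.
    phi_s = (target_mask * X) @ X.conj().T / max(target_mask.sum(), 1e-8)
    phi_i = (interf_mask * X) @ X.conj().T / max(interf_mask.sum(), 1e-8)
    phi_i += 1e-6 * np.eye(X.shape[0])       # diagonal loading (our addition)
    num = np.linalg.solve(phi_i, phi_s)      # Phi_I^{-1} Phi_S
    w = num[:, ref] / np.trace(num)          # pick reference column, normalize
    return w                                 # apply as y_t = w.conj() @ x_t
```

When the target covariance is rank-one, this filter is distortionless toward the target direction, i.e., it passes the target component at the reference microphone unchanged.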
2.3.1 Speech-speech-noise architecture
The spatial covariance matrix estimator that uses $1 - g_{j,tf}$ as the interference mask is not very accurate. This is because the trained speech separation network cares only about the accuracy of the target signal estimates, as is evident from (4).
A more accurate estimate of the interference spatial covariance matrix can be obtained by explicitly factorizing it into the spatial covariance matrix of the other talker’s speech and that of the background noise, $\Phi^{\mathrm{N}}_{f,l}$, as follows:

$$\Phi^{\mathrm{I}}_{j,f,l} = \Phi^{\mathrm{S}}_{\bar{j},f,l} + \Phi^{\mathrm{N}}_{f,l}, \qquad (6)$$

where $\bar{j} = 2$ for $j = 1$ and $\bar{j} = 1$ for $j = 2$.
To obtain $\Phi^{\mathrm{N}}_{f,l}$, we add another output channel to the separation network so that noise masks can also be obtained. Figure 3 shows a diagram of the new model, called the speech-speech-noise (SSN) model³, and the computational network for training it. As shown in the diagram, we apply the PIT framework only to the first two (i.e., speech) output channels. The loss function is defined as the sum of the PIT loss and the squared error in the noise estimation.

³SSN also stands for Speech Separation Network.
2.4 Implementation details

We built the unmixing transducer by using a three-layer 1024-unit BLSTM. The input features were transformed by a 1024-unit projection layer with ReLU nonlinearity before being fed to the BLSTM. On top of the last BLSTM layer, there was a three-head fully connected sigmoid layer, where each head produced spectral masks for either speech or noise. Each head consisted of 257 units, each uniquely associated with a particular frequency bin.
567 hours of speech mixture data were created for training. The source speech signals were taken from WSJ SI-284 and LibriSpeech. Each training sample was created as follows. First, the number of speakers (one or two) was randomly chosen. In the two-speaker case, the start and end times of each utterance were randomly determined so that we had a balanced mix of the four overlap configurations described in [15]. The source signals were reverberated with the image method [26], mixed together in the two-speaker case, and corrupted by additive noise. The multi-channel additive noise signals were simulated by assuming a spherically isotropic noise field. The generated training samples were clipped to 10 seconds.
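The core of this sample generation can be sketched as follows; reverberation, isotropic noise, and the balanced four-configuration sampling are omitted, and the ±5 dB gain range is our assumption:

```python
import numpy as np

def mix_two(s1, s2, rng, sr=16000, max_len_s=10):
    """Create one two-speaker training mixture: the second utterance
    starts at a random offset and is scaled by a random gain, and the
    result is clipped to 10 seconds (a simplified sketch)."""
    offset = int(rng.integers(0, len(s1)))
    gain = 10.0 ** (rng.uniform(-5, 5) / 20)   # random level difference
    n = max(len(s1), offset + len(s2))
    mix = np.zeros(n)
    mix[: len(s1)] += s1
    mix[offset : offset + len(s2)] += gain * s2
    return mix[: sr * max_len_s]

rng = np.random.default_rng(0)
mixture = mix_two(np.ones(16000), np.ones(8000), rng)
print(len(mixture) <= 160000)  # True
```

Keeping single-talker samples (with an all-zero second source) in the mix is what teaches the extra output channel to emit zero-valued signals.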
Distributed training with 1-bit SGD [27] was performed on 16 GPUs by using the Microsoft Cognitive Toolkit. The learning rate was divided by 10 after 150 epochs. The model was saved after each epoch, and the snapshot with the lowest validation loss was picked after convergence.
At test time, two additional tricks were utilized. First, for the two network heads producing the speech masks, we estimated the directions of arrival (DOAs) of the corresponding signals. When the DOA difference was less than 15 degrees, we assumed that there was actually only one active speaker and thus merged the masks while zeroing out the masks of the less significant head. Second, for each time-frequency bin, the three masks were normalized to sum to one.
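The second trick, per-bin mask renormalization, is straightforward to sketch (the DOA-based merging is omitted here since it depends on the array geometry):

```python
import numpy as np

def normalize_masks(speech_masks, noise_mask, eps=1e-8):
    """Renormalize the two speech masks and the noise mask so that
    they sum to one in every time-frequency bin.

    speech_masks: (2, frames, freqs); noise_mask: (frames, freqs).
    """
    total = speech_masks.sum(axis=0) + noise_mask + eps
    return speech_masks / total, noise_mask / total

sp = np.full((2, 100, 257), 0.3)
nz = np.full((100, 257), 0.2)
sp_n, nz_n = normalize_masks(sp, nz)
print(np.allclose(sp_n.sum(axis=0) + nz_n, 1.0, atol=1e-6))  # True
```

This enforces the interpretation of the three heads as a soft partition of each bin's energy between the two talkers and the noise.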
3 Meeting Transcription Experiments
3.1 System build
We developed a meeting transcription system with a seven-channel circular microphone array by using the unmixing transducer described above. The system consists of three kinds of modules, performing dereverberation, speech separation, and ASR, respectively. The dereverberation module estimates a multi-input multi-output dereverberation filter that converts the seven-channel microphone array signal into a less reverberant seven-channel signal [17]. The dereverberation filter was updated every second. Then, the unmixing transducer transforms the dereverberated seven-channel audio into two separated speech streams. The model was built as per Section 2.4. Each output signal from the unmixing transducer was provided to an ASR back-end that performs segmentation and recognition.
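The module chain can be summarized as a tiny pipeline sketch; `dereverb`, `unmix`, and `asr` are hypothetical placeholders for the three modules, not the actual implementations:

```python
def transcribe_meeting(mics, dereverb, unmix, asr):
    """Sketch of the three-module pipeline: dereverberation (7-ch in,
    7-ch out), unmixing (7-ch in, two streams out), then one ASR
    back-end per stream."""
    d = dereverb(mics)
    streams = unmix(d)
    return [asr(s) for s in streams]

# Trivial stand-ins to show the data flow:
hyps = transcribe_meeting(
    "7ch-audio",
    dereverb=lambda x: x,
    unmix=lambda x: [x + "/ch0", x + "/ch1"],
    asr=lambda s: f"transcript({s})",
)
print(hyps)  # ['transcript(7ch-audio/ch0)', 'transcript(7ch-audio/ch1)']
```

Because the two streams are time-synchronous, the two back-ends can run independently and in parallel.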
For ASR, we trained an acoustic model on 7K hours of spontaneous speech audio, which were collected from various sources, both public (e.g., Switchboard and Fisher) and private (e.g., Microsoft Research lecture talks). The audio quality was not consistent due to the effects of noise, channel, and so on, which seemed to improve the robustness of the acoustic model. The model input was 40-channel mel filterbank energies compressed with 10th-root nonlinearity. The model consisted of four 1024-unit LSTM layers. It was trained with a cross entropy criterion, followed by sequence training. Decoding was performed with a dictionary of 240K words and our internal trigram language model built for conversational tasks.
To evaluate the meeting transcription system, six meetings were recorded at our speech group and professionally transcribed. The meetings took place in several different rooms and lasted for 30 minutes to an hour each. The recordings were made with both headset microphones and a seven-channel circular microphone array, and they were manually segmented and transcribed. The transcribers were allowed to access the recordings from both types of microphones. The average overlap rate was 14.7%, which was about twice as high as that of AMI [6]. While the measured overlap rate varies depending on the annotation policy, based on an informal inspection, we feel our meetings had noticeably more overlaps than those of AMI. The more frequent overlaps may be attributed to the fact that the participants in our meetings knew each other well and were discussing work-related topics.
The system outputs were scored with the asclite tool [28], which aligns multiple hypotheses against multiple references. Tighter reference segmentations were used when calculating the word error rate (WER) for single-speaker segments so as not to discard segments that had overlaps only in nonspeech frames. Note that this might have resulted in a slight WER overestimation because asclite makes use of the reference time stamps to find segments for alignment.
Table 1. WER (%) obtained with different front-ends. The second column excludes overlapped segments.

Front-end                       All segments   Excl. overlaps
No processing (mic0)                44.6            40.9
+Unmix. Trans. (proposed)           33.8            30.4
+UT trained only on WSJMix          34.2            30.8
+UT without noise channel           36.8            34.5
Table 1 lists the WERs obtained with different front-ends, including our system. Without microphone array processing, the WER was 44.6%. Dereverberation improved the recognition accuracy by 5.6% relative. BeamformIt [29], the default beamformer used by Kaldi’s recipe for AMI [30], provided no improvement. A neural mask-based beamformer, which yielded state-of-the-art results in both CHiME-3/4 [20, 21] and more practical large-vocabulary settings [23], was also examined. Here, we used the best model we obtained in [23] and applied it to our meeting data by using a 2.4-second sliding window. This improved the performance by 10.0% relative, achieving a WER of 37.9%. However, this beamformer provided a 15.2% gain for single-speaker segments, indicating that the method was not effective at handling overlaps. Nevertheless, we used this single-output beamformer as our baseline because no speech separation method had previously been applied to meetings with unknown and varying numbers of attendees.
The proposed system achieved a WER of 33.8%, outperforming the strong baseline system by 10.8%. It is noteworthy that the gain was 7.3% when the overlapped segments were excluded. This means that, while the proposed approach provided a modest improvement even for single-speaker segments, its advantage was prominent in overlapped segments. When the unmixing transducer was trained only on the WSJ-derived data, which amounted to 219 hours, the recognition accuracy slightly deteriorated. When the network estimated only speech masks, i.e., when the SSN architecture was not used, the recognition performance was degraded to 36.8%. The degradation was profound especially in single-speaker segments, indicating the importance of explicitly estimating the noise masks. Overall, the proposed meeting transcription system, comprising the dereverberator, unmixing transducer, and ASR back-ends, improved the WER by 24.2% compared with the single distant microphone system.
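The relative gains quoted in this section follow directly from the absolute WERs and can be verified with a few lines:

```python
def rel_wer_reduction(base, new):
    """Relative WER reduction in percent."""
    return 100.0 * (base - new) / base

print(round(rel_wer_reduction(37.9, 33.8), 1))  # 10.8 (vs. the neural beamformer)
print(round(rel_wer_reduction(44.6, 33.8), 1))  # 24.2 (vs. the single distant mic)
```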
4 Conclusion

In this paper, we described a meeting transcription system that can handle speech overlaps. The system is based on the unmixing transducer, a novel signal processing module that converts multi-channel audio signals into a fixed number of separated speech streams. We implemented it by using a windowed BLSTM. The SSN architecture was proposed to effectively leverage the beamforming capability. Significant gains in meeting transcription performance were obtained compared with a strong neural mask-based beamformer. Further results on both public and private data will be reported in a follow-up paper.
As far as we know, this is the first overlapped speech recognition system that has been demonstrated to work for actual unconstrained meetings. We believe the proposed approach is promising and anticipate further investigation in this direction.
-  W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig, “Achieving human parity in conversational speech recognition,” CoRR, vol. abs/1610.05256, 2016. [Online]. Available: http://arxiv.org/abs/1610.05256
-  D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. Fan, C. Fougner, T. Han, A. Y. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Y. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Wang, C. Wang, B. Xiao, D. Yogatama, J. Zhan, and Z. Zhu, “Deep speech 2: End-to-end speech recognition in english and mandarin,” CoRR, vol. abs/1512.02595, 2015. [Online]. Available: http://arxiv.org/abs/1512.02595
-  J. G. Fiscus, N. Radde, J. S. Garofolo, J. A. A. Le, and C. Laprun, “The rich transcription 2005 spring meeting recognition evaluation,” in Proc. Machine Learning, Multimodal Interaction Worksh., 2005, pp. 369–389.
-  T. Hain, L. Burget, J. Dines, P. N. Garner, F. Grézl, A. El Hannani, M. Huijbregts, M. Karafiát, M. Lincoln, and V. Wan, “Transcribing meetings with the AMIDA systems,” IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 2, pp. 486–498, 2012.
-  T. Hori, S. Araki, T. Yoshioka, M. Fujimoto, S. Watanabe, T. Oba, A. Ogawa, K. Otsuka, D. Mikami, K. Kinoshita, T. Nakatani, A. Nakamura, and J. Yamato, “Low-latency real-time meeting recognition and understanding using distant microphones and omni-directional camera,” IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 2, pp. 499–513, 2011.
-  S. Renals and P. Swietojanski, Distant Speech Recognition Experiments Using the AMI Corpus. Springer International Publishing, 2017, pp. 355–368.
-  E. Edwards, W. Salloum, G. P. Finley, J. Fone, G. Cardiff, M. Miller, and D. Suendermann-Oeft, “Medical speech recognition: Reaching parity with humans,” in Speech, Computer, 2017, pp. 512–524.
-  C. Chiu, A. Tripathi, K. Chou, C. Co, N. Jaitly, D. Jaunzeikare, A. Kannan, P. Nguyen, H. Sak, A. Sankar, J. Tansuwan, N. Wan, Y. Wu, and X. Zhang, “Speech recognition for medical conversations,” CoRR, vol. abs/1711.07274, 2017. [Online]. Available: http://arxiv.org/abs/1711.07274
-  O. Çetin and E. Shriberg, “Analysis of overlaps in meetings by dialog factors, hot spots, speakers, and collection site: Insights for automatic speech recognition,” in Proc. Interspeech, 2006, pp. 293–296.
-  J. R. Hershey, Z. Chen, J. L. Roux, and S. Watanabe, “Deep clustering: discriminative embeddings for segmentation and separation,” in Proc. Int. Conf. Acoust., Speech, Signal Process., 2016, pp. 31–35.
-  L. Drude and R. Haeb-Umbach, “Tight integration of spatial and spectral features for BSS with deep clustering embeddings,” in Proc. Interspeech, 2017, pp. 2650–2654.
-  Z. Chen, J. Droppo, J. Li, and W. Xiong, “Progressive joint modeling in unsupervised single-channel overlapped speech recognition,” CoRR, vol. abs/1707.07048, 2017. [Online]. Available: http://arxiv.org/abs/1707.07048
-  X. Zhang and D. Wang, “Binaural reverberant speech separation based on deep neural networks,” in Proc. Interspeech, 2017, pp. 2018–2022.
-  K. Žmolíková, M. Delcroix, K. Kinoshita, T. Higuchi, A. Ogawa, and T. Nakatani, “Learning speaker representation for neural network based multichannel speaker extraction,” in Proc. Worksh. Automat. Speech Recognition, Understanding, 2017, pp. 8–15.
-  T. Yoshioka, H. Erdogan, Z. Chen, and F. Alleva, “Multi-microphone neural speech separation for far-field multi-talker speech recognition,” in Proc. Int. Conf. Acoust., Speech, Signal Process., 2018, to appear.
-  M. Kolbæk, D. Yu, Z. H. Tan, and J. Jensen, “Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks,” IEEE/ACM Trans. Audio, Speech, Language Process., vol. 25, no. 10, pp. 1901–1913, 2017.
-  T. Yoshioka and T. Nakatani, “Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening,” IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 10, pp. 2707–2720, 2012.
-  B. Li, T. N. Sainath, A. Narayanan, J. Caroselli, M. Bacchiani, A. Misra, I. Shafran, H. Sak, G. Pundak, K. Chin, K. C. Sim, R. J. Weiss, K. W. Wilson, E. Variani, C. Kim, O. Siohan, M. Weintraub, E. McDermott, R. Rose, and M. Shannon, “Acoustic modeling for Google Home,” in Proc. Interspeech, 2017.
-  A. Mohamed, F. Seide, D. Yu, J. Droppo, A. Stolcke, G. Zweig, and G. Penn, “Deep bi-directional recurrent networks over spectral windows,” in Proc. Worksh. Automat. Speech Recognition, Understanding, 2015, pp. 78–83.
-  T. Yoshioka, N. Ito, M. Delcroix, A. Ogawa, K. Kinoshita, M. Fujimoto, C. Yu, W. J. Fabian, M. Espi, T. Higuchi, S. Araki, and T. Nakatani, “The NTT CHiME-3 system: advances in speech enhancement and recognition for mobile multi-microphone devices,” in Proc. Worksh. Automat. Speech Recognition, Understanding, 2015, pp. 436–443.
-  J. Heymann, L. Drude, A. Chinaev, and R. Haeb-Umbach, “BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge,” in Proc. Worksh. Automat. Speech Recognition, Understanding, 2015, pp. 444–451.
-  J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, and R. Haeb-Umbach, “BeamNet: End-to-end training of a beamformer-supported multi-channel ASR system,” in Proc. Int. Conf. Acoust., Speech, Signal Process., 2017, pp. 5325–5329.
-  C. Boeddeker, H. Erdogan, T. Yoshioka, and R. Haeb-Umbach, “Exploring practical aspects of neural mask-based beamforming for far-field speech recognition,” in Proc. Int. Conf. Acoust., Speech, Signal Process., 2018, accepted.
-  M. Souden, J. Benesty, and S. Affes, “On optimal frequency-domain multichannel linear filtering for noise reduction,” IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 2, pp. 260–276, 2007.
-  H. Erdogan, J. R. Hershey, S. Watanabe, M. Mandel, and J. Le Roux, “Improved MVDR beamforming using single-channel mask prediction networks,” in Proc. Interspeech, 2016, pp. 1981–1985.
-  J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Am., vol. 65, no. 4, pp. 943–950, 1979.
-  F. Seide, H. Fu, J. Droppo, G. Li, and D. Yu, “1-bit stochastic gradient descent and application to data-parallel distributed training of speech DNNs,” in Proc. Interspeech, 2014, pp. 1058–1062.
-  J. G. Fiscus, J. Ajot, N. Radde, and C. Laprun, “Multiple dimension Levenshtein edit distance calculations for evaluating automatic speech recognition systems during simultaneous speech,” in Proc. Int. Conf. Language Resources, Evaluation, 2006, pp. 803–808.
-  X. Anguera, C. Wooters, and J. Hernando, “Acoustic beamforming for speaker diarization of meetings,” IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 7, pp. 2011–2022, 2007.
-  D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, “The Kaldi speech recognition toolkit,” in Proc. Autom. Speech Recog. Understand., 2011.