All-neural beamformer for continuous speech separation

10/13/2021
by   Zhuohuang Zhang, et al.

Continuous speech separation (CSS) aims to separate overlapping voices from a continuous influx of conversational audio containing an unknown number of utterances spoken by an unknown number of speakers. A common application scenario is transcribing a meeting conversation recorded by a microphone array. Prior studies explored various deep learning models for time-frequency mask estimation, followed by a minimum variance distortionless response (MVDR) filter to improve automatic speech recognition (ASR) accuracy. The performance of these methods is fundamentally upper-bounded by MVDR's spatial selectivity. Recently, the all deep learning MVDR (ADL-MVDR) model was proposed for neural beamforming and demonstrated superior performance in a target speech extraction task using pre-segmented input. In this paper, we further adapt ADL-MVDR to the CSS task with several enhancements to enable end-to-end neural beamforming. The proposed system achieves a significant word error rate reduction over a baseline spectral masking system on the LibriCSS dataset. Moreover, the proposed neural beamformer is shown to be comparable to a state-of-the-art MVDR-based system in real meeting transcription tasks, including AMI, while showing potential to further simplify the runtime implementation and reduce system latency through frame-wise processing.
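For orientation, the conventional pipeline that the abstract describes as prior work (time-frequency masks followed by an MVDR filter) can be sketched as below. This is a minimal illustration only, assuming the widely used reference-channel MVDR formulation with mask-weighted spatial covariance matrices; it is not the ADL-MVDR model proposed in the paper. The function name `mask_based_mvdr`, its parameters, and the diagonal-loading constant are hypothetical choices made for this sketch, and the masks are assumed to come from some separate estimator.

```python
import numpy as np


def mask_based_mvdr(stft, speech_mask, noise_mask, ref_mic=0, diag_load=1e-6, eps=1e-8):
    """Illustrative mask-based MVDR beamformer (not the paper's ADL-MVDR).

    stft:        complex array, shape (mics, freqs, frames)
    speech_mask: real array in [0, 1], shape (freqs, frames)
    noise_mask:  real array in [0, 1], shape (freqs, frames)
    Returns the beamformed STFT, shape (freqs, frames).
    """
    n_mics = stft.shape[0]
    y = stft.transpose(1, 2, 0)  # (freqs, frames, mics)

    def masked_covariance(mask):
        # Phi(f) = sum_t m(t, f) y(t, f) y(t, f)^H / sum_t m(t, f)
        weighted = y * mask[..., None]
        phi = np.einsum("ftm,ftn->fmn", weighted, y.conj())
        return phi / (mask.sum(axis=1)[:, None, None] + eps)

    phi_s = masked_covariance(speech_mask)
    # Small diagonal loading keeps the noise covariance invertible (assumed value).
    phi_n = masked_covariance(noise_mask) + diag_load * np.eye(n_mics)

    # w(f) = Phi_n^{-1} Phi_s u / trace(Phi_n^{-1} Phi_s), u selects the reference mic.
    numerator = np.linalg.solve(phi_n, phi_s)               # (freqs, mics, mics)
    trace = np.trace(numerator, axis1=1, axis2=2)[:, None]  # (freqs, 1)
    weights = numerator[:, :, ref_mic] / (trace + eps)      # (freqs, mics)

    # Apply the filter per frequency: output(t, f) = w(f)^H y(t, f)
    return np.einsum("fm,ftm->ft", weights.conj(), y)
```

Because the filter weights are fixed per frequency and derived entirely from the two covariance matrices, the quality of the output depends on how well a linear spatial filter can isolate the target, which is the limitation the abstract points to when it says these methods are upper-bounded by MVDR's spatial selectivity.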

research
07/23/2023

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Neural speech separation has made remarkable progress and its integratio...
research
06/04/2020

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

Most approaches to multi-talker overlapped speech separation and recogni...
research
01/30/2020

Continuous speech separation: dataset and analysis

This paper describes a dataset and protocols for evaluating continuous s...
research
06/28/2023

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Recently, deep learning-based beamforming algorithms have shown promisin...
research
01/04/2021

Generalized RNN beamformer for target speech separation

Recently we proposed an all-deep-learning minimum variance distortionles...
research
08/30/2023

Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Neural beamformers, which integrate both pre-separation and beamforming ...
research
09/15/2023

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

Many real-life applications of automatic speech recognition (ASR) requir...
