A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

11/18/2021
by Tom O'Malley, et al.

We present a frontend for improving the robustness of automatic speech recognition (ASR) that jointly implements three modules within a single model: acoustic echo cancellation, speech enhancement, and speech separation. This is achieved by using a contextual enhancement neural network that can optionally make use of different types of side inputs: (1) a reference signal of the playback audio, which is necessary for echo cancellation; (2) a noise context, which is useful for speech enhancement; and (3) an embedding vector representing the voice characteristics of the target speaker of interest, which is not only critical in speech separation, but also helpful for echo cancellation and speech enhancement. We present detailed evaluations to show that the joint model performs almost as well as the task-specific models, and significantly reduces word error rate in noisy conditions even when using a large-scale state-of-the-art ASR model. Compared to the noisy baseline, the joint model reduces the word error rate in low signal-to-noise ratio conditions by at least 71% [...] 26% [...] model performs within 10% [...] dataset, and 3% [...]
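The abstract reports its results as reductions in word error rate (WER). As a minimal sketch of how such a figure can be computed, assuming the quoted reductions are relative to the noisy baseline, the Python below scores two hypothetical transcripts against a reference. The function names and example sentences are illustrative assumptions, not taken from the paper, and the numbers are not the paper's results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, frontend_wer: float) -> float:
    """Fraction of the baseline WER removed by the frontend."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - frontend_wer) / baseline_wer


# Hypothetical transcripts, for illustration only.
reference = "turn on the kitchen lights"
noisy_hypothesis = "turn on the chicken light"     # ASR on the noisy input
enhanced_hypothesis = "turn on the kitchen light"  # ASR after the frontend

baseline_wer = word_error_rate(reference, noisy_hypothesis)
frontend_wer = word_error_rate(reference, enhanced_hypothesis)
reduction = relative_wer_reduction(baseline_wer, frontend_wer)
print(f"baseline {baseline_wer:.2f}, frontend {frontend_wer:.2f}, "
      f"relative reduction {reduction:.0%}")
```

In this toy case the baseline transcript has two substitutions in five words and the enhanced transcript has one, so the relative reduction is one half. Corpus-level WER is normally computed by summing errors and reference words over all utterances rather than averaging per-utterance rates.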

research
09/14/2022

A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation

Recent work has shown that it is possible to train a single model to per...
research
05/20/2022

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

Acoustic echo cancellation (AEC) plays an important role in the full-dup...
research
10/23/2020

Speech enhancement aided end-to-end multi-task learning for voice activity detection

Robust voice activity detection (VAD) is a challenging task in low signa...
research
10/26/2018

Scaling Speech Enhancement in Unseen Environments with Noise Embeddings

We address the problem of speech enhancement generalisation to unseen en...
research
07/31/2020

Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

A novel framework for meeting transcription using asynchronous microphon...
research
11/08/2022

Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement

Personalised speech enhancement (PSE), which extracts only the speech of...
research
04/25/2022

Cleanformer: A microphone array configuration-invariant, streaming, multichannel neural enhancement frontend for ASR

This work introduces the Cleanformer, a streaming multichannel neural ba...