ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

11/07/2020
by Chenda Li, et al.

We present ESPnet-SE, a toolkit designed for the quick development of speech enhancement and speech separation systems in a single framework, with an optional downstream speech recognition module. ESPnet-SE is a new project that integrates a rich set of automatic speech recognition models, resources and systems to support and validate the proposed front-end implementation (i.e., speech enhancement and separation). It can process both single-channel and multi-channel data, and its functionalities include dereverberation, denoising and source separation. We provide all-in-one recipes covering data pre-processing, feature extraction, training and evaluation pipelines for a wide range of benchmark datasets. This paper describes the design of the toolkit, several important functionalities (especially the speech recognition integration, which differentiates ESPnet-SE from other open-source toolkits), and experimental results on major benchmark datasets.

research
07/19/2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

This paper presents recent progress on integrating speech separation and...
research
11/27/2018

Improved Speech Enhancement with the Wave-U-Net

We study the use of the Wave-U-Net architecture for speech enhancement, ...
research
12/23/2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

This paper describes the recent development of ESPnet (https://github.co...
research
11/25/2022

Stereo Speech Enhancement Using Custom Mid-Side Signals and Monaural Processing

Speech Enhancement (SE) systems typically operate on monaural input and ...
research
11/03/2019

Onssen: an open-source speech separation and enhancement library

Speech separation is an essential task for multi-talker speech recogniti...
research
11/06/2018

SDR - half-baked or well done?

In speech enhancement and source separation, signal-to-noise ratio is a ...
research
07/31/2020

Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

A novel framework for meeting transcription using asynchronous microphon...
