The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

12/23/2020
by Shinji Watanabe, et al.

This paper describes the recent development of ESPnet (https://github.com/espnet/espnet), an end-to-end speech processing toolkit. The project was initiated in December 2017, mainly to support end-to-end speech recognition experiments based on sequence-to-sequence modeling. It has grown rapidly and now covers a wide range of speech processing applications: ESPnet also includes text-to-speech (TTS), voice conversion (VC), speech translation (ST), and speech enhancement (SE) with support for beamforming, speech separation, denoising, and dereverberation. All applications are trained in an end-to-end manner, thanks to the generic properties of sequence-to-sequence modeling, and they can be further integrated and jointly optimized. ESPnet also provides reproducible all-in-one recipes for these applications, with state-of-the-art performance on various benchmarks obtained by incorporating Transformer, advanced data augmentation, and Conformer. The project aims to provide an up-to-date speech processing experience to the community so that researchers in academia and in industry at various scales can develop their technologies collaboratively.

research
11/07/2020

ESPnet-SE: end-to-end speech enhancement and separation toolkit designed for ASR integration

We present ESPnet-SE, which is designed for the quick development of spe...
research
04/21/2020

ESPnet-ST: All-in-One Speech Translation Toolkit

We present ESPnet-ST, which is designed for the quick development of spe...
research
10/26/2020

Recent Developments on ESPnet Toolkit Boosted by Conformer

In this study, we present recent developments on ESPnet: End-to-End Spee...
research
10/11/2020

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) m...
research
10/05/2022

JoeyS2T: Minimalistic Speech-to-Text Modeling with JoeyNMT

JoeyS2T is a JoeyNMT extension for speech-to-text tasks such as automati...
research
06/03/2019

Fluent Translations from Disfluent Speech in End-to-End Speech Translation

Spoken language translation applications for speech suffer due to conver...
research
10/21/2020

BERT for Joint Multichannel Speech Dereverberation with Spatial-aware Tasks

We propose a method for joint multichannel speech dereverberation with t...
