Recent Developments on ESPnet Toolkit Boosted by Conformer

10/26/2020
by Pengcheng Guo, et al.

In this study, we present recent developments in ESPnet, the End-to-End Speech Processing toolkit, which mainly involve a recently proposed architecture called Conformer (Convolution-augmented Transformer). This paper shows results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translation (ST), speech separation (SS), and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive with, or even outperform, the current state-of-the-art Transformer models. We are preparing to release all-in-one recipes using open-source and publicly available corpora for all the above tasks, together with pre-trained models. Our aim for this work is to contribute to our research community by reducing the burden of preparing state-of-the-art research environments, which usually demand substantial resources.

research
04/21/2020

ESPnet-ST: All-in-One Speech Translation Toolkit

We present ESPnet-ST, which is designed for the quick development of spe...
research
01/14/2022

A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies

In this study, we present recent developments of models trained with the...
research
09/13/2019

A Comparative Study on Transformer vs RNN in Speech Applications

Sequence-to-sequence models have been widely used in end-to-end speech p...
research
05/18/2023

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

Conformer, a convolution-augmented Transformer variant, has become the d...
research
05/18/2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

This paper introduces FunASR, an open-source speech recognition toolkit ...
research
12/23/2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans

This paper describes the recent development of ESPnet (https://github.co...
research
02/14/2023

TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments

The evidence is growing that machine and deep learning methods can learn...
