ESPnet2-TTS: Extending the Edge of TTS Research

10/15/2021
by Tomoki Hayashi, et al.

This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit. ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features, including: on-the-fly flexible pre-processing, joint training with neural vocoders, and state-of-the-art TTS models with extensions like full-band E2E text-to-waveform modeling, which simplify the training pipeline and further enhance TTS performance. The unified design of our recipes enables users to quickly reproduce state-of-the-art E2E-TTS results. We also provide many pre-trained models in a unified Python interface for inference, offering a quick means for users to generate baseline samples and build demos. Experimental evaluations with English and Japanese corpora demonstrate that our provided models synthesize utterances comparable to ground-truth ones, achieving state-of-the-art TTS performance. The toolkit is available online at https://github.com/espnet/espnet.
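The unified Python inference interface mentioned above can be sketched roughly as follows. This is a minimal illustration, not the authors' reference usage: it assumes `espnet`, `espnet_model_zoo`, and `soundfile` are installed, that the `Text2Speech` class in `espnet2.bin.tts_inference` is the intended entry point, and that the model tag `kan-bayashi/ljspeech_vits` is available for download. The helper name `synthesize` and the output path are hypothetical.

```python
# Assumed example tag for a pre-trained English text-to-waveform model;
# swap in whichever tag the toolkit's model list actually provides.
DEFAULT_MODEL_TAG = "kan-bayashi/ljspeech_vits"


def synthesize(text, model_tag=DEFAULT_MODEL_TAG, out_path="out.wav"):
    """Generate speech for `text` with a pre-trained model and save it as WAV.

    Hypothetical helper for illustration; returns the path it wrote to.
    """
    # Imports are deferred so that defining this function does not require
    # the (heavy) toolkit to be installed.
    import soundfile as sf
    from espnet2.bin.tts_inference import Text2Speech

    # Downloads the model on first use (needs network access), then builds
    # the inference object.
    text2speech = Text2Speech.from_pretrained(model_tag)

    # The call returns a dict; "wav" holds the generated waveform tensor.
    wav = text2speech(text)["wav"]

    # Write 16-bit PCM at the sampling rate the model was trained with.
    sf.write(out_path, wav.view(-1).cpu().numpy(), text2speech.fs, "PCM_16")
    return out_path
```

Calling `synthesize("Hello, this is a test.")` would then download the model on first use and write `out.wav` in the current directory, which is enough for a quick baseline sample or demo.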


