ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit

10/24/2019
by Tomoki Hayashi, et al.

This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit supports state-of-the-art E2E-TTS models, including Tacotron 2, Transformer TTS, and FastSpeech, and also provides recipes inspired by the Kaldi automatic speech recognition (ASR) toolkit. The recipes follow a design unified with the ESPnet ASR recipes, which provides high reproducibility. The toolkit also provides pre-trained models and samples for all of the recipes so that users can use them as baselines. Furthermore, the unified design enables the integration of ASR functions with TTS, e.g., ASR-based objective evaluation and semi-supervised learning with both ASR and TTS models. This paper describes the design of the toolkit and an experimental evaluation in comparison with other toolkits. The experimental results show that our best model outperforms other toolkits, achieving a mean opinion score (MOS) of 4.25 on the LJSpeech dataset. The toolkit is available on GitHub.
