Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

10/19/2021
by Mutian He, et al.

End-to-end TTS has high data requirements: costly speech corpora can hardly cover all the necessary knowledge, and neural models find it difficult to learn that knowledge, so additional knowledge has to be injected manually. For example, to capture pronunciation knowledge in languages without regular orthography, a complicated grapheme-to-phoneme pipeline must be built on a large, structured pronunciation lexicon, which adds extra, sometimes high, costs to extending neural TTS to such languages. In this paper, we propose a framework that learns to extract knowledge from unstructured external resources using Token2Knowledge attention modules. We apply the framework to build a novel end-to-end TTS model, Neural Lexicon Reader, which extracts pronunciations from raw lexicon texts. Experiments support the potential of our framework: the model significantly reduces pronunciation errors in low-resource, end-to-end Chinese TTS, and its lexicon-reading capability can be transferred to other languages with a smaller amount of data.
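The abstract describes Token2Knowledge only as an attention module, without giving its exact form. As a minimal sketch, assuming standard scaled dot-product attention in which an input token's encoding is the query and encoded lexicon entries supply the keys and values, a single lookup step could look like the following. The function name `token_to_knowledge_attention` and the toy vectors are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_to_knowledge_attention(
    query: Vector, keys: Sequence[Vector], values: Sequence[Vector]
) -> Tuple[List[float], List[float]]:
    """Hypothetical Token2Knowledge step (assumption): one input token attends
    over encoded lexicon entries with scaled dot-product attention."""
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and the same length")
    d = len(query)
    # Similarity between the token and each lexicon entry, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the entry values.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Toy example with made-up encodings: a character query that is closer to the
# first of two lexicon entries (e.g. two readings of a polyphonic character).
query = [1.0, 0.0]
entry_keys = [[1.0, 0.0], [0.0, 1.0]]
entry_values = [[1.0, 0.0], [0.0, 1.0]]
context, weights = token_to_knowledge_attention(query, entry_keys, entry_values)
print(weights)  # the first entry receives the larger weight
```

In a real model the queries, keys, and values would come from learned encoders over the input text and the raw lexicon text; here they are hand-picked two-dimensional vectors so the weighting is easy to inspect.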
