Grapheme or phoneme? An Analysis of Tacotron's Embedded Representations

10/21/2020
by Antoine Perquin, et al.

End-to-end models, particularly Tacotron-based ones, are currently a popular solution for text-to-speech synthesis. They produce high-quality synthesized speech with little to no text preprocessing. Phoneme inputs are usually preferred over graphemes in order to limit the number of pronunciation errors. In this work, we show that, in the case of a well-curated French dataset, graphemes can be used as input without increasing the number of pronunciation errors. Furthermore, we analyze the representation learned by the Tacotron model and show that the contextual grapheme embeddings encode phoneme information and can be used for grapheme-to-phoneme conversion and phoneme control of synthetic speech.
