Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

01/20/2023
by Yinghao Aaron Li, et al.

Large-scale pre-trained language models have been shown to improve the naturalness of text-to-speech (TTS) models by enabling them to produce more natural prosodic patterns. However, these models are usually word-level or sub-phoneme-level and jointly trained with phonemes, making them inefficient for the downstream TTS task, where only phonemes are needed. In this work, we propose a phoneme-level BERT (PL-BERT) with a pretext task of predicting the corresponding graphemes along with the regular masked phoneme predictions. Subjective evaluations show that our phoneme-level BERT encoder significantly improves the mean opinion scores (MOS) of rated naturalness of synthesized speech compared with the state-of-the-art (SOTA) StyleTTS baseline on out-of-distribution (OOD) texts.
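
To make the two pretext tasks concrete, here is a minimal sketch of how one training example might be assembled: the model input is a phoneme sequence with some positions masked, and each position carries both a masked-phoneme target and a grapheme target. The abstract does not give the masking scheme, so whole-word masking, the `mask_prob` value, the `<mask>` token, and the toy `lexicon` are all assumptions made for illustration, not the authors' implementation.

```python
import random

MASK = "<mask>"  # hypothetical mask token

def make_example(words, lexicon, mask_prob=0.15, rng=None):
    """Build (inputs, phoneme_targets, grapheme_targets) for a word list.

    All three lists have one entry per phoneme position.
    """
    rng = rng or random.Random(0)
    inputs, phoneme_targets, grapheme_targets = [], [], []
    for word in words:
        # Assumption: all phonemes of a word are masked together.
        masked = rng.random() < mask_prob
        for phoneme in lexicon[word]:
            inputs.append(MASK if masked else phoneme)
            # Masked-phoneme task: only masked positions carry a target.
            phoneme_targets.append(phoneme if masked else None)
            # Grapheme task: predict the word this phoneme belongs to.
            grapheme_targets.append(word)
    return inputs, phoneme_targets, grapheme_targets

# Toy pronunciation lexicon (illustrative only).
lexicon = {"the": ["DH", "AH"], "cat": ["K", "AE", "T"]}
inputs, phoneme_targets, grapheme_targets = make_example(
    ["the", "cat"], lexicon, mask_prob=1.0
)
```

In a real model, the two target lists would typically feed two separate classification heads on top of a shared encoder, with the losses summed; that detail is likewise an assumption here.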

research
03/28/2021

PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS

This paper introduces PnG BERT, a new encoder model for neural TTS. This...
research
06/17/2019

Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models

Modern text-to-speech (TTS) systems are able to generate audio that soun...
research
05/13/2021

Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level

Larger language models have higher accuracy on average, but are they bet...
research
01/10/2021

BERT Family Eat Word Salad: Experiments with Text Understanding

In this paper, we study the response of large models from the BERT famil...
research
09/12/2023

Measuring vagueness and subjectivity in texts: from symbolic to neural VAGO

We present a hybrid approach to the automated measurement of vagueness a...
research
08/13/2022

Interpreting BERT-based Text Similarity via Activation and Saliency Maps

Recently, there has been growing interest in the ability of Transformer-...
research
08/03/2023

Improving Requirements Completeness: Automated Assistance through Large Language Models

Natural language (NL) is arguably the most prevalent medium for expressi...