Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

08/31/2023
by Shaohuan Zhou, et al.

This paper presents an end-to-end, high-quality singing voice synthesis (SVS) system that uses semantic embeddings derived from Bidirectional Encoder Representations from Transformers (BERT) to improve the expressiveness of the synthesized singing voice. Building on the main architecture of the recently proposed VISinger, we put forward several designs specific to expressive singing voice synthesis. First, unlike previous SVS models, we use a text representation of the lyrics, extracted from a pre-trained BERT, as an additional input to the model. This representation carries semantic information about the lyrics, which could help the SVS system produce a more expressive and natural voice. Second, we introduce an energy predictor to stabilize the synthesized voice and to model the wider range of energy variations that also contribute to the expressiveness of the singing voice. Last but not least, to mitigate off-key issues, the pitch predictor is redesigned to predict the ratio of the real pitch to the note pitch. Both objective and subjective experimental results indicate that the proposed SVS system produces a higher-quality singing voice than VISinger.
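The real-to-note pitch ratio mentioned above can be illustrated with a minimal sketch. The abstract does not say how the ratio is parameterized, so everything here is an assumption for illustration: a linear-domain ratio, note pitch given as a MIDI number, the standard A4 = 440 Hz tuning, and the hypothetical helper names `midi_to_hz`, `ratio_targets`, and `reconstruct_f0`. It is not the authors' implementation.

```python
# Sketch only: assumes a linear-domain ratio and MIDI note numbers,
# neither of which is specified in the abstract.

def midi_to_hz(midi_note: float) -> float:
    """Convert a MIDI note number to frequency in Hz (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69.0) / 12.0)


def ratio_targets(f0_hz, note_midi):
    """Per-frame targets: real F0 divided by the score's note pitch.

    A value near 1.0 means the singer is on the written note; deviations
    capture vibrato, glides, and other expressive detail.
    """
    return [f0 / midi_to_hz(n) for f0, n in zip(f0_hz, note_midi)]


def reconstruct_f0(ratios, note_midi):
    """Recover F0 in Hz from predicted ratios and the score's note pitch."""
    return [r * midi_to_hz(n) for r, n in zip(ratios, note_midi)]


if __name__ == "__main__":
    notes = [69, 69, 71]             # hypothetical score: A4, A4, B4
    f0 = [440.0, 445.0, 490.0]       # hypothetical frame-level F0 in Hz
    ratios = ratio_targets(f0, notes)
    print(ratios)
    print(reconstruct_f0(ratios, notes))
```

Under these assumptions, a predictor that outputs a ratio close to 1.0 stays anchored to the score's note, which is one plausible reading of why predicting the ratio rather than absolute pitch would reduce off-key output.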

research
11/02/2022

Singing Voice Synthesis with Vibrato Modeling and Latent Energy Representation

This paper proposes an expressive singing voice synthesis system by intr...
research
11/06/2020

Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis

Despite prosody is related to the linguistic information up to the disco...
research
08/06/2021

An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures

With the rapid development of neural network architectures and speech pr...
research
09/01/2023

Enhancing the vocal range of single-speaker singing voice synthesis with melody-unsupervised pre-training

The single-speaker singing voice synthesis (SVS) usually underperforms a...
research
10/11/2021

Pitch Preservation In Singing Voice Synthesis

Suffering from limited singing voice corpus, existing singing voice synt...
research
04/23/2020

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

This paper presents ByteSing, a Chinese singing voice synthesis (SVS) sy...
research
09/06/2022

The Role of Voice Persona in Expressive Communication: An Argument for Relevance in Speech Synthesis Design

We present an approach to imbuing expressivity in a synthesized voice by...
