ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations

03/01/2023
by Saiteja Kosgi, et al.

Text-to-speech (TTS) systems have been modelled as mel-synthesizers followed by speech vocoders since the era of statistical TTS, and this design has carried forward into neural TTS. We propose an alternative approach to TTS modelling, referred to as ParrotTTS, that borrows from self-supervised learning (SSL) methods. ParrotTTS takes a two-step approach. It first trains a speech-to-speech model on unlabelled data, which is abundantly available. It then trains a text-to-embedding model on speech with aligned transcriptions, which extends the speech-to-speech model to TTS. ParrotTTS achieves mean opinion scores on naturalness that are competitive with traditional TTS models. It also significantly improves on them in two ways: it is more data-efficient with respect to transcribed pairs, and it supports speaker adaptation without transcriptions. This paves the way for training TTS models on top of generically trained SSL speech models.
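The two-step design described above can be illustrated with a deliberately toy Python sketch. It is not the authors' implementation. The nearest-centroid "units" that stand in for SSL embeddings, the one-frame-per-character alignment, and the constant per-speaker offset are all hypothetical simplifications chosen only to show the data flow. Stage 1 (`SpeechToSpeech`) is built and adapted to a new speaker from untranscribed audio alone. Stage 2 (`TextToEmbedding`) is the only part that consumes transcribed pairs, and it predicts the embeddings that stage 1 already knows how to render.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Toy stand-in (assumption): a "waveform" is a list of floats, one value per frame.
Audio = List[float]


def nearest_unit(frame: float, centroids: Sequence[float]) -> int:
    """Return the index of the closest centroid, a toy stand-in for a discrete SSL unit."""
    return min(range(len(centroids)), key=lambda k: abs(frame - centroids[k]))


class SpeechToSpeech:
    """Stage 1 (toy): needs only untranscribed audio.

    encode: audio -> unit ids; decode: unit ids -> audio in a given speaker's voice.
    """

    def __init__(self, centroids: Sequence[float]) -> None:
        self.centroids = list(centroids)  # fixed here; a real system would learn these
        self.speaker_offset: Dict[str, float] = {}

    def encode(self, audio: Audio) -> List[int]:
        return [nearest_unit(x, self.centroids) for x in audio]

    def adapt_speaker(self, speaker: str, untranscribed: Audio) -> None:
        # Hypothetical speaker adaptation: a constant offset estimated from audio alone.
        units = self.encode(untranscribed)
        residuals = [x - self.centroids[u] for x, u in zip(untranscribed, units)]
        self.speaker_offset[speaker] = sum(residuals) / len(residuals)

    def decode(self, units: Sequence[int], speaker: str) -> Audio:
        offset = self.speaker_offset.get(speaker, 0.0)  # unknown speakers get no offset
        return [self.centroids[u] + offset for u in units]


class TextToEmbedding:
    """Stage 2 (toy): learns text -> unit ids from transcribed pairs via stage 1's encoder."""

    def __init__(self, s2s: SpeechToSpeech) -> None:
        self.s2s = s2s
        self.table: Dict[str, int] = {}

    def fit(self, pairs: Sequence[Tuple[str, Audio]]) -> None:
        counts: Dict[str, Counter] = {}
        for text, audio in pairs:
            # Toy alignment assumption: exactly one frame per character.
            for ch, unit in zip(text, self.s2s.encode(audio)):
                counts.setdefault(ch, Counter())[unit] += 1
        # Keep the most frequent unit observed for each character.
        self.table = {ch: c.most_common(1)[0][0] for ch, c in counts.items()}

    def predict(self, text: str) -> List[int]:
        return [self.table[ch] for ch in text]


def synthesize(text: str, t2e: TextToEmbedding, s2s: SpeechToSpeech, speaker: str) -> Audio:
    """Text -> units (stage 2) -> audio (stage 1 decoder)."""
    return s2s.decode(t2e.predict(text), speaker)


if __name__ == "__main__":
    s2s = SpeechToSpeech(centroids=[0.0, 1.0, 2.0])
    t2e = TextToEmbedding(s2s)
    t2e.fit([("ab", [0.05, 1.9]), ("ba", [2.1, -0.05])])  # small transcribed set
    s2s.adapt_speaker("new_speaker", [0.3, 1.3, 2.3])      # untranscribed audio only
    print(synthesize("ab", t2e, s2s, "new_speaker"))
```

Because the text model in this sketch only predicts the intermediate embeddings, adding a new speaker touches only the speech-to-speech side. This mirrors, in miniature, the abstract's point about speaker adaptation without transcriptions.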
