DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer

05/31/2023
by Yerin Choi, et al.

Despite the great success of neutral TTS, content leakage remains a challenge. In this paper, we propose a new input representation and a simple architecture to improve prosody modeling. Inspired by the recent success of discrete codes in TTS, we feed discrete codes to the reference encoder as its input. Specifically, we leverage the vector quantizer of an audio compression model to exploit the diverse acoustic information it has already been trained on. In addition, we apply a modified MLP-Mixer to the reference encoder, making the architecture lighter. As a result, we train the prosody transfer TTS in an end-to-end manner. We demonstrate the effectiveness of our method through both subjective and objective evaluations. Our experiments show that the reference encoder learns better speaker-independent prosody when discrete codes are used as its input. In addition, we obtain comparable results with fewer parameters.
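As a rough illustration of what "discrete code" from a vector quantizer means, the toy sketch below maps each acoustic frame to the index of its nearest codebook vector. It is not the authors' implementation: the function names, the two-dimensional frames, and the three-entry codebook are all hypothetical, and the quantizer in the paper comes from a pretrained audio compression model rather than a hand-written table.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each frame to the index of its nearest codebook vector.

    The returned indices are the "discrete codes". In the setting the
    abstract describes, the codebook would belong to a pretrained audio
    compression model; here it is a made-up table.
    """
    codes = []
    for frame in frames:
        distances = [squared_distance(frame, entry) for entry in codebook]
        codes.append(distances.index(min(distances)))
    return codes


# Hypothetical toy data: three codebook entries and four 2-D "frames".
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7], [0.6, 0.9]]

codes = quantize(frames, codebook)
print(codes)  # [0, 1, 2, 1]
```

A reference encoder that takes such codes as input would typically look up an embedding for each index before any further processing; that step is omitted here.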


research
03/24/2018

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

We present an extension to the Tacotron speech synthesis architecture th...
research
05/12/2020

AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN

This paper investigates how to leverage a DurIAN-based average model to ...
research
11/04/2022

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

We propose an end-to-end music mixing style transfer system that convert...
research
05/28/2019

Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion

We present an unsupervised end-to-end training scheme where we discover ...
research
05/25/2021

Deep Neural Networks and End-to-End Learning for Audio Compression

Recent achievements in end-to-end deep learning have encouraged the expl...
research
12/08/2022

High Quality Audio Coding with MDCTNet

We propose a neural audio generative model, MDCTNet, operating in the pe...
