Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking

02/09/2019
by   Hiroki Tamaru, et al.

This paper proposes a generative moment matching network (GMMN)-based post-filter that provides inter-utterance pitch variation for deep neural network (DNN)-based singing voice synthesis. The natural pitch variation of a human singing voice leads to a richer musical experience and is used in double-tracking, a recording method in which two performances of the same phrase are recorded and mixed to create a richer, layered sound. However, singing voices synthesized using conventional DNN-based methods do not vary from one rendition to the next, because the synthesis process is deterministic and only one waveform is synthesized from one musical score. To address this problem, we use a GMMN to model the variation of the modulation spectrum of the pitch contour of natural singing voices and add a randomized inter-utterance variation to the pitch contour generated by conventional DNN-based singing voice synthesis. Experimental evaluations suggest that our approach can provide perceptible inter-utterance pitch variation while preserving speech quality. We also extend our approach to double-tracking, and the evaluation demonstrates that GMMN-based neural double-tracking is perceptually closer to natural double-tracking than conventional signal processing-based artificial double-tracking is.
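
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the use of the log-magnitude DFT of a mean-removed log-F0 contour as the modulation spectrum, the number of perturbed low modulation-frequency bins (`n_low`), and the noise scale are all assumptions made for illustration. In particular, plain Gaussian noise stands in for a sample drawn from a trained GMMN. The second function is a biased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel, the kind of quantity a GMMN is trained to minimize; the kernel width is also an assumption.

```python
import numpy as np


def random_modulation_postfilter(log_f0, rng, n_low=10, scale=0.1):
    """Perturb the low modulation-frequency bins of a log-F0 contour.

    Hypothetical sketch: Gaussian noise replaces the output of a trained GMMN.
    """
    log_f0 = np.asarray(log_f0, dtype=float)
    n_frames = len(log_f0)
    n_fft = 1 << (n_frames - 1).bit_length()  # next power of two >= n_frames
    mean = log_f0.mean()

    # Modulation spectrum (assumed definition): DFT of the mean-removed contour.
    spec = np.fft.rfft(log_f0 - mean, n=n_fft)
    log_mag = np.log(np.abs(spec) + 1e-10)
    phase = np.angle(spec)

    # Add random variation to bins 1..n_low, leaving the DC bin untouched.
    n_low = min(n_low, len(log_mag) - 1)
    log_mag[1:n_low + 1] += rng.normal(0.0, scale, size=n_low)

    # Rebuild the contour from the modified magnitude and the original phase.
    new_spec = np.exp(log_mag) * np.exp(1j * phase)
    return np.fft.irfft(new_spec, n=n_fft)[:n_frames] + mean


def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between sample sets x and y (rows)."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


# Two different random draws give two different pitch contours for one score.
frames = np.arange(200)
contour = np.log(220.0) + 0.02 * np.sin(2 * np.pi * frames / 40)
take_a = random_modulation_postfilter(contour, np.random.default_rng(1))
take_b = random_modulation_postfilter(contour, np.random.default_rng(2))
```

In this sketch, `take_a` and `take_b` play the role of two renditions of the same phrase. Synthesizing a waveform from each and mixing them would correspond to the double-tracking use case, which is outside the scope of this example.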
