KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke

10/18/2021
by Xiaobin Zhuang, et al.

An automatic pitch correction system typically includes several stages, such as pitch extraction, deviation estimation, pitch shift processing, and cross-fade smoothing. However, designing these components with hand-crafted strategies often requires domain expertise, and such strategies are likely to fail on corner cases. In this paper, we present KaraTuner, an end-to-end neural architecture that predicts the pitch curve and resynthesizes the singing voice directly from the tuned pitch and the vocal spectrum extracted from the original recordings. Several key techniques are introduced in KaraTuner to ensure pitch accuracy, pitch naturalness, timbre consistency, and sound quality. A feed-forward Transformer is employed in the pitch predictor to capture long-term dependencies in the vocal spectrum and musical notes. We also develop a pitch-controllable vocoder based on a novel source-filter block and the Fre-GAN architecture. In A/B tests, KaraTuner is preferred over the rule-based pitch correction approach, and perceptual experiments show that the proposed vocoder achieves significant advantages in timbre consistency and sound quality compared with the parametric WORLD vocoder and the phase vocoder.
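
To make the conventional pipeline mentioned in the abstract more concrete, here is a minimal sketch of the deviation estimation and pitch shift stages in a rule-based corrector. It is not KaraTuner's neural method and is not taken from the paper. The function names, the `strength` parameter, the A4 = 440 Hz reference, and the convention that an F0 of 0.0 marks an unvoiced frame are all assumptions made for illustration.

```python
import math

A4_HZ = 440.0   # assumed reference tuning
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def midi_to_hz(midi):
    """Convert a (fractional) MIDI note number to a frequency in Hz."""
    return A4_HZ * 2.0 ** ((midi - A4_MIDI) / 12.0)


def deviation_cents(f0_hz, target_midi):
    """Signed deviation of the sung pitch from the target note, in cents."""
    return 100.0 * (hz_to_midi(f0_hz) - target_midi)


def correct_pitch_track(f0_track, note_track, strength=1.0):
    """Shift each voiced frame toward its target note (hypothetical helper).

    f0_track:   per-frame F0 in Hz; 0.0 is assumed to mark unvoiced frames.
    note_track: per-frame target MIDI note numbers from the score.
    strength:   1.0 snaps fully to the note; smaller values keep part
                of the original deviation.
    """
    corrected = []
    for f0, note in zip(f0_track, note_track):
        if f0 <= 0.0:
            corrected.append(0.0)  # leave unvoiced frames untouched
            continue
        dev = deviation_cents(f0, note)
        new_midi = hz_to_midi(f0) - strength * dev / 100.0
        corrected.append(midi_to_hz(new_midi))
    return corrected
```

With `strength=1.0` every voiced frame is flattened onto the note, which removes natural vibrato and transitions along with the error. This is one plausible reading of why the abstract lists pitch naturalness as a separate goal and predicts the pitch curve with a learned model instead.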


research
10/27/2022

Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder

Our previous work, the unified source-filter GAN (uSFGAN) vocoder, intro...
research
09/21/2022

Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN

Singing voice synthesis (SVS) is the computer production of a human-like...
research
06/11/2020

XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System

This paper presents XiaoiceSing, a high-quality singing voice synthesis ...
research
02/03/2019

Deep Autotuner: A Data-Driven Approach to Natural-Sounding Pitch Correction for Singing Voice in Karaoke Performances

We describe a machine-learning approach to pitch correcting a solo singi...
research
06/10/2020

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Real-world audio recordings are often degraded by factors such as noise,...
research
05/12/2022

Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation

This paper introduces a unified source-filter network with a harmonic-pl...
research
02/16/2020

Two-dimensional Multi-fiber Spectrum Image Correction Based on Machine Learning Techniques

Due to limited size and imperfect of the optical components in a spectro...
