Vocal Timbre Effects with Differentiable Digital Signal Processing

06/19/2023
by David Südholt, et al.

We explore two approaches to creatively altering vocal timbre using Differentiable Digital Signal Processing (DDSP). The first approach is inspired by classic cross-synthesis techniques. Based on pitch and loudness information extracted from the vocal input, a pretrained DDSP decoder predicts a harmonic distribution and a filter for a noise source. Before synthesis, the harmonic distribution is modified by interpolating between the predicted distribution and the harmonics of the input. We provide a real-time implementation of this approach in the form of a Neutone model. In the second approach, autoencoder models are trained on datasets containing both vocal and instrument data. To apply the effect, the trained autoencoder attempts to reconstruct the vocal input. We find that there is a desirable "sweet spot" during training, where the model has learned to reconstruct the phonetic content of the input vocals but is still affected by the timbre of the instrument mixed into the training data. After further training, that effect disappears. A perceptual evaluation compares the two approaches. We find that the autoencoder in the second approach is able to reconstruct intelligible lyrical content without any explicit phonetic information provided during training.
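
The interpolation step of the first approach can be illustrated with a minimal sketch. The abstract does not say how the interpolation is performed, so the linear blend, the renormalization, and every name below (`interpolate_harmonics`, `synthesize_frame`, `alpha`, the sample rate and frame length) are assumptions made for illustration, not the authors' implementation. The sketch treats a harmonic distribution as a list of non-negative weights, one per harmonic of the fundamental frequency, and renders a single frame with a plain additive sinusoidal synthesizer.

```python
import math


def interpolate_harmonics(predicted, measured, alpha):
    """Blend two harmonic distributions (assumed linear interpolation).

    alpha = 0.0 keeps the decoder's predicted distribution,
    alpha = 1.0 keeps the harmonics measured from the vocal input.
    The result is renormalized so the weights sum to 1.
    """
    blended = [(1.0 - alpha) * p + alpha * m for p, m in zip(predicted, measured)]
    total = sum(blended)
    if total <= 0.0:
        # All-zero frame (e.g. silence): nothing to normalize.
        return blended
    return [b / total for b in blended]


def synthesize_frame(f0_hz, amplitude, distribution, sample_rate=16000, n_samples=256):
    """Render one frame as a sum of sinusoids at integer multiples of f0."""
    nyquist = sample_rate / 2.0
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = 0.0
        for k, weight in enumerate(distribution, start=1):
            freq = k * f0_hz
            if freq < nyquist:  # skip harmonics that would alias
                sample += weight * math.sin(2.0 * math.pi * freq * t)
        frame.append(amplitude * sample)
    return frame


# Hypothetical three-harmonic distributions, chosen only for the example.
predicted = [0.5, 0.3, 0.2]  # stands in for the decoder output
measured = [0.2, 0.2, 0.6]   # stands in for the harmonics of the vocal input
mixed = interpolate_harmonics(predicted, measured, alpha=0.5)
frame = synthesize_frame(f0_hz=220.0, amplitude=0.8, distribution=mixed)
```

In this sketch, `alpha` acts as the effect's mix control: small values keep the timbre of the pretrained instrument model, while larger values move the harmonic balance toward that of the original voice. A real-time version would apply the same blend per frame and add the filtered noise component, both of which are omitted here.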
