Vocal Tract Area Estimation by Gradient Descent

07/10/2023
by David Südholt, et al.

Articulatory features can provide interpretable and flexible controls for the synthesis of human vocalizations by allowing the user to directly modify parameters like vocal strain or lip position. To make this manipulation through resynthesis possible, we need to estimate the features that result in a desired vocalization directly from audio recordings. In this work, we propose a white-box optimization technique for estimating glottal source parameters and vocal tract shapes from audio recordings of human vowels. The approach is based on inverse filtering and optimizing the frequency response of a waveguide model of the vocal tract with gradient descent, propagating error gradients through the mapping of articulatory features to the vocal tract area function. We apply this method to the task of matching the sound of the Pink Trombone, an interactive articulatory synthesizer, to a given vocalization. We find that our method accurately recovers control functions for audio generated by the Pink Trombone itself. We then compare our technique against evolutionary optimization algorithms and a neural network trained to predict control parameters from audio. A subjective evaluation finds that our approach outperforms these black-box optimization baselines on the task of reproducing human vocalizations.
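The core idea of fitting a tube model's frequency response by gradient descent can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes a lossless concatenated-tube model with a fixed lip reflection coefficient (`R_LIP`, a hypothetical value), optimizes log-areas directly rather than Pink Trombone control parameters, omits the glottal source and inverse filtering, and uses central finite differences with a backtracking step size in place of the backpropagated gradients described in the abstract. All function names are illustrative.

```python
import cmath
import math

R_LIP = 0.7   # assumed fixed reflection coefficient at the lips (hypothetical)
N_FREQS = 64  # number of frequencies at which responses are compared


def reflection_coefficients(log_areas):
    """Map log-areas of adjacent tube sections to junction reflection coefficients."""
    areas = [math.exp(v) for v in log_areas]
    coeffs = [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]
    return coeffs + [R_LIP]


def denominator(refl):
    """Step-up recursion: reflection coefficients -> all-pole denominator."""
    poly = [1.0]
    for r in refl:
        padded = poly + [0.0]
        flipped = padded[::-1]
        poly = [p + r * q for p, q in zip(padded, flipped)]
    return poly


def log_magnitude(log_areas):
    """Log-magnitude response of the all-pole tube model on a frequency grid."""
    poly = denominator(reflection_coefficients(log_areas))
    response = []
    for i in range(N_FREQS):
        w = math.pi * (i + 0.5) / N_FREQS
        d = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(poly))
        response.append(-math.log(abs(d)))
    return response


def loss(log_areas, target):
    """Mean squared error between model and target log-magnitude responses."""
    pred = log_magnitude(log_areas)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def gradient(log_areas, target, eps=1e-5):
    """Central finite differences (a stand-in for automatic differentiation)."""
    grad = []
    for i in range(len(log_areas)):
        up = list(log_areas)
        down = list(log_areas)
        up[i] += eps
        down[i] -= eps
        grad.append((loss(up, target) - loss(down, target)) / (2 * eps))
    return grad


def fit(target, n_sections, steps=200, rate=0.5):
    """Gradient descent from a uniform tube; a step is kept only if it lowers the loss."""
    params = [0.0] * n_sections
    current = loss(params, target)
    for _ in range(steps):
        grad = gradient(params, target)
        step = rate
        while step > 1e-8:
            trial = [p - step * g for p, g in zip(params, grad)]
            trial_loss = loss(trial, target)
            if trial_loss < current:
                params, current = trial, trial_loss
                break
            step *= 0.5
    return params, current


# Synthetic target generated by the model itself, using made-up areas.
true_log_areas = [math.log(a) for a in (1.0, 2.0, 0.6, 1.5, 3.0, 2.0)]
target = log_magnitude(true_log_areas)
initial_loss = loss([0.0] * 6, target)
fitted, final_loss = fit(target, 6)
print("loss before:", initial_loss, "after:", final_loss)
```

Because the reflection coefficients depend only on ratios of adjacent areas, this toy model can recover the tube shape only up to a global scale factor, so the fit is judged by the response error rather than by the raw area values.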


research
05/11/2021

Differentiable Signal Processing With Black-Box Audio Effects

We present a data-driven approach to automate audio signal processing by...
research
11/22/2019

Low-variance Black-box Gradient Estimates for the Plackett-Luce Distribution

Learning models with discrete latent variables using stochastic gradient...
research
02/05/2021

White-box Audio VST Effect Programming

Learning to program an audio production VST plugin is a time consuming p...
research
11/11/2016

Learning to Learn without Gradient Descent by Gradient Descent

We learn recurrent neural network optimizers trained on simple synthetic...
research
12/17/2021

MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

Musical expression requires control of both what notes are played, and h...
research
09/14/2023

DDSP-based Neural Waveform Synthesis of Polyphonic Guitar Performance from String-wise MIDI Input

We explore the use of neural synthesis for acoustic guitar from string-w...
research
02/25/2020

Optimizing User Interface Layouts via Gradient Descent

Automating parts of the user interface (UI) design process has been a lo...
