Mel Spectrogram Inversion with Stable Pitch

08/26/2022
by Bruno Di Giorgi, et al.

Vocoders are models capable of transforming a low-dimensional spectral representation of an audio signal, typically the mel spectrogram, into a waveform. Modern speech generation pipelines use a vocoder as their final component. Recent vocoder models developed for speech achieve a high degree of realism, so it is natural to wonder how they would perform on music signals. Compared to speech, the heterogeneity and structure of musical sound textures offer new challenges. In this work we focus on one specific artifact that some vocoder models designed for speech tend to exhibit when applied to music: the perceived instability of pitch when synthesizing sustained notes. We argue that the characteristic sound of this artifact is due to a lack of horizontal phase coherence, which often results from using a time-domain target space with a model that is invariant to time shifts, such as a convolutional neural network. We propose a new vocoder model that is specifically designed for music. Key to improving pitch stability is the choice of a shift-invariant target space that consists of the magnitude spectrum and the phase gradient. We discuss the reasons that inspired us to reformulate the vocoder task, outline a working example, and evaluate it on musical signals. Our method results in 60% and 10% improved reconstruction of sustained notes and chords with respect to existing models, using a novel harmonic error metric.
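The shift-invariance argument can be illustrated with a small numerical sketch. It is not the authors' model or their exact target definition: it assumes a plain Hann-windowed STFT and uses the wrapped frame-to-frame phase difference as a discrete stand-in for the time-direction phase gradient. All names and parameter values (`stft`, `time_phase_difference`, `shift`, the 440 Hz test tone) are hypothetical choices made for illustration.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Hann-windowed STFT; returns a complex array of shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[m * hop : m * hop + n_fft] * window for m in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def time_phase_difference(spec):
    """Phase advance between consecutive frames, wrapped to (-pi, pi]."""
    # Multiplying by the conjugate of the previous frame subtracts its phase.
    return np.angle(spec[1:] * np.conj(spec[:-1]))

# Illustrative settings (assumptions, not values from the paper).
sr, f0, n_fft, hop = 16000, 440.0, 1024, 256
shift = 100  # time shift in samples

# One sustained tone, observed at two different time offsets.
t = np.arange(sr + shift) / sr
tone = np.sin(2 * np.pi * f0 * t)
x_a = tone[:sr]
x_b = tone[shift : shift + sr]

spec_a = stft(x_a, n_fft, hop)
spec_b = stft(x_b, n_fft, hop)
k = int(round(f0 * n_fft / sr))  # frequency bin closest to the tone

# The raw phase at the tone's bin changes with the time shift ...
phase_offset = np.angle(spec_b[:, k] * np.conj(spec_a[:, k]))

# ... while the frame-to-frame phase advance does not.
grad_a = time_phase_difference(spec_a)[:, k]
grad_b = time_phase_difference(spec_b)[:, k]

# For a stationary sinusoid the advance is 2*pi*f0*hop/sr, wrapped.
expected = np.angle(np.exp(2j * np.pi * f0 * hop / sr))

print("max |phase offset between shifted copies|:", np.abs(phase_offset).max())
print("max |phase-difference mismatch|:", np.abs(grad_a - grad_b).max())
print("mean phase advance vs. expected:", grad_a.mean(), expected)
```

In this sketch the two copies of the tone have clearly different phases at the tone's bin, while their frame-to-frame phase advance agrees and matches the value implied by the tone's frequency. That is the sense in which a target built from magnitude and phase gradient does not depend on where the analysis frames happen to fall.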

research
11/11/2018

PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network

Music creation is typically composed of two parts: composing the musical...
research
10/27/2019

Transferring neural speech waveform synthesizers to musical instrument sounds generation

Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the n...
research
10/31/2017

Melody Generation for Pop Music via Word Representation of Musical Properties

Automatic melody generation for pop music has been a long-time aspiratio...
research
08/02/2021

Musical Speech: A Transformer-based Composition Tool

In this paper, we propose a new compositional tool that will generate a ...
research
09/30/2016

Optimal spectral transportation with application to music transcription

Many spectral unmixing methods rely on the non-negative decomposition of...
research
07/23/2021

Multi-Channel Automatic Music Transcription Using Tensor Algebra

Music is an art, perceived in unique ways by every listener, coming from...
research
08/16/2018

Genre-Agnostic Key Classification With Convolutional Neural Networks

We propose modifications to the model structure and training procedure t...
