MelNet: A Generative Model for Audio in the Frequency Domain

06/04/2019
by Sean Vasquez, et al.

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps. While long-range dependencies are difficult to model directly in the time domain, we show that they can be more tractably modelled in two-dimensional time-frequency representations such as spectrograms. By leveraging this representational advantage, in conjunction with a highly expressive probabilistic model and a multiscale generation procedure, we design a model capable of generating high-fidelity audio samples that capture structure at timescales time-domain models have yet to achieve. We apply our model to a variety of audio generation tasks, including unconditional speech generation, music generation, and text-to-speech synthesis, and show improvements over previous approaches in both density estimates and human judgments.
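Two ideas in the abstract can be illustrated with a small sketch. The first is that a spectrogram trades a very long 1-D sample sequence for a much shorter time axis in a 2-D grid. The second is that a multiscale procedure generates that grid coarse-to-fine. The 22,050 Hz sample rate, the 256-sample hop, the centered-STFT frame-count convention, and the scheme of splitting into even/odd slices while alternating between the time and frequency axes are all assumptions made for illustration, not details stated in this abstract. The helper names (`num_frames`, `split`, `interleave`, `make_tiers`, `reassemble`) are hypothetical. The probabilistic model that would actually generate each tier is omitted.

```python
from typing import List, Tuple

# Rows are frequency bins, columns are time frames.
Spectrogram = List[List[float]]


def num_frames(num_samples: int, hop_length: int) -> int:
    """Frame count of a centered STFT (one common convention, assumed here)."""
    return 1 + num_samples // hop_length


def split(spec: Spectrogram, axis: str) -> Tuple[Spectrogram, Spectrogram]:
    """Split into even-indexed and odd-indexed slices along 'freq' or 'time'."""
    if axis == "freq":
        return spec[0::2], spec[1::2]
    if axis == "time":
        return [row[0::2] for row in spec], [row[1::2] for row in spec]
    raise ValueError(f"unknown axis: {axis}")


def interleave(even: Spectrogram, odd: Spectrogram, axis: str) -> Spectrogram:
    """Inverse of split(): alternate even and odd slices along the given axis."""
    if axis == "freq":
        return [even[i // 2] if i % 2 == 0 else odd[i // 2]
                for i in range(len(even) + len(odd))]
    if axis == "time":
        return [[e_row[j // 2] if j % 2 == 0 else o_row[j // 2]
                 for j in range(len(e_row) + len(o_row))]
                for e_row, o_row in zip(even, odd)]
    raise ValueError(f"unknown axis: {axis}")


def make_tiers(spec: Spectrogram, num_splits: int):
    """Return the coarsest tier plus (axis, detail tier) pairs ordered coarse to fine."""
    details = []
    current = spec
    for i in range(num_splits):
        # Hypothetical schedule: alternate time/freq, starting with time.
        axis = "time" if i % 2 == 0 else "freq"
        current, odd = split(current, axis)
        details.append((axis, odd))
    return current, list(reversed(details))


def reassemble(coarse: Spectrogram, details) -> Spectrogram:
    """Coarse-to-fine: merge each detail tier into the grid built so far.
    In a generative setting each detail tier would instead be sampled,
    conditioned on `current`; here the true tiers are reused."""
    current = coarse
    for axis, odd in details:
        current = interleave(current, odd, axis)
    return current


# Usage: one second of audio at the assumed settings, and an 8x8 toy grid.
print(num_frames(22050, 256))  # 87 frames instead of 22,050 samples
spec = [[float(10 * f + t) for t in range(8)] for f in range(8)]
coarse, details = make_tiers(spec, num_splits=3)
print(len(coarse), len(coarse[0]))  # 4 2
assert reassemble(coarse, details) == spec
```

Under these assumptions, one second of audio shrinks from 22,050 timesteps to 87 frames along the time axis. The final assertion checks that merging the tiers coarse-to-fine recovers the original grid. A generator of this kind depends on that property, because it produces each finer tier conditioned on the coarser ones.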

06/26/2018

The challenge of realistic music generation: modelling raw audio at scale

Realistic music generation is a challenging task. When building generati...

07/21/2021

Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

Deep generative models have recently achieved impressive performance in ...

12/22/2016

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

In this paper we propose a novel model for unconditional audio generatio...

08/16/2020

Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder

In recent works, a flow-based neural vocoder has shown significant impro...

09/12/2016

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw ...

01/12/2021

MP3net: coherent, minute-long music generation from raw audio with a simple convolutional GAN

We present a deep convolutional GAN which leverages techniques from MP3/...

12/08/2022

High Quality Audio Coding with MDCTNet

We propose a neural audio generative model, MDCTNet, operating in the pe...