Catch-A-Waveform: Learning to Generate Audio from a Single Short Example

06/11/2021
by Gal Greshler, et al.

Models for audio generation are typically trained on hours of recordings. Here, we show that the essence of an audio source can often be captured from as little as a few tens of seconds of a single training signal. Specifically, we present a GAN-based generative model that can be trained on one short audio signal from any domain (e.g., speech or music) and does not require pre-training or any other form of external supervision. Once trained, our model can generate random samples of arbitrary duration that maintain semantic similarity to the training waveform, yet exhibit new compositions of its audio primitives. This enables a wide range of applications, including generating new jazz improvisations or new a-cappella rap variants based on a single short example, producing coherent modifications to famous songs (e.g., adding a new verse to a Beatles song based solely on the original recording), filling in missing parts (inpainting), extending the bandwidth of a speech signal (super-resolution), and enhancing old recordings without access to any clean training example. We show that in all cases, no more than 20 seconds of training audio commonly suffices for our model to achieve state-of-the-art results, despite its complete lack of prior knowledge about the nature of audio signals in general.

research
08/02/2017

Audio Super Resolution using Neural Networks

We introduce a new audio processing technique that increases the samplin...
research
06/16/2021

WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution

Audio super-resolution is the task of constructing a high-resolution (HR...
research
10/27/2019

Transferring neural speech waveform synthesizers to musical instrument sounds generation

Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the n...
research
07/18/2022

Audio Input Generates Continuous Frames to Synthesize Facial Video Using Generative Adiversarial Networks

This paper presents a simple method for speech videos generation based o...
research
07/21/2022

Deep Audio Waveform Prior

Convolutional neural networks contain strong priors for generating natur...
research
07/11/2020

Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network

In this paper, a pitch-adaptive waveform generative model named Quasi-Pe...
research
06/13/2020

GIPFA: Generating IPA Pronunciation from Audio

Transcribing spoken audio samples into International Phonetic Alphabet (...
