Transferring neural speech waveform synthesizers to musical instrument sounds generation

10/27/2019
by Yi Zhao, et al.

Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the neural-source-filter (NSF) model have shown good performance in speech synthesis, despite generating waveforms in different ways. The similarity between speech and music audio synthesis techniques suggests interesting avenues for exploring how best to apply speech synthesizers in the music domain. This work compares three neural synthesizers for musical instrument sound generation under three scenarios: training from scratch on music data, zero-shot learning from the speech domain, and fine-tuning-based adaptation from the speech domain to the music domain. The results of a large-scale perceptual test showed that all three synthesizers performed better when they were pre-trained on speech data and then fine-tuned on music data, which indicates that knowledge learned from speech data is useful for music audio generation. Among the synthesizers, WaveGlow showed the best potential in zero-shot learning, while NSF performed best in the other two scenarios and could generate samples that were perceptually close to natural audio.
