Conditional End-to-End Audio Transforms

03/30/2018
by Albert Haque, et al.

We present an end-to-end method for transforming audio from one style to another. For speech, conditioning on speaker identities lets us train a single model that transforms words spoken by multiple people into multiple target voices. For music, we condition on musical instruments instead and achieve the same result. Architecturally, our method is a fully differentiable sequence-to-sequence model based on convolutional and hierarchical recurrent neural networks. It is designed to capture long-term acoustic dependencies, requires minimal post-processing, and produces realistic audio transforms. Ablation studies confirm that our model can separate speaker and instrument properties from acoustic content at different receptive fields. Empirically, our method achieves competitive performance on community-standard datasets.

research
07/24/2021

Use of speaker recognition approaches for learning timbre representations of musical instrument sounds from raw waveforms

Timbre representations of musical instruments, essential for diverse app...
research
04/05/2021

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

In the English speech-to-text (STT) machine learning task, acoustic mode...
research
09/01/2021

You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection

Audio segmentation and sound event detection are crucial topics in machi...
research
08/20/2018

R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection

This paper proposes a Region-based Convolutional Recurrent Neural Networ...
research
02/13/2021

Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms

Sound Event Detection and Audio Classification tasks are traditionally a...
research
12/22/2016

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

In this paper we propose a novel model for unconditional audio generatio...
research
11/10/2020

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

We explore pretraining strategies including choice of base corpus with t...
