Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis

04/25/2021
by Erica Cooper, et al.

Speech synthesis and music audio generation from symbolic input differ in many respects but share some similarities. In this study, we investigate how text-to-speech (TTS) synthesis techniques can be used for piano MIDI-to-audio synthesis. Our investigation uses Tacotron and neural source-filter waveform models as the basic components, with which we build MIDI-to-audio synthesis systems in a manner similar to TTS frameworks. We also include reference systems that use conventional sound modeling techniques such as sample-based and physical-modeling-based methods. The subjective experimental results demonstrate that the investigated TTS components can be applied to piano MIDI-to-audio synthesis with minor modifications. The results also reveal the performance bottleneck: while the waveform model can synthesize high-quality piano sound given natural acoustic features, the conversion from MIDI to acoustic features is challenging. The full MIDI-to-audio synthesis system is still inferior to the sample-based or physical-modeling-based approaches, but we encourage TTS researchers to test their TTS models on this new task and improve the performance.

