Puffin: pitch-synchronous neural waveform generation for fullband speech on modest devices

11/25/2022
by   Oliver Watts, et al.
0

We present a neural vocoder designed with low-powered Alternative and Augmentative Communication devices in mind. By combining elements of successful modern vocoders with established ideas from an older generation of technology, our system is able to produce high quality synthetic speech at 48kHz on devices where neural vocoders are otherwise prohibitively complex. The system is trained adversarially using differentiable pitch synchronous overlap add, and reduces complexity by relying on pitch synchronous Inverse Short-Time Fourier Transform (ISTFT) to generate speech samples. Our system achieves comparable quality with a strong (HiFi-GAN) baseline while using only a fraction of the compute. We present results of a perceptual evaluation as well as an analysis of system complexity.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/19/2018

Epoch-Synchronous Overlap-Add (ESOLA) for Time- and Pitch-Scale Modification of Speech Signals

Time- and pitch-scale modifications of speech signals find important app...
research
11/13/2022

Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing

Most state-of-the-art Text-to-Speech systems use the mel-spectrogram as ...
research
08/07/2019

Pitch-Synchronous Single Frequency Filtering Spectrogram for Speech Emotion Recognition

Convolutional neural networks (CNN) are widely used for speech emotion r...
research
09/18/2023

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Recent advancements in speech synthesis have leveraged GAN-based network...
research
02/16/2023

QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion

With the development of automatic speech recognition (ASR) and text-to-s...
research
12/08/2022

Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

GAN vocoders are currently one of the state-of-the-art methods for build...

Please sign up or login with your details

Forgot password? Click here to reset