Enhancing Gappy Speech Audio Signals with Generative Adversarial Networks

05/09/2023
by Deniss Strods, et al.

Gaps, dropouts and short clips of corrupted audio are a common problem, and they are particularly annoying when they occur in speech. This paper uses machine learning to regenerate gaps of up to 320 ms in a speech audio signal. Audio regeneration is recast as image regeneration: the audio is transformed into a Mel-spectrogram, and image in-painting is used to fill the gaps. The completed Mel-spectrogram is then converted back to audio using the Parallel-WaveGAN vocoder and integrated into the audio stream. Using a sample of 1300 spoken audio clips of between 1 and 10 seconds, taken from the publicly available LJSpeech dataset, our results show that GANs running on a GPU-equipped system can regenerate audio gaps in close to real time. As expected, the smaller the gap in the audio, the better the quality of the filled gap. On a gap of 240 ms, the average mean opinion score (MOS) for the best-performing models was 3.737 on a scale of 1 (worst) to 5 (best), which is sufficient for a human listener to perceive the result as close to uninterrupted human speech.

research
02/12/2018

Synthesizing Audio with Generative Adversarial Networks

While Generative Adversarial Networks (GANs) have seen wide success at t...
research
11/23/2022

IMaSC – ICFOSS Malayalam Speech Corpus

Modern text-to-speech (TTS) systems use deep learning to synthesize spee...
research
08/01/2023

Artifact: Measuring and Mitigating Gaps in Structural Testing

The artifact used for evaluating the experimental results of Measuring a...
research
12/30/2022

Blind Restoration of Real-World Audio by 1D Operational GANs

Objective: Despite numerous studies proposed for audio restoration in th...
research
03/27/2020

Mic2Mic: Using Cycle-Consistent Generative Adversarial Networks to Overcome Microphone Variability in Speech Systems

Mobile and embedded devices are increasingly using microphones and audio...
research
01/21/2023

New Challenges for Content Privacy in Speech and Audio

Privacy in speech and audio has many facets. A particularly under-develo...
research
10/31/2018

WaveGlow: A Flow-based Generative Network for Speech Synthesis

In this paper we propose WaveGlow: a flow-based network capable of gener...