A Cycle-GAN Approach to Model Natural Perturbations in Speech for ASR Applications

12/18/2019
by   Sri Harsha Dumpala, et al.
0

Naturally introduced perturbations in audio signal, caused by emotional and physical states of the speaker, can significantly degrade the performance of Automatic Speech Recognition (ASR) systems. In this paper, we propose a front-end based on Cycle-Consistent Generative Adversarial Network (CycleGAN) which transforms naturally perturbed speech into normal speech, and hence improves the robustness of an ASR system. The CycleGAN model is trained on non-parallel examples of perturbed and normal speech. Experiments on spontaneous laughter-speech and creaky-speech datasets show that the performance of four different ASR systems improve by using speech obtained from CycleGAN based front-end, as compared to directly using the original perturbed speech. Visualization of the features of the laughter perturbed speech and those generated by the proposed front-end further demonstrates the effectiveness of our approach.

READ FULL TEXT

page 2

page 5

research
05/09/2019

Universal Adversarial Perturbations for Speech Recognition Systems

In this work, we demonstrate the existence of universal adversarial audi...
research
06/07/2021

Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios

Although end-to-end automatic speech recognition (E2E ASR) has achieved ...
research
05/13/2022

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Despite the rapid progress of automatic speech recognition (ASR) technol...
research
02/17/2022

Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition

In this work, we aim to enhance the system robustness of end-to-end auto...
research
01/27/2022

Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition

Dysarthria is a motor speech disorder often characterized by reduced spe...
research
06/01/2023

SlothSpeech: Denial-of-service Attack Against Speech Recognition Models

Deep Learning (DL) models have been popular nowadays to execute differen...
research
04/12/2019

Unsupervised Speech Domain Adaptation Based on Disentangled Representation Learning for Robust Speech Recognition

In general, the performance of automatic speech recognition (ASR) system...

Please sign up or login with your details

Forgot password? Click here to reset