Audio Adversarial Examples for Robust Hybrid CTC/Attention Speech Recognition

07/21/2020
by Ludwig Kürzinger, et al.

Recent advances in Automatic Speech Recognition (ASR) have demonstrated that end-to-end systems can achieve state-of-the-art performance. There is a trend towards deeper neural networks; however, those ASR models are also more complex and more vulnerable to specially crafted noisy data. Such Audio Adversarial Examples (AAEs) were previously demonstrated on ASR systems that use Connectionist Temporal Classification (CTC), as well as on attention-based encoder-decoder architectures. Following the idea of the hybrid CTC/attention ASR system, this work proposes algorithms to generate AAEs that combine both approaches into a joint CTC-attention gradient method. Evaluation is performed using a hybrid CTC/attention end-to-end ASR model on two reference sentences as a case study, as well as on the TEDlium v2 speech recognition task. We then demonstrate the application of this algorithm for adversarial training to obtain a more robust ASR model.
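
To make the idea of a joint CTC-attention gradient more concrete, here is a minimal sketch of one perturbation step. It is not taken from the paper: it assumes a generic fast-gradient-sign style update, a hypothetical interpolation weight `lam` between the two loss gradients, a hypothetical step size `eps`, and audio samples normalized to [-1, 1]. The gradients themselves are passed in as plain arrays; in practice they would come from backpropagating the CTC loss and the attention decoder loss through the ASR model.

```python
import numpy as np


def joint_gradient_step(x, grad_ctc, grad_att, lam=0.5, eps=0.01):
    """Apply one sign-gradient perturbation step to the audio signal x.

    x        : 1-D array of audio samples, assumed to lie in [-1, 1]
    grad_ctc : gradient of the CTC loss with respect to x (same shape as x)
    grad_att : gradient of the attention loss with respect to x (same shape)
    lam      : hypothetical weight of the CTC gradient, between 0 and 1
    eps      : hypothetical perturbation magnitude per sample
    """
    x = np.asarray(x, dtype=float)
    # Weighted combination of the two gradients (an assumption of this sketch).
    joint = lam * np.asarray(grad_ctc, dtype=float) \
        + (1.0 - lam) * np.asarray(grad_att, dtype=float)
    # Move each sample by eps in the direction that increases the joint loss,
    # then keep the result inside the assumed valid audio range.
    return np.clip(x + eps * np.sign(joint), -1.0, 1.0)


if __name__ == "__main__":
    x = np.zeros(4)
    grad_ctc = np.array([1.0, -1.0, 2.0, 0.0])
    grad_att = np.array([-3.0, -1.0, 2.0, 0.0])
    print(joint_gradient_step(x, grad_ctc, grad_att))
```

Setting `lam` to 1 or 0 in this sketch reduces it to a step driven by only the CTC gradient or only the attention gradient, which is one way to compare the individual approaches with the combined one.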

research
06/08/2017

Advances in Joint CTC-Attention based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM

We present a state-of-the-art end-to-end Automatic Speech Recognition (A...
research
11/02/2021

Recent Advances in End-to-End Automatic Speech Recognition

Recently, the speech community is seeing a significant trend of moving f...
research
02/01/2022

Visualizing Automatic Speech Recognition – Means for a Better Understanding?

Automatic speech recognition (ASR) is improving ever more at mimicking h...
research
07/25/2020

MP3 Compression To Diminish Adversarial Noise in End-to-End Speech Recognition

Audio Adversarial Examples (AAE) represent specially created inputs mean...
research
03/31/2020

Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

Recent studies have highlighted adversarial examples as ubiquitous threa...
research
09/18/2023

Investigating End-to-End ASR Architectures for Long Form Audio Transcription

This paper presents an overview and evaluation of some of the end-to-end...
research
07/24/2023

Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training

Developing a practically-robust automatic speech recognition (ASR) is ch...
