SAN: a robust end-to-end ASR model architecture

10/27/2022
by Zeping Min, et al.

In this paper, we propose a novel Siamese Adversarial Network (SAN) architecture for automatic speech recognition (ASR) that targets the difficulty of recognizing fuzzy audio. Specifically, SAN constructs two sub-networks that differentiate the audio feature input, then introduces a loss that unifies the output distributions of these sub-networks. Adversarial learning enables the network to capture more essential acoustic features and helps the model perform better on fuzzy audio input. We conduct numerical experiments with the SAN model on several ASR datasets. All experimental results show that the siamese adversarial nets significantly reduce the character error rate (CER). In particular, we achieve a new state-of-the-art CER of 4.37% without a language model on the AISHELL-1 dataset, a reduction of around 5%. To demonstrate the generality of the siamese adversarial net, we also conduct experiments on the phoneme recognition task, which likewise show its superiority.
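The abstract does not state the exact form of the loss that unifies the two sub-networks' outputs. As an illustration only, the sketch below assumes a symmetric KL divergence between the per-frame output distributions of the two branches, added to the two task losses with a weight `alpha`. The function names, the choice of KL divergence, and `alpha` are assumptions made for this example and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def consistency_loss(logits_a, logits_b):
    """Mean symmetric KL between two branches' per-frame distributions.

    logits_a and logits_b are lists of frames; each frame is a list of
    unnormalized scores over the output vocabulary.
    (Assumed loss form; the abstract only says the outputs are unified.)
    """
    if not logits_a or len(logits_a) != len(logits_b):
        raise ValueError("both branches need the same, non-zero number of frames")
    total = 0.0
    for frame_a, frame_b in zip(logits_a, logits_b):
        p, q = softmax(frame_a), softmax(frame_b)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(logits_a)


def total_loss(task_loss_a, task_loss_b, logits_a, logits_b, alpha=1.0):
    """Sum of both branches' task losses plus the weighted consistency term.

    alpha is a hypothetical weighting hyperparameter.
    """
    return task_loss_a + task_loss_b + alpha * consistency_loss(logits_a, logits_b)
```

With this formulation, the consistency term is zero when both branches produce identical distributions and grows as their outputs diverge, so minimizing it pushes the two sub-networks toward agreement on the same utterance.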


