End-to-End Speech Recognition: A Survey

03/03/2023
by   Rohit Prabhavalkar, et al.

In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought considerable reductions in word error rate of more than 50%. In the wake of this transition, a number of all-neural ASR architectures were introduced. These so-called end-to-end (E2E) models provide highly integrated, completely neural ASR models that rely strongly on general machine learning knowledge and learn more consistently from data, while depending less on ASR domain-specific experience. The success and enthusiastic adoption of deep learning, accompanied by more generic model architectures, has led to E2E models becoming the prominent ASR approach. The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements, and to discuss their properties and their relation to the classical hidden Markov model (HMM) based ASR architecture. All relevant aspects of E2E ASR are covered in this work: modeling, training, decoding, and external language model integration, accompanied by discussions of performance and deployment opportunities, as well as an outlook on potential future developments.

research
02/22/2022

Korean Tokenization for Beam Search Rescoring in Speech Recognition

The performance of automatic speech recognition (ASR) models can be grea...
research
11/02/2018

Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model

In this paper we proposed a novel Adversarial Training (AT) approach for...
research
01/26/2021

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec

"Transcription bottlenecks", created by a shortage of effective human tr...
research
10/15/2020

Lightweight End-to-End Speech Recognition from Raw Audio Data Using Sinc-Convolutions

Many end-to-end Automatic Speech Recognition (ASR) systems still rely on...
research
04/27/2023

Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization

Automatic speech recognition (ASR) has recently become an important chal...
research
11/03/2022

Probing Statistical Representations For End-To-End ASR

End-to-End automatic speech recognition (ASR) models aim to learn a gene...
research
09/21/2020

End-to-End Bengali Speech Recognition

Bengali is a prominent language of the Indian subcontinent. However, whi...
