A review of on-device fully neural end-to-end automatic speech recognition algorithms

12/14/2020
by Chanwoo Kim, et al.

In this paper, we review various end-to-end automatic speech recognition algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components, such as an acoustic model, a language model, a pronunciation model, a text normalizer, an inverse text normalizer, and a decoder based on a Weighted Finite State Transducer (WFST). To obtain sufficiently high recognition accuracy with such conventional systems, a very large language model (up to 100 GB) is usually needed. The corresponding WFST therefore becomes enormous, which prohibits on-device implementation. Recently, fully neural end-to-end speech recognition algorithms have been proposed. Examples include systems based on Connectionist Temporal Classification (CTC), the Recurrent Neural Network Transducer (RNN-T), Attention-based Encoder-Decoder (AED) models, Monotonic Chunk-wise Attention (MoChA), and transformer-based architectures. These fully neural systems require much smaller memory footprints than conventional algorithms, so their on-device implementation has become feasible. We review such end-to-end speech recognition models and discuss in detail their structures, performance, and advantages compared to conventional algorithms.


