Blank Collapse: Compressing CTC emission for the faster decoding

10/31/2022
by Minkyu Jung, et al.

The Connectionist Temporal Classification (CTC) model is an efficient method for modeling sequences, especially speech data. To use a CTC model for Automatic Speech Recognition (ASR), beam search decoding with an external language model (LM), such as an n-gram LM, is necessary to obtain reasonable results. In this paper, we analyze the role of the blank label in CTC beam search in depth and propose a very simple method that reduces the amount of computation, resulting in faster beam search decoding. With this method, we obtain up to 78% faster decoding than ordinary beam search decoding, with a very small loss of accuracy on the LibriSpeech datasets. We show that the method is effective both in practice, through experiments, and in theory, through mathematical reasoning. We also observe that the reduction is larger when the accuracy of the model is higher.
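The abstract does not spell out the algorithm. What follows is a minimal sketch of the general idea of shrinking a CTC emission before decoding. It assumes that frames whose blank posterior exceeds a threshold are redundant. Within each run of such frames, only the first is kept, so that repeated labels separated by a blank (for example "a _ a") are not merged into one. The names `blank_collapse` and `greedy_decode`, the threshold of 0.999, and the use of index 0 for blank are hypothetical and are not taken from the paper. The check below uses greedy decoding rather than beam search with an n-gram LM, to keep the example self-contained.

```python
from typing import List

BLANK = 0  # assumption: the blank label is class index 0


def blank_collapse(emissions: List[List[float]], threshold: float = 0.999,
                   blank: int = BLANK) -> List[List[float]]:
    """Hypothetical sketch: drop every frame whose blank probability exceeds
    `threshold`, except the first frame of each such run. Keeping one blank
    frame per run preserves the boundary between repeated labels."""
    out = []
    prev_high = False
    for frame in emissions:
        high = frame[blank] > threshold
        if not (high and prev_high):  # drop only the 2nd, 3rd, ... frames of a run
            out.append(frame)
        prev_high = high
    return out


def greedy_decode(emissions: List[List[float]], blank: int = BLANK) -> List[int]:
    """Standard CTC greedy decoding: take the per-frame argmax, merge
    consecutive repeats, then remove blanks."""
    labels = []
    prev = None
    for frame in emissions:
        best = max(range(len(frame)), key=frame.__getitem__)
        if best != prev and best != blank:
            labels.append(best)
        prev = best
    return labels


# Toy emission with 3 classes: 0 = blank, 1 = 'a', 2 = 'b' (made-up numbers).
emissions = [
    [0.9999, 0.00005, 0.00005],  # confident blank
    [0.9999, 0.00005, 0.00005],  # confident blank (collapsed)
    [0.1, 0.8, 0.1],             # 'a'
    [0.9995, 0.0003, 0.0002],    # confident blank, kept as a separator
    [0.9998, 0.0001, 0.0001],    # confident blank (collapsed)
    [0.1, 0.8, 0.1],             # 'a' again
    [0.2, 0.1, 0.7],             # 'b'
    [0.9999, 0.00005, 0.00005],  # confident blank
]

collapsed = blank_collapse(emissions)
print(len(emissions), "->", len(collapsed), "frames")
print(greedy_decode(emissions), greedy_decode(collapsed))
```

On this toy input, the emission shrinks from 8 to 6 frames and greedy decoding yields `[1, 1, 2]` ("a a b") both before and after collapsing. With beam search and an external LM, removing frames does change path scores slightly, which is consistent with the abstract's "very small loss of accuracy". Any decoder that runs per frame, beam search included, processes fewer frames after collapsing. This is where the speedup would come from.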

