An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

10/20/2021
by   Huaibo Zhao, et al.
0

In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. The triggered attention mechanism, which performs autoregressive decoding triggered by the CTC spike, has shown to be effective in streaming ASR. However, in order to maintain high accuracy of alignment estimation based on CTC outputs, which is the key to its performance, it is inevitable that decoding should be performed with some future information input (i.e., with higher latency). It should be noted that in streaming ASR, it is desirable to be able to achieve high recognition accuracy while keeping the latency low. Therefore, the present study aims to achieve highly accurate streaming ASR with low latency by introducing Mask-CTC, which is capable of learning feature representations that anticipate future information (i.e., that can consider long-term contexts), to the encoder pre-training. Experimental comparisons conducted using WSJ data demonstrate that the proposed method achieves higher accuracy with lower latency than the conventional triggered attention-based streaming ASR system.

READ FULL TEXT
research
09/09/2023

Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

Achieving high accuracy with low latency has always been a challenge in ...
research
07/20/2021

Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models

Non-autoregressive (NAR) modeling has gained more and more attention in ...
research
03/11/2022

Transformer-based Streaming ASR with Cumulative Attention

In this paper, we propose an online attention mechanism, known as cumula...
research
11/02/2022

Conversation-oriented ASR with multi-look-ahead CBS architecture

During conversations, humans are capable of inferring the intention of t...
research
09/19/2023

Semi-Autoregressive Streaming ASR With Label Context

Non-autoregressive (NAR) modeling has gained significant interest in spe...
research
11/01/2022

TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

In this paper, we present TrimTail, a simple but effective emission regu...
research
03/31/2022

CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

History and future contextual information are known to be important for ...

Please sign up or login with your details

Forgot password? Click here to reset