CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

03/31/2022
by Keyu An, et al.

Both history and future context are known to be important for accurate acoustic modeling, but waiting for future context adds latency in streaming ASR. In this paper, we propose a new framework for streaming speech recognition: Chunking, Simulating Future Context and Decoding (CUSIDE). A new simulation module recursively simulates the future contextual frames, so the recognizer does not have to wait for the real ones. The simulation module is jointly trained with the ASR model using a self-supervised loss; the ASR model is optimized with the usual ASR loss, e.g., CTC-CRF as used in our experiments. Experiments show that, compared to using real future frames as right context, using simulated future context can drastically reduce latency while maintaining recognition accuracy. With CUSIDE, we obtain new state-of-the-art streaming ASR results on the AISHELL-1 dataset.
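The chunk-then-simulate idea can be illustrated with a small sketch. This is not the authors' implementation: the function names, the chunk size, and the linear-extrapolation predictor (a stand-in for a trained simulation network) are all hypothetical, and mean absolute error is only an assumed choice for the self-supervised loss, which the abstract does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[float]  # one acoustic feature vector


def linear_extrapolation(history: Sequence[Frame]) -> Frame:
    """Hypothetical stand-in for a trained simulation network."""
    if len(history) < 2:
        return list(history[-1])
    last, prev = history[-1], history[-2]
    return [2 * a - b for a, b in zip(last, prev)]


def simulate_future(chunk: Sequence[Frame], n_future: int,
                    step: Callable[[Sequence[Frame]], Frame]) -> List[Frame]:
    """Recursively simulate frames: each prediction is fed back as input."""
    history = list(chunk)
    simulated: List[Frame] = []
    for _ in range(n_future):
        nxt = step(history)
        simulated.append(nxt)
        history.append(nxt)
    return simulated


def chunk_with_simulated_context(
    frames: Sequence[Frame], chunk_size: int, n_future: int,
    step: Callable[[Sequence[Frame]], Frame] = linear_extrapolation,
) -> List[Tuple[List[Frame], List[Frame]]]:
    """Split frames into chunks and attach simulated right context to each."""
    out = []
    for start in range(0, len(frames), chunk_size):
        chunk = list(frames[start:start + chunk_size])
        out.append((chunk, simulate_future(chunk, n_future, step)))
    return out


def simulation_loss(simulated: Sequence[Frame], real: Sequence[Frame]) -> float:
    """Assumed self-supervised loss: mean absolute error against the
    real future frames, which are available at training time."""
    pairs = list(zip(simulated, real))
    if not pairs:
        return 0.0
    total = sum(abs(s - r) for sf, rf in pairs for s, r in zip(sf, rf))
    count = sum(len(sf) for sf, _ in pairs)
    return total / count


# Toy input: six one-dimensional frames with a linear trend.
frames = [[float(t)] for t in range(6)]
chunks = chunk_with_simulated_context(frames, chunk_size=3, n_future=2)
first_chunk, first_simulated = chunks[0]
loss = simulation_loss(first_simulated, frames[3:5])
```

At inference time the recognizer would consume each chunk together with its simulated right context, so no real future frames need to be buffered; during training the real future frames serve as the targets for the simulation loss.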


