TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

11/01/2022
by Xingchen Song, et al.
Horizon Robotics

In this paper, we present TrimTail, a simple but effective emission regularization method that reduces the latency of streaming ASR models. The core idea of TrimTail is to apply a length penalty directly to the spectrogram of input utterances by trimming trailing frames (see Fig. 1-(b)), which requires no alignment. TrimTail is computationally cheap, can be applied online, and can be optimized with any training loss and any model architecture on any dataset without extra effort. We demonstrate this by applying it to various end-to-end streaming ASR networks trained with either CTC loss [1] or Transducer loss [2]. We achieve a 100 to 200 ms latency reduction with equal or even better accuracy on both Aishell-1 and Librispeech. Moreover, by using TrimTail, we can achieve a 400 ms algorithmic improvement in User Sensitive Delay (USD) with an accuracy loss of less than 0.2.
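The trailing-frame trimming described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `trim_tail`, the `max_trim` parameter, its default value, and the uniform sampling of the trim length are all assumptions made for illustration. The abstract states only that trailing frames of the spectrogram are trimmed and that no alignment is needed.

```python
import random


def trim_tail(frames, max_trim=50, rng=None):
    """Drop a random number of trailing frames from a spectrogram.

    frames   : sequence of feature frames, time-major (a list of lists,
               or anything that supports len() and slicing on the time axis)
    max_trim : hypothetical upper bound on the number of frames removed
    rng      : optional random.Random instance for reproducibility
    """
    rng = rng or random
    num_frames = len(frames)
    # Always keep at least one frame so the utterance never becomes empty.
    upper = min(max_trim, num_frames - 1)
    if upper < 1:
        return frames
    # Assumption: the trim length is drawn uniformly from [1, upper].
    trim = rng.randint(1, upper)
    return frames[:num_frames - trim]


# Example: a dummy 100-frame, 80-bin spectrogram.
spec = [[0.0] * 80 for _ in range(100)]
trimmed = trim_tail(spec, max_trim=20, rng=random.Random(0))
print(len(spec), "->", len(trimmed))
```

In a training pipeline, a step like this would be applied to the input features only, leaving the transcript unchanged, so the model has to emit the same labels from fewer frames. That is the sense in which trimming acts as a length penalty without any alignment.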

10/21/2020

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

Streaming automatic speech recognition (ASR) aims to emit each hypothesi...
10/20/2021

An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

In the present paper, an attempt is made to combine Mask-CTC and the tri...
05/06/2021

Reducing Streaming ASR Model Delay with Self Alignment

Reducing prediction delay for streaming end-to-end ASR models with minim...
05/13/2022

Detecting Rumours with Latency Guarantees using Massive Streaming Data

Today's social networks continuously generate massive streams of data, w...
03/31/2022

CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

History and future contextual information are known to be important for ...
04/25/2021

Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models

Streaming end-to-end automatic speech recognition (ASR) systems are wide...
11/02/2022

Conversation-oriented ASR with multi-look-ahead CBS architecture

During conversations, humans are capable of inferring the intention of t...