Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR

04/18/2023
by Xilai Li, et al.

Recently, there has been increasing interest in unifying streaming and non-streaming speech recognition models to reduce development, training, and deployment costs. The best-known approaches rely on either window-based or dynamic chunk-based attention strategies, combined with causal convolutions, to minimize the degradation due to streaming. However, the performance gap remains relatively large between non-streaming and a full-contextual model trained independently. To address this, we propose a dynamic chunk-based convolution that replaces the causal convolution in a hybrid Connectionist Temporal Classification (CTC)-Attention Conformer architecture. Additionally, we demonstrate further improvements through initialization of weights from a full-contextual model and parallelization of the convolution and self-attention modules. We evaluate our models on the open-source VoxPopuli and LibriSpeech datasets and on in-house conversational datasets. Overall, our proposed model reduces the degradation of the streaming mode over the non-streaming full-contextual model from 41.7% [...] test-other datasets respectively, while improving by a relative 15.5% over the previous state-of-the-art unified model.
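The chunk-based ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`chunk_attention_mask`, `chunk_conv1d`, `sample_chunk_size`) are hypothetical, the convolution is single-channel and written as an explicit loop for readability, and the sampling rule (defaults of 25 frames and probability 0.5) is purely illustrative. The sketch assumes that a chunk-based convolution lets each frame see any left context but no frames beyond the right boundary of its own chunk, and that "dynamic" means the chunk size varies during training.

```python
import numpy as np


def chunk_attention_mask(num_frames, chunk_size):
    """Boolean mask where mask[i, j] is True if frame i may attend to frame j.

    A frame sees every frame in its own chunk and in all earlier chunks.
    """
    chunk_idx = np.arange(num_frames) // chunk_size
    return chunk_idx[None, :] <= chunk_idx[:, None]


def chunk_conv1d(x, kernel, chunk_size):
    """Single-channel 1-D convolution over time with a chunk-limited right context.

    Assumption: left context may cross chunk boundaries, while right context
    stops at the end of the chunk that contains the output frame.
    The kernel length is assumed to be odd.
    """
    num_frames = x.shape[0]
    half = (kernel.shape[0] - 1) // 2
    out = np.zeros(num_frames)
    for t in range(num_frames):
        # Exclusive end index of the chunk containing frame t.
        chunk_end = min((t // chunk_size + 1) * chunk_size, num_frames)
        for offset in range(-half, half + 1):
            s = t + offset
            if 0 <= s < chunk_end:  # drop frames before the start or past the chunk
                out[t] += kernel[offset + half] * x[s]
    return out


def sample_chunk_size(rng, num_frames, max_chunk=25, full_context_prob=0.5):
    """Hypothetical sampler: full context sometimes, otherwise a random chunk size."""
    if rng.random() < full_context_prob:
        return num_frames
    return int(rng.integers(1, max_chunk + 1))


x = np.ones(6)
kernel = np.ones(3)
print(chunk_conv1d(x, kernel, chunk_size=3))  # right context cut at frames 2 and 5
print(chunk_conv1d(x, kernel, chunk_size=6))  # one chunk: full-context convolution
print(chunk_conv1d(x, kernel, chunk_size=1))  # chunk of one frame: causal convolution
```

Under these assumptions, a chunk size of one frame reduces to a causal convolution and a chunk size covering the whole utterance reduces to an ordinary full-context convolution, which is why a single set of weights can serve both the streaming and the non-streaming mode.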

research
06/13/2023

DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer

Conformer-based end-to-end models have become ubiquitous these days and ...
research
07/02/2021

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

Attention-based end-to-end automatic speech recognition (ASR) systems ha...
research
12/10/2020

Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition

In this paper, we present a novel two-pass approach to unify streaming a...
research
10/07/2021

Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution

This paper improves the streaming transformer transducer for speech reco...
research
06/01/2023

Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning

The unified streaming and non-streaming speech recognition model has ach...
research
11/21/2022

Sequentially Sampled Chunk Conformer for Streaming End-to-End ASR

This paper presents an in-depth study on a Sequentially Sampled Chunk Co...
research
12/05/2017

Improving the Performance of Online Neural Transducer Models

Having a sequence-to-sequence model which can operate in an online fashi...
