Streaming Transformer ASR with Blockwise Synchronous Inference

by Emiru Tsunoo et al.

The Transformer self-attention network has recently shown promising performance as an alternative to recurrent neural networks in end-to-end (E2E) automatic speech recognition (ASR) systems. However, Transformer has a drawback in that the entire input sequence is required to compute self-attention. We have proposed a block processing method for the Transformer encoder by introducing a context-aware inheritance mechanism. An additional context embedding vector handed over from the previously processed block helps encode not only local acoustic information but also global linguistic, channel, and speaker attributes. In this paper, we extend block processing towards an entire streaming E2E ASR system without additional training, by introducing a blockwise synchronous decoding process inspired by a neural transducer into the Transformer decoder. We further apply a knowledge distillation technique with which training of the streaming Transformer is guided by the ordinary batch Transformer model. Evaluations of the HKUST and AISHELL-1 Mandarin tasks and LibriSpeech English task show that our proposed streaming Transformer outperforms conventional online approaches including monotonic chunkwise attention (MoChA). We also confirm that the knowledge distillation technique improves the accuracy further. Our streaming ASR models achieve comparable/superior performance to the batch models and other streaming-based transformer methods in all the tasks.




1 Introduction

End-to-end (E2E) automatic speech recognition (ASR) has been attracting attention as a method of directly integrating acoustic models and language models (LMs) because of the simple training and efficient decoding procedures. In recent years, various models have been studied, including connectionist temporal classification (CTC) [1, 2, 3], attention-based encoder–decoder models [4, 5, 6], their hybrid models [7], and the RNN-transducer [8, 9]. Transformer [10] has been successfully introduced into E2E ASR by replacing RNNs [11, 12, 13], and it outperforms bidirectional RNN models in most tasks [14]. Transformer has multihead self-attention network (SAN) layers, which can leverage a combination of information from completely different positions of the input.

However, similarly to bidirectional RNN models [15], Transformer has a drawback in that the entire utterance is required to compute self-attention, making it difficult to utilize in streaming ASR systems. Also, the memory and computational requirements of Transformer grow quadratically with the input sequence length, which makes it difficult to apply to longer speech utterances. This problem is now being tackled. Streaming ASR can be realized by simply introducing blockwise processing as in [11, 16, 17, 18, 19]; furthermore, Miao et al. [20] also utilized the previous chunk, inspired by Transformer-XL [21]. A triggered attention mechanism was introduced in [16], which requires complicated training using CTC forced alignment. Monotonic chunkwise attention (MoChA) [22] is a popular approach to achieving online processing [23, 24, 25]. However, MoChA degrades the performance [24, 25], and there is no guarantee that the latency is well controlled.

We have proposed a block processing method for the encoder–decoder Transformer model by introducing a context-aware inheritance mechanism combined with MoChA [26, 27]. The encoder is processed blockwise as in [20]. In addition, a context embedding vector handed over from the previously processed block helps encode not only local acoustic information but also global linguistic, channel, and speaker attributes. MoChA is modified for the source–target attention (STA) and utilized in the Transformer decoder. However, MoChA significantly degrades the performance.

In this paper, we propose a simple blockwise synchronous inference of the decoder, inspired by a neural transducer [18]. The decoder receives encoded blocks one by one from the contextual block encoder; each block is then decoded synchronously until the end-of-sequence token, “eos,” appears. Our contributions are as follows. 1) The contextual block processing of the encoder is incorporated with the blockwise synchronous inference of the decoder in the scheme of CTC/attention hybrid decoding. 2) We further apply the knowledge distillation technique [28, 29, 30] to the streaming Transformer, guided by the original batch Transformer. 3) Our proposed streaming Transformer is compared with conventional approaches including MoChA, and it outperforms them on the HKUST and AISHELL-1 Mandarin tasks and the LibriSpeech English task.

2 Relation with Prior Work

Among various approaches toward streaming processing for Transformer, such as the time-restricted Transformer [16, 17], Miao et al. [20] adopted a chunkwise self-attention encoder (Chunk SAE) inspired by Transformer-XL [21], in which not only the current chunk but also the previous chunk is utilized for streaming encoding. Although this encoder is similar to that in our earlier work [26, 27], in our case, not only the previous chunk but also a long history of chunks is efficiently referred to by introducing context embeddings.

Tian et al. [31] applied a neural transducer [18] to the synchronous Transformer decoder, which decodes sequences in a similar manner to our approach in this paper. The synchronous Transformer has to be trained using a special forward–backward algorithm, similarly to the training of a neural transducer using dynamic programming alignment. In this paper, instead, training is carried out with the ordinary batch decoder; we use its parameters as they are and directly apply them to the proposed blockwise synchronous inference, since our preliminary experiments show that this does not degrade the performance. Rather, knowledge distillation [28, 29, 30] is applied to improve the integrated performance of the streaming encoder and decoder. In addition, while the encoder in [31] attends to all previous blocks, as mentioned above, we adopt the contextual block encoder, which efficiently reduces the computational cost.

3 Blockwise Synchronous Inference

3.1 Transformer ASR

The baseline Transformer ASR follows that in [14], which is based on the encoder–decoder architecture. The encoder transforms a T-length speech feature sequence x = (x_1, ..., x_T) into a U-length intermediate representation h = (h_1, ..., h_U), where U < T owing to downsampling. Given h and the previously emitted character outputs y_{1:i-1} = (y_1, ..., y_{i-1}), a decoder estimates the next character y_i:

    y_i ~ p(y_i | y_{1:i-1}, h).    (1)

The encoder consists of two convolutional layers with stride 2 for downsampling, a linear projection layer, and positional encoding, followed by N_e encoder layers and layer normalization. Each encoder layer has a multihead SAN followed by a position-wise feedforward network, both of which have residual connections. In the SAN, attention weights are formed from queries (Q) and keys (K), and applied to values (V) as

    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,    (2)

where d_k is the dimension of the queries and keys; typically d_k = d_model / M for the number of heads M. We utilize multihead attention, denoted as the MHD(.) function, as follows:

    MHD(Q, K, V) = Concat(head_1, ..., head_M) W^O,
    head_m = Attention(Q W_m^Q, K W_m^K, V W_m^V).    (3)

In (2) and (3), each layer is computed with its own projection matrices W_m^Q, W_m^K, W_m^V, and W^O. For all the SANs in the encoder, Q, K, and V are the same matrices, namely the inputs of the SAN. The position-wise feedforward network is a stack of linear layers.

The decoder predicts the probability of the following character from the previous output characters y_{1:i-1} and the encoder output h, i.e., p(y_i | y_{1:i-1}, h). The character history sequence is converted to character embeddings. Then, N_d decoder layers are applied, followed by the linear projection and Softmax function. Each decoder layer consists of a SAN and a STA, followed by a position-wise feedforward network. The first SAN in each decoder layer applies attention weights to the input character sequence, where Q, K, and V are all set to the input sequence of the SAN. The following STA then attends to the entire encoder output sequence by setting K and V to be h.

Transformer can leverage a combination of information from completely different positions of the input. This is due to the multiple heads and residual connections of the layers that complement each other, i.e., some attend monotonically and locally while others attend globally. Transformer requires the entire speech utterance for both the encoder and the decoder; thus, they are processed only after the end of the utterance, which causes a huge delay. To realize a streaming ASR system, both the encoder and decoder are processed online synchronously.
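To make the attention computation of Sec. 3.1 concrete, the following is a minimal NumPy sketch of scaled dot-product and multihead attention; the function and matrix names are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    w = softmax(Q @ K.T / np.sqrt(d_k))
    return w @ V

def multihead(Q, K, V, Wq, Wk, Wv, Wo, M):
    # split the model dimension into M heads, attend per head,
    # concatenate the heads, and apply the output projection Wo
    d = Q.shape[-1]
    dk = d // M
    heads = []
    for m in range(M):
        sl = slice(m * dk, (m + 1) * dk)
        heads.append(attention(Q @ Wq[:, sl], K @ Wk[:, sl], V @ Wv[:, sl]))
    return np.concatenate(heads, axis=-1) @ Wo
```

For an encoder SAN, Q, K, and V would all be the same layer input; for the decoder STA, K and V would be the encoder output h.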

Figure 1: Context inheritance mechanism of the encoder.

3.2 Contextual Block Processing of Encoder

A simple way to process the encoder online is blockwise computation, as in [11, 16, 17, 18, 19, 20]. However, the global channel, speaker, and linguistic context are also important for local phoneme classification. We have proposed a context inheritance mechanism for block processing by introducing an additional context embedding vector [26]. As shown by the tilted arrows in Fig. 1, the context embedding vector is computed in each layer of each block and handed over to the upper layer of the following block. Thus, the SAN in each layer is applied to the block input sequence together with the context embedding vector. A similar idea was also proposed in image and natural language processing around the same time [32].

Note that the blocks can overlap. We originally proposed a half-overlapping approach in [26], in which the central frames of block b, denoted h_b, are computed from the blocked input, which includes past frames as well as a look-ahead of future frames. The numbers of frames used for the left/center/right parts in [26], counted after downsampling, gave half-overlapping blocks. This can be easily extended to use more frames, with left/center/right sizes equivalent to the parameters in [20].

The context embedding vector is introduced into the original formulation in Sec. 3.1. Denoting the context embedding vector of block b at layer n as c_b^n, the input of each SAN is augmented so that its queries, keys, and values cover both the block input and the context embedding vector of the previous block of the previous layer, c_{b-1}^{n-1}. The output Z_b^n of the n-th encoder layer of block b is computed simultaneously with the context embedding vector as

    [Z_b^n, c_b^n] = EncoderLayer_n([Z_b^{n-1}, c_{b-1}^{n-1}]),

where each layer contains trainable projection matrices and biases. Thus, the encoded output of block b is described as

    h_b = Center(Z_b^{N_e}),

where Center(.) selects the central frames from Z_b^{N_e}. The output of the SAN not only encodes input acoustic features but also delivers the context information to the succeeding layer, as shown by the tilted red arrows in Fig. 1.
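As an illustration of the blockwise scheme above, here is a toy NumPy sketch of overlapping block slicing with a context vector handed across blocks. The tanh "layer" is a stand-in for a real SAN layer, and all names and sizes are illustrative assumptions.

```python
import numpy as np

def split_blocks(x, n_left, n_center, n_right):
    """Slice a (T, d) feature sequence into overlapping blocks.

    Each block sees n_left past frames and n_right look-ahead frames
    around its n_center central frames; consecutive blocks hop by
    n_center, so n_left = n_right = n_center gives half-overlapping
    inputs.  Returns (block, center_offset, n_valid) triples.
    """
    T = len(x)
    blocks = []
    for start in range(0, T, n_center):
        lo = max(0, start - n_left)
        hi = min(T, start + n_center + n_right)
        blocks.append((x[lo:hi], start - lo, min(n_center, T - start)))
    return blocks

def encode_stream(x, n_left, n_center, n_right):
    """Toy blockwise 'encoder' with a context vector handed across blocks."""
    d = x.shape[1]
    ctx = np.zeros(d)                  # context embedding from the previous block
    outputs = []
    for block, c0, nc in split_blocks(x, n_left, n_center, n_right):
        # augment the block input with the context vector, as in Fig. 1
        z = np.tanh(np.concatenate([block, ctx[None, :]], axis=0))
        ctx = z[-1]                    # hand the context embedding to the next block
        outputs.append(z[c0:c0 + nc])  # emit only the central frames
    return np.concatenate(outputs, axis=0)
```

Only the central frames of each block are emitted, so the concatenated output covers the whole utterance exactly once, while each block still conditions on past frames, look-ahead frames, and the inherited context vector.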

3.3 Blockwise Synchronous Inference of Decoder

3.3.1 Synchronous Decoding

The original Transformer decoder requires the entire output of the encoder h; thus, it is not suitable for streaming processing as is. We previously proposed the use of MoChA [22], tailored for the STA [27]. However, the accuracy drops significantly when MoChA is applied to decoder layers, as was also observed in other studies [24, 25]. In addition, there is no guarantee that the latency is well controlled. Instead, we propose blockwise synchronous decoding inspired by a neural transducer [18], as a similar approach has proved effective for a streaming Transformer in [31].

When the encoder in Sec. 3.2 outputs a block h_b, the decoder starts decoding using the outputs encoded so far, h_{1:b}, as the ordinary Transformer does in Sec. 3.1, until the end-of-sequence token, “eos,” appears in a beam. The decoder then waits for the next output h_{b+1} and resumes from just before the last decoded outputs, including the eos hypothesis. All the hypotheses are maintained to be used for decoding the following block. While only the last block output from the encoder, computed from the entire history, was used in [31], we use all the encoded outputs computed from each input block with a contextual embedding vector.

Our Transformer has a lower computational cost than that in [31] in a typical setup, where the number of encoder layers is greater than that of decoder layers, or the number of output tokens is smaller than the number of encoder outputs. The synchronous decoding process is shown in Fig. 2.

Figure 2: Blockwise synchronous inference of the decoder.
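The blockwise synchronous beam search described above can be sketched as the following control loop. Here `score_fn` is a hypothetical stand-in for the Transformer decoder plus CTC prefix scorer; the structure (decode until eos appears in the beam, keep all hypotheses, resume when the next block arrives) follows the text, while the details are illustrative.

```python
EOS = "<eos>"

def decode_blockwise(encoded_blocks, score_fn, beam=2, max_len=20):
    """Blockwise synchronous beam search (illustrative sketch).

    For each newly encoded block, hypotheses are extended until every
    hypothesis in the beam has emitted <eos>; hypotheses are kept and
    decoding resumes from just before the <eos> when the next block
    arrives.  score_fn(hyp, memory) returns (token, log_prob) pairs.
    """
    hyps = [((), 0.0)]  # (token tuple, accumulated log prob)
    memory = []

    def done(h):
        return (len(h) > 0 and h[-1] == EOS) or len(h) >= max_len

    for block in encoded_blocks:
        memory.append(block)  # the decoder sees all blocks encoded so far
        # resume from just before the previously emitted <eos>
        hyps = [(h[:-1] if h and h[-1] == EOS else h, s) for h, s in hyps]
        while not all(done(h) for h, _ in hyps):
            cand = []
            for h, s in hyps:
                if done(h):
                    cand.append((h, s))
                    continue
                for tok, lp in score_fn(h, memory):
                    cand.append((h + (tok,), s + lp))
            hyps = sorted(cand, key=lambda p: p[1], reverse=True)[:beam]
    best = max(hyps, key=lambda p: p[1])
    return [t for t in best[0] if t != EOS]
```

In the real system, `score_fn` would combine the attention decoder probability with the blockwise CTC prefix score of Sec. 3.3.2, computed over the encoded blocks held in `memory`.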

3.3.2 On-the-fly CTC Prefix Scoring

Decoding is carried out jointly with CTC as in [7]. Originally, for each hypothesis g, the CTC prefix score is computed over the entire T-frame input as

    p_ctc(g | x_{1:T}) = γ_T^(n)(g) + γ_T^(b)(g),    (8)

where the superscripts (n) and (b) denote CTC paths ending with a nonblank or blank symbol, respectively. Thus, the entire input is required for accurate computation. However, in the case of blockwise synchronous inference, the beam search is carried out with a limited input length. Therefore, the CTC prefix score is computed from the blocks that are already encoded, as follows:

    p_ctc(g | x_{1:τ_b}) = γ_{τ_b}^(n)(g) + γ_{τ_b}^(b)(g),    (9)

where τ_b is the last frame of the currently processed block b. When a new block output is emitted by the encoder, the decoder resumes the CTC prefix score computation according to Algorithm 2 in [7]. Equation (9) requires more computation as the input sequence grows; however, it can be computed efficiently using the technique described in [33].
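The resumable nature of the blockwise CTC score can be illustrated with the standard CTC forward recursion, whose state (the forward variables) can be saved and advanced frame by frame as new encoder blocks arrive. This is a generic log-domain sketch of the CTC forward algorithm, not Algorithm 2 of [7] itself.

```python
import numpy as np

BLANK = 0  # blank symbol index (assumed)

def ctc_init(labels):
    """Extended label sequence with blanks, and the initial forward state."""
    ext = [BLANK]
    for l in labels:
        ext += [l, BLANK]
    alpha = np.full(len(ext), -np.inf)  # log-domain forward variables
    return ext, alpha

def ctc_step(ext, alpha, log_probs, first):
    """Advance the forward variables by one frame of log posteriors.

    Because only `alpha` is carried between calls, the score up to
    frame tau_b can be stored and resumed when the next block arrives.
    """
    S = len(ext)
    new = np.full(S, -np.inf)
    if first:
        new[0] = log_probs[ext[0]]
        if S > 1:
            new[1] = log_probs[ext[1]]
        return new
    for s in range(S):
        terms = [alpha[s]]                       # stay
        if s > 0:
            terms.append(alpha[s - 1])           # advance
        if s > 1 and ext[s] != BLANK and ext[s] != ext[s - 2]:
            terms.append(alpha[s - 2])           # skip a blank
        new[s] = np.logaddexp.reduce(terms) + log_probs[ext[s]]
    return new

def ctc_score(alpha):
    # paths may end with the last nonblank or the final blank
    return np.logaddexp(alpha[-1], alpha[-2]) if len(alpha) > 1 else alpha[-1]
```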

3.4 Knowledge Distillation Training

Our preliminary experiments show that the parameters trained for the ordinary batch decoder perform well, without significant degradation, when they are directly used in the blockwise synchronous inference of the decoder. Therefore, instead of using a special dynamic programming or forward–backward training method as in [18, 31], we propose applying knowledge distillation [28, 29, 30] to the streaming Transformer, guided by the ordinary batch Transformer model, for further improvement.


Let q(y_i | y_{1:i-1}, x) be a probability distribution computed by a teacher batch model trained with the same dataset, and p(y_i | y_{1:i-1}, x) be the distribution predicted by the student streaming Transformer model. Then, the latter is forced to mimic the former distribution by minimizing the cross-entropy, which is written as

    L_KD = - Σ_i Σ_{v ∈ V} q(v | y_{1:i-1}, x) log p(v | y_{1:i-1}, x),

where V is the set of vocabulary. The aggregated loss function for the attention encoder and decoder is calculated as

    L = (1 - λ) L_att + λ L_KD,

where λ is a controllable interpolation weight. Then, this loss is combined with the CTC loss as in [7].
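Under the notation above, the distillation loss and its interpolation with the attention loss can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def distillation_loss(teacher_probs, student_log_probs):
    """Cross-entropy between the teacher (batch) distribution q and the
    student (streaming) distribution p, summed over output positions
    and the vocabulary: L_KD = -sum_i sum_v q(v) log p(v)."""
    return -np.sum(teacher_probs * student_log_probs)

def aggregated_loss(att_loss, kd_loss, lam):
    # interpolate the ordinary attention loss with the distillation loss
    return (1.0 - lam) * att_loss + lam * kd_loss
```

When the teacher distribution is a one-hot label, the distillation loss reduces to the ordinary negative log-likelihood, so λ smoothly trades hard-label training against mimicking the batch teacher.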

4 Experiments

4.1 Experimental Setup

We carried out experiments on the HKUST [34] and AISHELL-1 [35] Mandarin tasks, and, as an English task, we trained and evaluated on the LibriSpeech dataset [36]. The input acoustic features were 80-dimensional filter banks and the pitch. We used {3655, 4231} character classes for the {HKUST, AISHELL-1} Mandarin setups. For LibriSpeech, we adopted byte-pair encoding (BPE) subword tokenization [37] with 5000 token classes.

For the training, we utilized multitask learning with the CTC loss as in [7, 14], with a weight of 0.3. A linear layer was added onto the encoder output to project it to the character probabilities for the CTC. The Transformer models were trained over 50 epochs with the Adam optimizer and Noam learning rate decay as in [10]. SpecAugment [38] was applied except to AISHELL-1.

The encoder had N_e layers with 2048 units and the decoder had N_d layers with 2048 units, except for AISHELL-1, where the model size was chosen to enable a comparison with [31]. The dimension and the number of heads of the multihead attentions were set separately for the Mandarin and English tasks. The input blocks were overlapped with the same left/center/right parameters as in [20] to enable a comparison, as explained in Sec. 3.2. We trained the contextual block processing encoder (CBP-ENC) with the batch decoder. The parameters of the batch decoder were directly used in the proposed blockwise synchronous decoder (BS-DEC) for inference. The training was carried out using ESPnet [39] with the PyTorch backend; the training and inference implementations will be made publicly available.

The decoding was performed jointly with the CTC using a beam search, with beam widths of 10 for the Mandarin tasks and 30 for the English task, and CTC weights set per task. For the two Mandarin tasks, an external word-level LM, a single-layer LSTM with 1000 units, was used for rescoring via shallow fusion [40]. For LibriSpeech, a 4-layer LSTM LM with 2048 units or a 16-layer Transformer LM with 2048 units and 8 heads was fused.

4.2 Results

4.2.1 Hkust

The results are listed in Table 1. For comparison, we implemented the Chunk SAE [20], which is similar to our CBP-ENC approach except that it does not use the contextual embedding introduced in Sec. 3.2. Although we were unable to reproduce the original score in [20], the implemented model performed reasonably. Comparing CBP-ENC with the Chunk SAE, both used with the batch decoder, we confirm that our contextual embedding approach performed better. SpecAugment [38] brought further improvement. For streaming processing, the combination of CBP-ENC and BS-DEC outperformed the combination of CBP-ENC and the MoChA decoder [27]. The knowledge distillation training in Sec. 3.4 further improved the performance. Our proposed method achieved state-of-the-art performance as a streaming E2E approach.

Dev Test
Batch processing
Transformer [14] (reprod.) 24.0 23.5
    + SpecAugment 21.2 21.4
Chunk SAE + Batch Dec. [20] (reprod.) 25.8 25.0
CBP-ENC + Batch Dec. [26] 25.3 24.6
    + SpecAugment 22.3 22.1
    + Knowledge Distillation 22.1 22.3
Streaming processing
CIF + Chunk-hopping [41] 23.6
CBP-ENC + MoChA Dec. [27]
    + SpecAugment 28.1 26.1
CBP-ENC + BS-DEC (proposed)
    + SpecAugment 22.6 22.6
    + Knowledge Distillation 22.2 22.4
Table 1: Character error rates (CERs) in the HKUST task.

4.2.2 Aishell-1

We also conducted an evaluation on AISHELL-1, comparing our approach with the Sync-Transformer [31], which is the most similar to ours. The results are shown in Table 2, along with the results for the RNN-T evaluated in [42]. As can be seen, our approach outperformed the MoChA decoder, especially when knowledge distillation was applied, and it also outperformed the Sync-Transformer [31]. We also trained a larger model, with which we confirmed that the proposed blockwise synchronous inference suppressed the performance degradation from batch decoding.

Dev Test
Batch processing
Transformer [14] (reprod.) 7.4 8.1
CBP-ENC + Batch Dec. [26] 7.6 8.4
    + Knowledge Distillation 7.6 8.3
    — large 6.4 7.2
Streaming processing
RNN-T [42] 10.1 11.8
Sync-Transformer [31] 7.9 8.9
CBP-ENC + MoChA Dec. [27] 9.7 9.7
CBP-ENC + BS-DEC (proposed) 7.6 8.5
    + Knowledge Distillation 7.6 8.4
    — large 6.4 7.3
Table 2: Character error rates (CERs) in the AISHELL-1 task.

4.2.3 LibriSpeech

Lastly, we carried out an evaluation on LibriSpeech, in which BPE was utilized. The results are shown in Table 3. Although we did not tune the parameters as carefully as in [14, 16] and did not apply knowledge distillation, similar trends were observed. Our proposed method achieved better performance than CTC decoding [26] and continuous integrate-and-fire (CIF) online E2E ASR [41], which indicates that the blockwise synchronous decoder also works with BPE tokenization. It also outperformed the state-of-the-art streaming E2E ASR using triggered attention [16], which was well tuned and trained for 120 epochs. We obtained further improvements using the Transformer LM.

Dev Test
clean other clean other
Batch processing
Transformer [14] (reprod.) 2.5 6.3 2.8 6.4
    + Transformer LM 2.4 5.9 2.7 6.1
CBP-ENC + Batch Dec. [26] 2.7 7.2 2.9 7.3
Streaming processing
CBP-ENC + CTC [26] 3.2 9.0 3.3 9.1
CIF + Chunk-hopping [41] 3.3 9.6
Triggered Attention [16] (SOTA) 2.6 7.2 2.8 7.3
CBP-ENC + BS-DEC (proposed) 2.5 6.8 2.7 7.1
    + Transformer LM 2.3 6.5 2.6 6.7
Table 3: Word error rates (WERs) in the LibriSpeech task, which confirms the applicability of the proposed method to BPE tokenization.

5 Conclusions

We extended our previously proposed contextual block processing for the Transformer encoder towards an entire streaming E2E ASR system without additional training, by introducing blockwise synchronous decoding inspired by a neural transducer into the Transformer decoder. The decoder synchronously applies self-attention networks to each encoded block output until the end-of-sequence token appears. Evaluations of the HKUST and AISHELL-1 Mandarin tasks and LibriSpeech English task showed that our proposed streaming Transformer outperforms conventional online approaches including MoChA, especially when we applied the knowledge distillation technique.


  • [1] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” in Proc. of 23rd International Conference on Machine Learning, 2006, pp. 369–376.
  • [2] Y. Miao, M. Gowayyed, and F. Metze, “EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding,” in Proc. of ASRU Workshop, 2015, pp. 167–174.
  • [3] D. Amodei et al., “Deep Speech 2: End-to-end speech recognition in English and Mandarin,” in Proc. of 33rd International Conference on Machine Learning, vol. 48, 2016, pp. 173–182.
  • [4] J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, “Attention-based models for speech recognition,” in Proc. of NIPS, 2015, pp. 577–585.
  • [5] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in Proc. of ICASSP, 2016, pp. 4960–4964.
  • [6] C.-C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, E. Gonina et al., “State-of-the-art speech recognition with sequence-to-sequence models,” in Proc. of ICASSP, 2018, pp. 4774–4778.
  • [7] S. Watanabe, T. Hori, S. Kim, J. R. Hershey, and T. Hayashi, “Hybrid CTC/attention architecture for end-to-end speech recognition,” Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1240–1253, 2017.
  • [8] A. Graves, A.-R. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in Proc. of ICASSP, 2013, pp. 6645–6649.
  • [9] K. Rao, H. Sak, and R. Prabhavalkar, “Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer,” in Proc. of ASRU Workshop, 2017, pp. 193–199.
  • [10] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proc. of NeurIPS, 2017, pp. 5998–6008.
  • [11] M. Sperber, J. Niehues, G. Neubig, S. Stüker, and A. Waibel, “Self-attentional acoustic models,” in Proc. of Interspeech, 2018, pp. 3723–3727.
  • [12] J. Salazar, K. Kirchhoff, and Z. Huang, “Self-attention networks for connectionist temporal classification in speech recognition,” in Proc. of ICASSP, 2019, pp. 7115–7119.
  • [13] Y. Zhao, J. Li, X. Wang, and Y. Li, “The Speechtransformer for large-scale Mandarin Chinese speech recognition,” in Proc. of ICASSP, 2019, pp. 7095–7099.
  • [14] S. Karita, N. Chen, T. Hayashi, T. Hori, H. Inaguma, Z. Jiang, M. Someki, N. E. Y. Soplin, R. Yamamoto, X. Wang et al., “A comparative study on transformer vs RNN in speech applications,” in Proc. of ASRU Workshop, 2019, pp. 449–456.
  • [15]

    M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,”

    Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
  • [16] N. Moritz, T. Hori, and J. L. Roux, “Streaming automatic speech recognition with the transformer model,” in Proc. of ICASSP, 2020, pp. 6074–6078.
  • [17] D. Povey, H. Hadian, P. Ghahremani, K. Li, and S. Khudanpur, “A time-restricted self-attention layer for ASR,” in Proc. of ICASSP, 2018, pp. 5874–5878.
  • [18] N. Jaitly, Q. V. Le, O. Vinyals, I. Sutskever, D. Sussillo, and S. Bengio, “An online sequence-to-sequence model using partial conditioning,” in Proc. of NIPS, 2016, pp. 5067–5075.
  • [19] L. Dong, F. Wang, and B. Xu, “Self-attention aligner: A latency-control end-to-end model for ASR using self-attention network and chunk-hopping,” in Proc. of ICASSP, 2019, pp. 5656–5660.
  • [20] H. Miao, G. Cheng, Z. Pengyuan, and Y. Yan, “Transformer online CTC/attention end-to-end speech recognition architecture,” in Proc. of ICASSP, 2020, pp. 6084–6088.
  • [21] Z. Dai, Z. Yang, Y. Yang, W. W. Cohen, J. Carbonell, Q. V. Le, and R. Salakhutdinov, “Transformer-XL: Attentive language models beyond a fixed-length context,” arXiv preprint arXiv:1901.02860, 2019.
  • [22] C.-C. Chiu and C. Raffel, “Monotonic chunkwise attention,” arXiv preprint arXiv:1712.05382, 2017.
  • [23] R. Fan, P. Zhou, W. Chen, J. Jia, and G. Liu, “An online attention-based model for speech recognition,” Proc. of Interspeech, pp. 4390–4394, 2019.
  • [24] K. Kim, K. Lee, D. Gowda, J. Park, S. Kim, S. Jin, Y.-Y. Lee, J. Yeo, D. Kim, S. Jung et al., “Attention based on-device streaming speech recognition with large speech corpus,” in Proc. of ASRU Workshop, 2019, pp. 956–963.
  • [25] H. Inaguma, Y. Gaur, L. Lu, J. Li, and Y. Gong, “Minimum latency training strategies for streaming sequence-to-sequence ASR,” in Proc. of ICASSP, 2020, pp. 6064–6068.
  • [26] E. Tsunoo, Y. Kashiwagi, T. Kumakura, and S. Watanabe, “Transformer ASR with contextual block processing,” in Proc. of ASRU Workshop, 2019, pp. 427–433.
  • [27] ——, “Towards online end-to-end transformer automatic speech recognition,” arXiv preprint arXiv:1910.11871, 2019.
  • [28] J. Li, R. Zhao, J.-T. Huang, and Y. Gong, “Learning small-size DNN with output-distribution-based criteria,” in Proc of 15th Annual Conference of the International Speech Communication Association, 2014.
  • [29] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.
  • [30] L. Lu, M. Guo, and S. Renals, “Knowledge distillation for small-footprint highway networks,” in Proc. of ICASSP, 2017, pp. 4820–4824.
  • [31] Z. Tian, J. Yi, Y. Bai, J. Tao, S. Zhang, and Z. Wen, “Synchronous transformers for end-to-end speech recognition,” in Proc. of ICASSP, 2020, pp. 7884–7888.
  • [32] R. Child, S. Gray, A. Radford, and I. Sutskever, “Generating long sequences with sparse transformers,” arXiv preprint arXiv:1904.10509, 2019.
  • [33] H. Seki, T. Hori, S. Watanabe, N. Moritz, and J. Le Roux, “Vectorized beam search for ctc-attention-based speech recognition,” in Proc. of Interspeech, 2019, pp. 3825–3829.
  • [34] Y. Liu, P. Fung, Y. Yang, C. Cieri, S. Huang, and D. Graff, “HKUST/MTS: A very large scale Mandarin telephone speech corpus,” in International Symposium on Chinese Spoken Language Processing.   Springer, 2006, pp. 724–735.
  • [35]

    H. Bu, J. Du, X. Na, B. Wu, and H. Zheng, “AIShell-1: An open-source Mandarin speech corpus and a speech recognition baseline,” in

    Oriental COCOSDA, 2017, pp. 1–5.
  • [36] V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “LibriSpeech: an ASR corpus based on public domain audio books,” in Proc. of ICASSP, 2015, pp. 5206–5210.
  • [37]

    R. Sennrich, B. Haddow, and A. Birch, “Neural machine translation of rare words with subword units,” in

    Proc. of the Association for Computational Linguistics, vol. 1, 2016, pp. 1715–1725.
  • [38] D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A simple data augmentation method for automatic speech recognition,” in Proc. of Interspeech, 2019.
  • [39] S. Watanabe, T. Hori, S. Karita, T. Hayashi, J. Nishitoba, Y. Unno, N. E. Y. Soplin, J. Heymann, M. Wiesner, N. Chen et al., “ESPnet: End-to-end speech processing toolkit,” in Proc. of Interspeech, 2019, pp. 2207–2211.
  • [40] A. Kannan, Y. Wu, P. Nguyen, T. N. Sainath, Z. Chen, and R. Prabhavalkar, “An analysis of incorporating an external language model into a sequence-to-sequence model,” in Proc. of ICASSP, 2018, pp. 5824–5828.
  • [41] L. Dong and B. Xu, “CIF: Continuous integrate-and-fire for end-to-end speech recognition,” in Proc. of ICASSP, 2020, pp. 6079–6083.
  • [42] Z. Tian, J. Yi, J. Tao, Y. Bai, and Z. Wen, “Self-attention transducers for end-to-end speech recognition,” in Proc. of Interspeech, 2019, pp. 4395–4399.