Transformer ASR with Contextual Block Processing

10/16/2019
by Emiru Tsunoo, et al.

The Transformer self-attention network has recently shown promising performance as an alternative to recurrent neural networks (RNNs) in end-to-end (E2E) automatic speech recognition (ASR) systems. However, the Transformer has a drawback in that the entire input sequence is required to compute self-attention. In this paper, we propose a new block processing method for the Transformer encoder by introducing a context-aware inheritance mechanism. An additional context embedding vector handed over from the previously processed block helps to encode not only local acoustic information but also global linguistic, channel, and speaker attributes. We also introduce a novel mask technique that implements the context inheritance so the model can be trained efficiently. Evaluations on the Wall Street Journal (WSJ), LibriSpeech, VoxForge Italian, and AISHELL-1 Mandarin speech recognition datasets show that our proposed contextual block processing method consistently outperforms naive block processing. Furthermore, the attention weight tendencies of each layer are analyzed to clarify how the added contextual inheritance mechanism models global information.
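For a concrete picture of the mechanism, the sketch below shows one way the contextual block processing could be realized, assuming PyTorch. All names (`ContextualBlockEncoderLayer`, `encode`, `block_size`, the zero-initialized context) are illustrative assumptions, not the authors' implementation; in particular, the paper trains all blocks in parallel via a special attention mask, whereas this sketch shows the equivalent sequential view in which each block attends over its own frames plus the context embedding inherited from the previous block.

```python
# Illustrative sketch of contextual block processing (not the authors' code).
# Each block is augmented with a context embedding handed over from the
# previous block; the updated context is passed forward ("inheritance").
import torch
import torch.nn as nn


class ContextualBlockEncoderLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, block, ctx):
        # Append the inherited context vector so self-attention sees both
        # the local acoustic frames and the accumulated global context.
        x = torch.cat([block, ctx], dim=1)        # (B, T_block + 1, D)
        y, _ = self.attn(x, x, x)
        x = self.norm1(x + y)
        x = self.norm2(x + self.ffn(x))
        # The last position is the updated context embedding, handed over
        # to the next block; the rest are the encoded frames.
        return x[:, :-1], x[:, -1:]


def encode(xs, layer, block_size=16):
    """Encode an utterance block by block, inheriting context."""
    B, T, D = xs.shape
    ctx = torch.zeros(B, 1, D)                    # initial (empty) context
    outs = []
    for t in range(0, T, block_size):
        out, ctx = layer(xs[:, t:t + block_size], ctx)
        outs.append(out)
    return torch.cat(outs, dim=1)


# Usage: encode a batch of two 80-frame, 256-dim feature sequences.
feats = torch.randn(2, 80, 256)
layer = ContextualBlockEncoderLayer()
enc = encode(feats, layer, block_size=16)
print(enc.shape)                                  # torch.Size([2, 80, 256])
```

The design point is that the context position participates in self-attention like any other frame, so each block's output can reflect speaker, channel, and linguistic attributes accumulated over all preceding blocks while the per-block computation stays bounded.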

