Self-Attention Aligner: A Latency-Control End-to-End Model for ASR Using Self-Attention Network and Chunk-Hopping

02/18/2019
by Linhao Dong, et al.

Self-attention network, an attention-based feedforward neural network, has recently shown the potential to replace recurrent neural networks (RNNs) in a variety of NLP tasks. However, it is not clear whether the self-attention network can be a good alternative to RNNs in automatic speech recognition (ASR), which processes longer input sequences than most NLP tasks and may have online recognition requirements. In this paper, we present an RNN-free end-to-end model: the self-attention aligner (SAA), which applies self-attention networks to a simplified recurrent neural aligner (RNA) framework. We also propose a chunk-hopping mechanism, which enables the SAA model to encode segmented frame chunks one after another and thus supports online recognition. Experiments on two Mandarin ASR datasets show that replacing RNNs with self-attention networks yields an 8.4% relative character error rate (CER) reduction. In addition, the chunk-hopping mechanism allows the SAA to incur only a 2.5% relative CER degradation with a 320 ms latency. After joint training with a self-attention network language model, our SAA model obtains further error rate reductions on multiple datasets. In particular, it achieves 24.12% CER on the Mandarin ASR benchmark (HKUST), exceeding the best end-to-end model by over 2% absolute CER.
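
To make the chunk-hopping idea concrete, below is a minimal sketch of how an input frame sequence might be segmented into hopped chunks with bounded look-ahead, which is what limits the recognition latency. This is not the authors' implementation: the parameter names (`chunk_size`, `left_context`, `right_context`) and the 10 ms frame-shift arithmetic relating 32 frames to roughly 320 ms are illustrative assumptions, since the abstract does not give the exact configuration.

```python
import numpy as np

def chunk_hopping(frames, chunk_size=32, left_context=16, right_context=16):
    """Split a (T, feat_dim) frame sequence into hopped chunks.

    Each chunk carries `chunk_size` "current" frames plus optional left
    (history) and right (look-ahead) context frames; only the current
    frames are encoded into outputs, so the right context bounds the
    per-chunk latency. Parameter names are hypothetical.
    """
    T = frames.shape[0]
    chunks = []
    for start in range(0, T, chunk_size):  # hop forward one chunk at a time
        ctx_start = max(0, start - left_context)               # history frames
        ctx_end = min(T, start + chunk_size + right_context)   # limited look-ahead
        chunks.append({
            "frames": frames[ctx_start:ctx_end],
            # index range (within this chunk) of the frames it must encode
            "current": (start - ctx_start, min(start + chunk_size, T) - ctx_start),
        })
    return chunks

if __name__ == "__main__":
    # 3 s of 80-dim features at an assumed 10 ms frame shift:
    # a 32-frame chunk then spans roughly 320 ms of audio.
    feats = np.random.randn(300, 80)
    for c in chunk_hopping(feats):
        lo, hi = c["current"]
        print(c["frames"].shape, "current frames:", lo, "to", hi)
```

Setting the hop equal to the chunk size keeps the chunks' current frames non-overlapping while still letting each chunk attend to a little history and future context, which is presumably the trade-off behind the reported 320 ms latency figure.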

Related research

05/21/2020 · Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition
Recently, streaming end-to-end automatic speech recognition (E2E-ASR) ha...

09/28/2019 · Self-Attention Transducers for End-to-End Speech Recognition
Recurrent neural network transducers (RNN-T) have been successfully appl...

12/29/2020 · Multiple Structural Priors Guided Self Attention Network for Language Understanding
Self attention networks (SANs) have been widely utilized in recent NLP s...

06/09/2019 · Attention-based Conditioning Methods for External Knowledge Integration
In this paper, we present a novel approach for incorporating external kn...

04/23/2018 · QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
Current end-to-end machine reading and question answering (Q&A) models a...

10/16/2019 · Transformer ASR with Contextual Block Processing
The Transformer self-attention network has recently shown promising perf...

03/25/2022 · Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator
We present a novel multi-channel front-end based on channel shortening w...
