1 Introduction
There has been significant progress in automatic speech recognition (ASR) technologies over the past few years due to the adoption of deep neural networks [1]. Conventionally, speech recognition systems involve individual components that explicitly model different levels of signal transformation: acoustic models for audio to acoustic units, a pronunciation model for acoustic units to words, and a language model for words to sentences. This framework is often referred to as the "traditional" hybrid system. The individual components in a hybrid system can be optimized separately. For example, CD-DNN-HMM [1] focuses on maximizing the likelihood between acoustic signals and acoustic models with frame-level alignments. For language modeling, both statistical n-gram models [2] and, more recently, neural-network-based models [3] aim to model purely the connection between word tokens.
Hybrid systems have achieved significant success [4] but also present challenges. For example, a hybrid system requires more human intervention in the building process, including the design of acoustic units, the vocabulary, the pronunciation model and more. In addition, an accurate hybrid system often comes at the cost of higher computational complexity and memory consumption, increasing the difficulty of deploying hybrid systems in resource-limited scenarios such as on-device speech recognition. Given these challenges, interest in end-to-end approaches for speech recognition has surged recently [5, 6, 7, 8, 9, 10, 11, 12]. Different from hybrid systems, end-to-end approaches aim to model the transformation from audio signal to word tokens directly, so the model becomes simpler and requires less human intervention. In addition to the simplicity of the training process, end-to-end systems have also demonstrated promising recognition accuracy [11]. Among the many end-to-end approaches, the recurrent neural network transducer (RNN-T) [5, 6] offers promising potential in footprint, accuracy and efficiency. In this work, we explore options for further improvements based on RNN-T.
Recurrent neural networks (RNNs) such as long short-term memory (LSTM)
[13] networks are good at sequence modeling and are widely adopted for speech recognition. RNNs rely on the recurrent connection from the previous state $h_{t-1}$ to the current state $h_t$ to propagate contextual information. This recurrent connection is effective but also presents challenges. For example, since $h_t$ depends on $h_{t-1}$, RNNs are difficult to compute in parallel. In addition, $h_t$ is usually of fixed dimension, which means all historical information is condensed into a fixed-length vector, making it difficult to capture long contexts. The attention mechanism [14, 15] was introduced recently as an alternative for sequence modeling. Compared with RNNs, the attention mechanism is non-recurrent and can easily be computed in parallel. In addition, the attention mechanism can "attend" to longer contexts explicitly. With the attention mechanism, the Transformer model [14] achieved state-of-the-art performance in many sequence-to-sequence tasks [15, 16].
In this paper, we explore options for applying Transformer networks in the neural transducer framework. VGG networks [17] with causal convolution are adopted to incorporate contextual information into the Transformer networks and to reduce the frame rate for efficient inference. In addition, we use truncated self-attention to enable streaming inference and reduce computational complexity.
2 Neural Transducer (RNN-T)
By nature, speech recognition is a sequence-to-sequence (audio-to-text) task in which the lengths of the input and output sequences can vary. As an end-to-end approach, connectionist temporal classification (CTC) [9] was introduced before RNN-T to model such sequence-to-sequence transformation. Given an input sequence $\mathbf{x} = (x_1, \ldots, x_T)$, where $x_t$ is an acoustic feature vector and $T$ is the input sequence length, and an output sequence $\mathbf{y} = (y_1, \ldots, y_U)$, where $y_u$ represent output symbols and $U$ is the output sequence length, CTC introduces an additional "blank" label $\emptyset$ and models the posterior probability of $\mathbf{y}$ given $\mathbf{x}$ by:

$$P(\mathbf{y} \mid \mathbf{x}) = \sum_{\hat{\mathbf{y}} \in \mathcal{B}^{-1}(\mathbf{y})} \prod_{t=1}^{T} P(\hat{y}_t \mid \mathbf{x}) \qquad (1)$$

where $\hat{\mathbf{y}} = (\hat{y}_1, \ldots, \hat{y}_T)$ correspond to any possible paths such that removing $\emptyset$ and repeated consecutive symbols from $\hat{\mathbf{y}}$ (the mapping $\mathcal{B}$) yields $\mathbf{y}$.
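As a concrete illustration (not code from the paper), the collapse mapping $\mathcal{B}$ can be sketched in a few lines; the function name and the choice of `"-"` as the blank symbol are our own:

```python
def collapse_path(path, blank="-"):
    """Apply the CTC mapping B: merge repeated consecutive symbols, drop blanks."""
    out = []
    prev = None
    for sym in path:
        # a symbol is emitted only if it differs from its predecessor and is not blank
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out
```

For example, both paths `aa-ab` and `a-ab-` collapse to the output sequence `aab`, while the repeated `aa` without an intervening blank merges into a single `a`.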
The formulation of CTC assumes that symbols in the output sequence are conditionally independent of one another given the input sequence. The RNN-T model improves upon CTC by making the output symbol distribution at each step dependent on both the input sequence and the previous non-blank output symbols in the history:

$$P(\mathbf{y} \mid \mathbf{x}) = \sum_{\hat{\mathbf{y}} \in \mathcal{B}^{-1}(\mathbf{y})} \prod_{i} P(\hat{y}_i \mid \mathbf{x}, y_1, \ldots, y_{u_i - 1}) \qquad (2)$$

where $\hat{\mathbf{y}}$ correspond to any possible paths such that removing $\emptyset$ and repeated consecutive symbols from $\hat{\mathbf{y}}$ yields $\mathbf{y}$, and $y_1, \ldots, y_{u_i - 1}$ are the non-blank symbols emitted before $\hat{y}_i$. By explicitly conditioning the current output on the history, RNN-T outperforms CTC when no external language model is present [6, 7]. RNN-T can be implemented in the encoder-decoder framework, as illustrated in Fig. 1. The encoder encodes the input acoustic sequence $\mathbf{x}$ into $\mathbf{h}^{enc} = (h^{enc}_1, \ldots, h^{enc}_{T'})$ with potential subsampling ($T' \leq T$). The decoder contains a predictor that encodes the previous non-blank output symbol $y_{u-1}$ into $h^{pred}_u$ for the logits $z_{t,u}$ to condition on. It is worth noting that the input to the predictor is updated only when the most probable symbol is non-blank, so the conditioning encoding changes only when non-blank output symbols are observed. From the illustration, we see that RNN-T incorporates a language model over output symbols internally in the decoder.
There are many architectures that can be used as encoders and predictors. The functionality of these blocks is to take a sequence and find a higher-order representation. Recurrent neural networks (RNNs) such as LSTM [13] have been successfully used for such functionality. In this paper, we explore the Transformer [14, 15] as an alternative for sequence encoding in RNN-T. Since the Transformer is not recurrent in nature, we refer to the architecture illustrated in Fig. 1 simply as a "neural transducer" [18] for the rest of the paper.
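To make the update rule concrete, here is a hedged sketch of greedy neural-transducer decoding; `step_fn` is a hypothetical stand-in for the combined predictor and joiner, and, as described above, the predictor input is advanced only on non-blank emissions:

```python
import numpy as np

def greedy_transducer_decode(enc, step_fn, blank=0, max_symbols=10):
    """Greedy neural-transducer decoding sketch.

    enc: sequence of encoder frames.
    step_fn(frame, prev_token) -> logits over the vocabulary (blank at `blank`).
    """
    hyp, prev = [], blank  # the blank id stands in for a start-of-sequence token
    for frame in enc:
        for _ in range(max_symbols):   # several symbols may be emitted per frame
            logits = step_fn(frame, prev)
            tok = int(np.argmax(logits))
            if tok == blank:           # blank: advance to the next frame
                break
            hyp.append(tok)            # non-blank: emit and update predictor input
            prev = tok
    return hyp
```

With a toy `step_fn` that favors a frame's label only when it differs from the previous emission, the loop emits each new label once and otherwise predicts blank.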
3 Transformer
The attention mechanism [19] is one of the core ideas of the Transformer [15]. It was proposed to model correlation between contextual signals and has produced state-of-the-art performance in many domains including machine translation [15] and natural language processing [14]. Similar to RNNs, the attention mechanism aims to encode the input sequence into a higher-level representation, by formulating the encoding function in terms of the relationship between queries $Q$, keys $K$ and values $V$, and describing the similarities between them with:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \qquad (3)$$

where $Q \in \mathbb{R}^{T_q \times d_k}$, $K \in \mathbb{R}^{T_k \times d_k}$ and $V \in \mathbb{R}^{T_k \times d_v}$. This mechanism becomes "self-attention" when $Q = K = V = \mathbf{x}$. A self-attention block encodes the input $\mathbf{x}$ into a higher-level representation $\mathbf{y}$, just like RNNs but without recurrence. Compared with RNNs, where $h_t$ depends on $h_{t-1}$, self-attention has no recurrent connections between time steps in the encoding, so it can generate encodings efficiently in parallel. In addition, compared with RNNs, where contexts are condensed into fixed-length states for the next time step to condition on, self-attention "pays attention" to all available contexts to better model the context within the input sequence.
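A minimal numpy sketch of Eq. (3) for the self-attention case ($Q = K = V = \mathbf{x}$), omitting the learned projections for clarity:

```python
import numpy as np

def self_attention(x, d_k=None):
    """Scaled dot-product self-attention (Eq. 3) with Q = K = V = x."""
    q, k, v = x, x, x                    # self-attention: projections omitted
    d_k = d_k or x.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # pairwise similarities, shape (T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ v                   # weighted sum of values, shape (T, d)
```

Every output row is a convex combination of all input rows, which is why each time step can "attend" to the entire sequence, and why all rows can be computed in parallel with a single matrix product.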
3.1 Multi-Head Self-Attention
The attention mechanism can be further extended to multi-head attention, in which 1) the dimensions of the input sequences are split into multiple chunks with multiple projections, 2) each chunk goes through an independent attention mechanism, and 3) the encodings from all chunks are concatenated and then projected to produce the output encodings, as described with:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V}) \qquad (4)$$

where $h$ is the number of heads, $d$ is the dimension of the input sequence, $\mathrm{head}_i$ is the encoding generated by head $i$, $W_i^{Q}, W_i^{K} \in \mathbb{R}^{d \times d_k}$, $W_i^{V} \in \mathbb{R}^{d \times d_v}$ and $W^{O} \in \mathbb{R}^{h d_v \times d}$, with $d_k = d_v = d / h$. Multi-head attention integrates encodings generated from multiple subspaces into higher-dimensional representations [15].
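A self-contained sketch of Eq. (4); random matrices stand in for the learned projections $W_i^{Q}, W_i^{K}, W_i^{V}, W^{O}$, so this illustrates the dataflow (split into subspaces, attend, concatenate, project) rather than a trained model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(x, n_heads, rng=None):
    """Multi-head self-attention (Eq. 4): split d into n_heads subspaces,
    attend independently in each, concatenate, then project back to d."""
    rng = rng or np.random.default_rng(0)
    T, d = x.shape
    assert d % n_heads == 0
    d_h = d // n_heads                       # per-head dimension d_k = d_v = d / h
    heads = []
    for _ in range(n_heads):
        # per-head projections (random stand-ins for learned weights)
        Wq, Wk, Wv = (rng.normal(size=(d, d_h)) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        att = softmax(q @ k.T / np.sqrt(d_h)) @ v    # (T, d_h)
        heads.append(att)
    Wo = rng.normal(size=(d, d))                     # output projection W^O
    return np.concatenate(heads, axis=-1) @ Wo       # (T, d)
```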
3.2 Transformer Encoder
The Transformer [14] is also a sequence-to-sequence model. The architecture of the Transformer encoder contains three main blocks: 1) an attention block, 2) a feed-forward block and 3) layer normalization [20], as shown in Fig. 2(a). The attention block contains the core multi-head self-attention component. The feed-forward block projects the input dimension $d$ to another feature space $d_{ff}$ and then back to $d$ (usually $d_{ff} > d$) for learning feature representations. The final layer normalization and the additional components in the first two blocks, including layer norm and dropout, are added to stabilize model training and prevent overfitting. Furthermore, we use VGGNets to incorporate positional information into the Transformer, as illustrated in Fig. 2(b). More details are given in Section 4.1.
4 Transformer-Transducer
Given the success of the Transformer, we explore options for applying the Transformer in the neural transducer. For further improvement, we propose 1) using causal convolution for context modeling and frame rate reduction and 2) using truncated self-attention to reduce the computational complexity and enable streaming for the Transformer.
4.1 Context Modeling with Causal Convolution
The Transformer relies on multi-head self-attention to model contextual information. However, the attention mechanism is neither recurrent nor convolutional, and therefore risks losing the order or positional information in the input sequence [21, 15], which can harm performance, especially for language modeling. A simple way to incorporate positional information into the Transformer is to add positional encodings [15], but convolutional approaches [8] have demonstrated superior performance. In this paper we adopt the convolutional approach of [8] with modifications.
Convolutional networks model contexts by using kernels to convolve blocks of features. If we treat the input sequence (for example, acoustic features) as a two-dimensional image, then in common practice a centered kernel of (odd) width $k$ covers the frames from $x_{t-(k-1)/2}$ to $x_{t+(k-1)/2}$ to produce the convolved output $y_t$. The convolution therefore needs "future" information to generate the encoding for the current time step. For acoustic modeling this introduces additional look-ahead and latency, and introducing future information is impractical for language modeling, since the next symbol is unknown during inference.
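The causal variant discussed next can be made concrete with a small one-dimensional sketch; `causal_conv1d` is an illustrative helper (not the paper's implementation) that realizes causality by padding only on the history side:

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1-D convolution: left-pad by k-1 so the output at time t
    depends only on x[t-k+1 .. t], never on future frames."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])   # history-only padding
    # y[t] = sum_i kernel[i] * x[t - i]
    return np.array([padded[t:t + k] @ kernel[::-1] for t in range(len(x))])
```

Because no future frame enters the window, changing a later input sample leaves all earlier outputs untouched, which is exactly the property needed for streaming and for language modeling.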
To prevent future information from leaking into the computation at the current time step, we use causal convolution, in which all required context is pushed into the history, as illustrated in Fig. 3(a). With causal convolution, a kernel of width $k$ covers the frames from $x_{t-k+1}$ to $x_t$ to produce the convolved output $y_t$, ensuring the convolution is purely "causal". Similar to [8], we also adopt the VGGNet [17] structure, as illustrated in Fig. 3(b), where two two-dimensional convolution layers are stacked sequentially, followed by a two-dimensional max-pooling layer. We use layers of the causal VGGNet to incorporate positional information and propagate it to the succeeding Transformer encoder layers. We refer to this network as the "VGG-Transformer" and illustrate the architecture used for the encoder in the neural transducer in Fig. 3(b), where the first two VGGNet layers incorporate positional information and reduce the frame rate for efficient inference, followed by a linear layer for dimension reduction and multiple Transformer encoder layers for generating higher-level representations.

4.2 Truncated Self-Attention
Unlimited self-attention attends to the whole input sequence and poses two issues: 1) streaming inference is impossible and 2) computational complexity is high. As illustrated in Fig. 4(a), for unlimited self-attention the output $y_t$ at time step $t$ depends on the entire input sequence $(x_1, \ldots, x_T)$, meaning inference can only begin after the final length $T$ is known. In addition, $y_t$ depends on the similarity pairs $(x_t, x_\tau)$ for all $\tau \in \{1, \ldots, T\}$, giving $O(T^2)$ complexity for computing $(y_1, \ldots, y_T)$. These issues are critical for self-attention to work in scenarios demanding low latency and low computation, such as on-device speech recognition [6].
To reduce both the latency and the computational cost, we replace unlimited self-attention with truncated self-attention, as illustrated in Fig. 4(b). Similar to time-delayed neural networks (TDNN) [22, 23], we limit the contexts available to self-attention so that the output at time $t$ depends only on $(x_{t-L}, \ldots, x_{t+R})$, where $L$ and $R$ are the left and right context sizes. Compared with unlimited self-attention, truncated self-attention is both streamable and computationally efficient. The look-ahead is the right context $R$, and the computational complexity reduces from $O(T^2)$ to $O(T(L+R+1))$. However, truncation also brings potential performance degradation, which we investigate further in the experiments.
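In practice, truncation can be implemented as a boolean mask over the attention score matrix; a minimal sketch (names are our own):

```python
import numpy as np

def truncated_attention_mask(T, left, right):
    """Boolean (T, T) mask: query position t may attend to key positions
    in [t - left, t + right].  left = right = inf recovers unlimited
    self-attention; right = 0 makes the layer purely causal."""
    t = np.arange(T)
    diff = t[None, :] - t[:, None]   # key index minus query index
    return (diff >= -left) & (diff <= right)
```

Disallowed positions are then typically set to a large negative value in the score matrix before the softmax, so they receive zero attention weight.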
5 Experiments
5.1 Corpus and Setup
We use the publicly available, widely used LibriSpeech corpus [24] for experiments. LibriSpeech comes with 960 hours of read speech data for training and four sets, {dev, test}-{clean, other}, for fine-tuning and evaluation. The clean sets contain high-quality utterances, whereas the other sets are more acoustically challenging. We use the dev-{clean, other} sets to fine-tune parameters for beam search and report results on the test-{clean, other} sets. We extract 80-dimensional log Mel-filter bank features every 10 ms as acoustic features and normalize them with the global mean computed from the training set. We also apply SpecAugment [25] with policy "LD" for data distortion. A sentence piece model [26] with 256 symbols is trained on the transcriptions of the training set and serves as the output symbol set. For each model, we use a learnable embedding layer to convert symbols into 128-dimensional vectors just before the predictor. The experiments are done using PyTorch [27] and Fairseq [28]. All models are trained on 32 GPUs with the distributed data parallel (DDP) mechanism. We use standard beam search with a beam size of 10 for decoding. The decoded sentence pieces are then concatenated into hypotheses and compared with the ground truth transcriptions for word error rate (WER) evaluation.

5.2 Model Architectures and Details
We compare architectures with roughly the same total number of parameters. For the encoder in the neural transducer, we evaluate the following options: 1) BLSTM 4x640: bidirectional LSTM with 4 layers of 640 hidden units in each direction, 2) LSTM 5x1024: LSTM with 5 layers of 1024 hidden units and 3) Transformer 12x: VGG-Transformer with 2 layers of VGGNets and 12 Transformer encoder layers. Each VGGNet layer contains 2 layers of two-dimensional convolution with 64 kernels of size 3x3. Each Transformer encoder layer takes 512-dimensional inputs, with 8 heads for multi-head self-attention and a feed-forward dimension of 2048. For efficient inference, all encoders generate output encodings every 60 ms. For LSTM/BLSTM this is achieved with the low frame rate approach [29], in which every three consecutive frames are stacked and subsampled to form a new frame, and a subsampling factor of 2 is applied to the output of the second LSTM/BLSTM layer [6]. For the VGG-Transformer we set the max-pooling on the time dimension to 3 for the first VGGNet and 2 for the second VGGNet, as illustrated in Fig. 2(b).
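The low-frame-rate stacking described above can be sketched as follows; `stack_frames` is an illustrative helper, with stacking and stride factors as parameters:

```python
import numpy as np

def stack_frames(feats, stack=3, stride=3):
    """Low-frame-rate stacking: concatenate every `stack` consecutive frames
    and subsample with `stride`, e.g. three 10 ms frames -> one 30 ms frame."""
    T, d = feats.shape
    out = []
    for t in range(0, T - stack + 1, stride):
        out.append(feats[t:t + stack].reshape(-1))   # (stack * d,) stacked frame
    return np.stack(out)
```

With `stack=3` and `stride=3`, a sequence of 10 ms frames becomes a sequence one third as long with three times the feature dimension; a further 2x subsampling inside the network then yields the 60 ms output rate.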
For the predictor in the neural transducer, we evaluate the following options: 1) LSTM 2x700: LSTM with 2 layers of 700 hidden units and 2) Transformer 6x: VGG-Transformer with 1 layer of VGGNet and 6 Transformer encoder layers. Both the VGGNet layer and the Transformer encoder layers share the same configuration as in the encoder case, with the exception that max-pooling is removed from the VGGNet. In addition, the right context for these Transformer encoder layers is 0, to prevent future information leakage.
For the joiner in the neural transducer, the outputs from the encoder and the predictor are joined with:

$$z_{t,u} = W_{out} \, \phi\!\left(W_{enc} h^{enc}_t + W_{pred} h^{pred}_u\right) \qquad (5)$$

where $W_{enc}$ and $W_{pred}$ project $h^{enc}_t$ and $h^{pred}_u$ to a common feature space of dimension $d_{joint}$, $\phi$ is an activation function and $W_{out}$ generates the logits $z_{t,u}$. We use the same choices for these components consistently across all experiments.

5.3 Results on Transformer/LSTM Combinations
We experimented with combinations of Transformer and LSTM networks for the neural transducer. The results are summarized in Table 1. For the encoder, we use LSTM 5x1024 as the streamable baseline, BLSTM 4x640 as the non-streamable baseline and Transformer 12x as the novel replacement for the two. For the predictor, we use the LSTM 2x700 and Transformer 6x options described in Section 5.2.
Table 1: WER (%) on the LibriSpeech test sets for combinations of encoder and predictor architectures.

encoder | predictor | # params | test-clean | test-other
(1) LSTM 5x1024 | LSTM 2x700 | 50.5 M | 12.31 | 23.16
(2) BLSTM 4x640 | LSTM 2x700 | 48.3 M | 6.85 | 16.90
(3) Transformer 12x | LSTM 2x700 | 45.7 M | 6.08 | 13.89
(4) LSTM 5x1024 | Transformer 6x | 67.1 M | 15.76 | 26.67
(5) BLSTM 4x640 | Transformer 6x | 64.9 M | 7.20 | 16.67
(6) Transformer 12x | Transformer 6x | 62.3 M | 7.11 | 15.62
From Table 1, given the same predictor configuration, we see that it is difficult for the LSTM encoder to perform well under the constraint on the number of parameters. The bidirectional LSTM (BLSTM) encoder, however, recovers the performance while remaining compact in size, at the cost of being non-streamable. The VGG-Transformer with unlimited self-attention significantly outperforms the BLSTM as the encoder and is also non-streamable. For the predictor, across all encoder configurations the LSTM network still gives better results than the VGG-Transformer and is smaller in size. As a result, we keep LSTM 2x700 as the predictor for the experiments in Section 5.4. It is worth noting that the VGG-Transformer loses the advantage of parallel computation when used as the predictor, since during beam search each hypothesis is extended by only one token per search step.
5.4 Results on Truncated Self-Attention
We evaluated the impact of the truncated self-attention contexts on recognition accuracy for the VGG-Transformer. As summarized in Section 5.3, the VGG-Transformer performs well as the encoder but not as the predictor, so we keep LSTM 2x700 as the predictor for the truncated self-attention experiments. The results are summarized in Table 2, where the left context $L$ and right context $R$ are applied per layer in the VGG-Transformer and aggregate through the layers.
Table 2: WER (%) on the LibriSpeech test sets for different per-layer left/right contexts in truncated self-attention.

Model Architecture | left context L | right context R | test-clean | test-other
(1) LSTM 5x1024 + LSTM 2x700 | inf | 0 | 12.31 | 23.16
(2) BLSTM 4x640 + LSTM 2x700 | inf | inf | 6.85 | 16.90
(3) Transformer 12x + LSTM 2x700 | inf | inf | 6.08 | 13.89
(4) Transformer 12x + LSTM 2x700 | inf | 0 | 12.32 | 23.08
(5) Transformer 12x + LSTM 2x700 | inf | 1 | 6.99 | 16.88
(6) Transformer 12x + LSTM 2x700 | inf | 2 | 6.47 | 15.79
(7) Transformer 12x + LSTM 2x700 | inf | 4 | 6.14 | 14.86
(8) Transformer 12x + LSTM 2x700 | inf | 8 | 5.99 | 14.17
(9) Transformer 12x + LSTM 2x700 | 4 | 4 | 6.84 | 17.38
(10) Transformer 12x + LSTM 2x700 | 8 | 4 | 6.69 | 16.79
(11) Transformer 12x + LSTM 2x700 | 16 | 4 | 6.57 | 15.92
(12) Transformer 12x + LSTM 2x700 | 32 | 4 | 6.37 | 15.30
Since the right context introduces algorithmic latency and has a major impact on recognition accuracy, to find optimal parameters for truncated self-attention we first search for the right context $R$ while keeping the left context $L$ unlimited, and then reduce the left context given the selected right context. From Table 2 we see that both $L$ and $R$ have a significant impact on performance, especially when $R = 0$ and the VGG-Transformer becomes purely causal. However, as $R$ increases, the WERs gradually recover and come close to the case of unlimited self-attention at $R = 8$. With a limited right context, the VGG-Transformer becomes streamable but is still $O(T^2)$ in computational complexity due to the unlimited left context. To keep reasonable performance while minimizing latency, we selected a right context of $R = 4$ and evaluated different left contexts. Similar to the right context, we see that the WER is also sensitive to the left context. With $L = 32$, the VGG-Transformer with truncated self-attention gives better WERs than both the LSTM and BLSTM baselines, losing only 4.7% on test-clean and 10.1% on test-other relative to the case of unlimited self-attention, while the system becomes streamable and efficient with $O(T)$ computational complexity.
6 Conclusion
In this paper, we explore options for using Transformer networks in the neural transducer for end-to-end speech recognition. The Transformer network uses self-attention for sequence modeling and can be computed in parallel. With causal convolution and truncated self-attention, the neural transducer with the proposed VGG-Transformer as the encoder achieves 6.37% WER on the test-clean set and 15.30% WER on the test-other set of the public LibriSpeech corpus, with a small footprint of 45.7 M parameters for the entire system. The proposed Transformer-Transducer is accurate, streamable, compact and efficient, and therefore a promising option for resource-limited scenarios such as on-device speech recognition.
References
 [1] George E Dahl, Dong Yu, Li Deng, and Alex Acero, “Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30–42, 2011.
 [2] Peter F Brown, Peter V Desouza, Robert L Mercer, Vincent J Della Pietra, and Jenifer C Lai, “Class-based n-gram models of natural language,” Computational Linguistics, vol. 18, no. 4, pp. 467–479, 1992.
 [3] Tomáš Mikolov, Martin Karafiát, Lukáš Burget, Jan Černockỳ, and Sanjeev Khudanpur, “Recurrent neural network based language model,” in Eleventh annual conference of the international speech communication association, 2010.
 [4] Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig, “Achieving human parity in conversational speech recognition,” arXiv preprint arXiv:1610.05256, 2016.
 [5] Alex Graves, “Sequence transduction with recurrent neural networks,” arXiv preprint arXiv:1211.3711, 2012.
 [6] Yanzhang He, Tara N Sainath, Rohit Prabhavalkar, Ian McGraw, Raziel Alvarez, Ding Zhao, David Rybach, Anjuli Kannan, Yonghui Wu, Ruoming Pang, et al., “Streaming end-to-end speech recognition for mobile devices,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 6381–6385.
 [7] Kanishka Rao, Haşim Sak, and Rohit Prabhavalkar, “Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer,” in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2017, pp. 193–199.
 [8] Abdelrahman Mohamed, Dmytro Okhonko, and Luke Zettlemoyer, “Transformers with convolutional context for ASR,” arXiv preprint arXiv:1904.11660, 2019.

 [9] Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber, “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” in Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006, pp. 369–376.
 [10] Alex Graves and Navdeep Jaitly, “Towards end-to-end speech recognition with recurrent neural networks,” in International Conference on Machine Learning, 2014, pp. 1764–1772.
 [11] William Chan, Navdeep Jaitly, Quoc Le, and Oriol Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 4960–4964.
 [12] Linhao Dong, Feng Wang, and Bo Xu, “Self-attention aligner: A latency-control end-to-end model for ASR using self-attention network and chunk-hopping,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 5656–5660.
 [13] Sepp Hochreiter and Jürgen Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
 [14] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, 2017, pp. 5998–6008.
 [15] Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio, “Attentionbased models for speech recognition,” in Advances in neural information processing systems, 2015, pp. 577–585.
 [16] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
 [17] Karen Simonyan and Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
 [18] Eric Battenberg, Jitong Chen, Rewon Child, Adam Coates, Yashesh Gaur Yi Li, Hairong Liu, Sanjeev Satheesh, Anuroop Sriram, and Zhenyao Zhu, “Exploring neural transducers for endtoend speech recognition,” in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2017, pp. 206–213.
 [19] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
 [20] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton, “Layer normalization,” arXiv preprint arXiv:1607.06450, 2016.
 [21] Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin, “Convolutional sequence to sequence learning,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 2017, pp. 1243–1252.
 [22] Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J Lang, “Phoneme recognition using time-delay neural networks,” Backpropagation: Theory, Architectures and Applications, pp. 35–61, 1995.
 [23] Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur, “A time delay neural network architecture for efficient modeling of long temporal contexts,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015.
 [24] Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur, “LibriSpeech: an ASR corpus based on public domain audio books,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 5206–5210.
 [25] Daniel S Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D Cubuk, and Quoc V Le, “SpecAugment: A simple data augmentation method for automatic speech recognition,” arXiv preprint arXiv:1904.08779, 2019.
 [26] Taku Kudo and John Richardson, “SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing,” arXiv preprint arXiv:1808.06226, 2018.
 [27] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer, “Automatic differentiation in pytorch,” in NIPSW, 2017.
 [28] Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli, “fairseq: A fast, extensible toolkit for sequence modeling,” in Proceedings of NAACL-HLT 2019: Demonstrations, 2019.
 [29] Golan Pundak and Tara N Sainath, “Lower frame rate neural network acoustic models,” Interspeech 2016, pp. 22–26, 2016.