Efficient conformer-based speech recognition with linear attention

04/14/2021
by Shengqiang Li, et al.

Recently, conformer-based end-to-end automatic speech recognition, which outperforms recurrent-neural-network-based models, has received much attention. Although the conformer is more amenable to parallel computing than recurrent neural networks, the computational complexity of its dot-product self-attention is quadratic in the length of the input feature sequence. To reduce this cost, we propose multi-head linear self-attention for the self-attention layer, which lowers its computational complexity to linear order. In addition, we propose to factorize the feed-forward module of the conformer by low-rank matrix factorization, which reduces the number of parameters by approximately 50% with little performance loss. The proposed model, named linear attention based conformer (LAC), can be trained and used for inference jointly with the connectionist temporal classification objective, which further improves its performance. To evaluate the effectiveness of LAC, we conduct experiments on the AISHELL-1 and LibriSpeech corpora. Results show that the proposed LAC achieves better performance than 7 recently proposed speech recognition models and is competitive with the state-of-the-art conformer. Meanwhile, LAC has only about 50% of the parameters of the conformer and a higher inference speed than the latter.
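The abstract names the two efficiency techniques but not their exact formulation, so the sketch below illustrates them with commonly used stand-ins: a kernelized ("linear") multi-head self-attention in the style of Katharopoulos et al. (2020), and a feed-forward block whose weight matrices are replaced by low-rank factor pairs. This is a minimal PyTorch sketch under those assumptions; the module names, the elu+1 feature map, and the hyperparameters are illustrative, not the paper's implementation.

```python
# Hypothetical sketch only: the paper's exact linear-attention and low-rank
# formulations are not given in the abstract, so this uses standard recipes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiHeadLinearSelfAttention(nn.Module):
    """Self-attention whose cost is linear in the sequence length T.

    Instead of softmax(Q K^T) V, which is O(T^2), it computes
    phi(Q) (phi(K)^T V), so the T x T attention matrix is never formed.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                           # x: (B, T, d_model)
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (B, heads, T, d_head)
        q, k, v = (t.view(B, T, self.h, self.d).transpose(1, 2) for t in (q, k, v))
        # positive feature map (elu + 1) keeps the kernelized attention well defined
        q, k = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum("bhtd,bhte->bhde", k, v)            # (B, h, d, d): O(T d^2)
        z = 1.0 / (torch.einsum("bhtd,bhd->bht", q, k.sum(dim=2)) + 1e-6)
        out = torch.einsum("bhtd,bhde,bht->bhte", q, kv, z)   # (B, h, T, d)
        out = out.transpose(1, 2).contiguous().view(B, T, -1)
        return self.out(out)


class LowRankFeedForward(nn.Module):
    """Feed-forward module with each weight factorized into two thin matrices.

    A d_model x d_ff weight (d_model * d_ff parameters) becomes d_model x r
    plus r x d_ff, cutting parameters roughly in half for small enough r.
    """

    def __init__(self, d_model: int, d_ff: int, rank: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, rank, bias=False), nn.Linear(rank, d_ff),
            nn.SiLU(),
            nn.Linear(d_ff, rank, bias=False), nn.Linear(rank, d_model),
        )

    def forward(self, x):
        return self.net(x)
```

In a full LAC-style block, modules like these would be interleaved with the conformer's convolution module and trained with the joint CTC objective described in the abstract.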
