Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition

07/12/2023
by Titouan Parcollet, et al.

Modern speech recognition systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the speech utterance, slowing down inference as well as training and increasing memory consumption. Cheaper alternatives to self-attention for ASR have been developed, but they fail to consistently reach the same level of accuracy. In practice, however, the self-attention weights of trained speech recognizers take the form of a global average over time. This paper therefore proposes a linear-time alternative to self-attention for speech recognition. It summarises a whole utterance with the mean over vectors for all time steps. This single summary is then combined with time-specific information. We call this method "Summary Mixing". Introducing Summary Mixing in state-of-the-art ASR models makes it feasible to preserve or exceed previous speech recognition performance while lowering the training and inference times by up to 27% and reducing the memory budget by a factor of two.
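The abstract describes the mechanism only at a high level: a per-utterance mean acts as the global summary, and each time step combines that summary with its own local transformation. Below is a minimal PyTorch sketch of this idea, assuming single-layer branches and a concatenation-based combiner; the module name, layer sizes, and activation choices are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SummaryMixing(nn.Module):
    """Sketch of a Summary Mixing layer (hypothetical configuration).

    Each time step goes through a local (time-specific) transformation,
    the whole utterance is summarised by averaging a second per-step
    transformation over time, and the two are combined. Every operation
    is linear in the sequence length T.
    """

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Time-specific branch, applied independently at each step.
        self.local = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU())
        # Summary branch, averaged over time into one vector per utterance.
        self.summary = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU())
        # Combiner that merges local and global information.
        self.combine = nn.Sequential(nn.Linear(2 * d_hidden, d_model), nn.GELU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, d_model)
        f = self.local(x)                       # (batch, T, d_hidden)
        s = self.summary(x).mean(dim=1)         # (batch, d_hidden): one summary
        s = s.unsqueeze(1).expand_as(f)         # broadcast summary to every step
        return self.combine(torch.cat([f, s], dim=-1))  # (batch, T, d_model)

# Example usage on a batch of 8 utterances, 200 frames, 256-dim features.
layer = SummaryMixing(d_model=256, d_hidden=512)
y = layer(torch.randn(8, 200, 256))
```

In a real batched setting, padded frames would need to be excluded from the mean (e.g. by a length-normalised masked sum) so that padding does not dilute the summary; the sketch omits this for brevity. The key property is that the summary is a single mean over T vectors, so the layer's cost grows linearly with utterance length, unlike the T-by-T weight matrix of self-attention.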


