Multi-Head State Space Model for Speech Recognition

05/21/2023
by Yassir Fathullah, et al.

State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches. In this paper, we propose a multi-head state space (MH-SSM) architecture equipped with special gating mechanisms, in which parallel heads learn local and global temporal dynamics on sequence data. Used as a drop-in replacement for multi-head attention in transformer encoders, this new model significantly outperforms the transformer transducer on the LibriSpeech speech recognition corpus. Furthermore, we augment the transformer block with MH-SSM layers, a design we refer to as the Stateformer, achieving state-of-the-art performance on the LibriSpeech task: word error rates of 1.76%/4.37% on the development sets and 1.91%/4.36% on the test sets, without using an external language model.
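The abstract leaves the layer internals to the full paper, but the core idea, parallel state-space heads with an output gate standing in where multi-head attention would sit, can be sketched. Below is a minimal PyTorch illustration assuming a simple diagonal linear recurrence per head and a sigmoid output gate; the class names (SSMHead, MultiHeadSSM), head count, state size, and gating placement are illustrative assumptions, not the paper's exact parameterisation.

import torch
import torch.nn as nn

class SSMHead(nn.Module):
    """One head: h_t = a * h_{t-1} + B u_t,  y_t = C h_t (diagonal transition)."""
    def __init__(self, d_head: int, d_state: int):
        super().__init__()
        # Diagonal state transition, squashed into (-1, 1) for stability.
        self.a_raw = nn.Parameter(torch.randn(d_state) * 0.1)
        self.B = nn.Linear(d_head, d_state, bias=False)   # input projection
        self.C = nn.Linear(d_state, d_head, bias=False)   # output projection

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # u: (batch, time, d_head)
        a = torch.tanh(self.a_raw)                 # (d_state,)
        x = self.B(u)                              # (batch, time, d_state)
        h = torch.zeros(u.size(0), x.size(-1), device=u.device)
        states = []
        for t in range(u.size(1)):                 # sequential scan, for clarity
            h = a * h + x[:, t]
            states.append(h)
        return self.C(torch.stack(states, dim=1))  # (batch, time, d_head)

class MultiHeadSSM(nn.Module):
    """Parallel SSM heads plus a sigmoid output gate, used where multi-head
    attention would normally sit in a transformer encoder block."""
    def __init__(self, d_model: int, n_heads: int, d_state: int = 16):
        super().__init__()
        assert d_model % n_heads == 0
        d_head = d_model // n_heads
        self.heads = nn.ModuleList(SSMHead(d_head, d_state) for _ in range(n_heads))
        self.gate = nn.Linear(d_model, d_model)    # gating placement is assumed
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each head sees its own slice of channels, so different heads are
        # free to learn different (local or global) temporal dynamics.
        chunks = x.chunk(len(self.heads), dim=-1)
        y = torch.cat([head(c) for head, c in zip(self.heads, chunks)], dim=-1)
        return self.out(torch.sigmoid(self.gate(x)) * y)

# Usage: same interface as a self-attention layer over speech features.
layer = MultiHeadSSM(d_model=256, n_heads=4)
frames = torch.randn(2, 100, 256)   # (batch, time, channels)
out = layer(frames)                 # (2, 100, 256), a drop-in shape match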


Related research

10/31/2022 · Structured State Space Decoder for Speech Recognition and Synthesis
Automatic speech recognition (ASR) systems developed in recent years hav...

05/16/2020 · Conformer: Convolution-augmented Transformer for Speech Recognition
Recently, Transformer and convolutional neural network (CNN) based models h...

05/29/2023 · HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition
State-of-the-art ASR systems have achieved promising results by modeling...

10/23/2020 · Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
Recently, several studies reported that dot-product self-attention (SA) m...

06/07/2019 · Analyzing the Structure of Attention in a Transformer Language Model
The Transformer is a fully attention-based alternative to recurrent netw...

02/27/2023 · Diagonal State Space Augmented Transformers for Speech Recognition
We improve on the popular conformer architecture by replacing the depthw...

07/24/2022 · Improving Mandarin Speech Recognition with Block-augmented Transformer
Recently, the Convolution-augmented Transformer (Conformer) has shown promisi...
