Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition

09/30/2022
by   Chendong Zhao, et al.

The Transformer architecture, built on self-attention and multi-head attention, has achieved remarkable success in offline end-to-end Automatic Speech Recognition (ASR). However, self-attention and multi-head attention cannot be easily applied to streaming or online ASR. For self-attention in Transformer ASR, the softmax normalization assigns nonzero weight to every frame, which makes it impossible to highlight the important speech information. For multi-head attention in Transformer ASR, it is not easy to model monotonic alignments across different heads. To overcome these two limitations, we integrate sparse attention and monotonic attention into Transformer-based ASR. The sparse mechanism introduces a learned sparsity scheme that lets each self-attention structure better fit its corresponding head. The monotonic attention applies regularization to prune redundant heads in the multi-head attention structure. Experiments show that our method effectively improves the attention mechanism on widely used speech recognition benchmarks.
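To illustrate the sparse-attention idea, the sketch below replaces softmax with sparsemax (Martins and Astudillo, 2016) in a single attention head, so low-scoring frames receive exactly zero weight. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the paper learns the sparsity scheme per head, whereas this sketch fixes it to sparsemax, and the names sparsemax and sparse_attention are illustrative.

```python
# Minimal sketch of sparse attention: softmax is swapped for sparsemax so that
# unimportant frames get exactly zero attention weight. Illustrative only; the
# paper's adaptive, per-head learned sparsity is not reproduced here.
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of the score vector z onto the
    probability simplex, which yields sparse attention weights."""
    z_sorted = np.sort(z)[::-1]                 # scores in descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum         # support-set condition
    k_z = k[support][-1]                        # size of the support
    tau = (cumsum[support][-1] - 1.0) / k_z     # threshold
    return np.maximum(z - tau, 0.0)

def sparse_attention(q, K, V, scale=None):
    """Single-query attention: scaled scores -> sparsemax weights -> weighted sum."""
    if scale is None:
        scale = 1.0 / np.sqrt(K.shape[-1])
    scores = K @ q * scale
    weights = sparsemax(scores)                 # many entries are exactly zero
    return weights @ V, weights

# Toy usage: 5 key/value frames of dimension 4.
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
ctx, w = sparse_attention(q, K, V)
print("attention weights:", np.round(w, 3))
```

Running the sketch prints a weight vector containing exact zeros, in contrast to softmax, which always spreads some probability mass over every frame.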

Related research

06/17/2021  Efficient Conformer with Prob-Sparse Attention Mechanism for End-to-End Speech Recognition
01/15/2020  Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture
05/22/2023  GNCformer Enhanced Self-attention for Automatic Speech Recognition
05/19/2020  Enhancing Monotonic Multihead Attention for Streaming ASR
03/11/2022  Transformer-based Streaming ASR with Cumulative Attention
09/01/2022  Deep Sparse Conformer for Speech Recognition
09/10/2021  Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition
