Capturing Multi-Resolution Context by Dilated Self-Attention

04/07/2021
by Niko Moritz, et al.

Self-attention has become an important and widely used neural network component that has helped establish new state-of-the-art results for various applications, such as machine translation and automatic speech recognition (ASR). However, the computational complexity of self-attention grows quadratically with the input sequence length. This can be particularly problematic for applications such as ASR, where the input sequence generated from an utterance can be relatively long. In this work, we propose a combination of restricted self-attention and a dilation mechanism, which we refer to as dilated self-attention. The restricted self-attention allows attending to frames neighboring the query at high resolution, and the dilation mechanism summarizes distant information so that it can be attended to at lower resolution. Different methods for summarizing distant frames are studied, such as subsampling, mean-pooling, and attention-based pooling. ASR results demonstrate substantial improvements over restricted self-attention alone and achieve results similar to those of full-sequence self-attention at a fraction of the computational cost.
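To make the mechanism concrete, here is a minimal NumPy sketch of the idea described in the abstract; it is not the authors' implementation. It assumes a single attention head with queries, keys, and values passed in directly (no learned projections), uses mean-pooling over non-overlapping chunks as the summarization method, and lets every query see the summaries of all chunks, including those that overlap its local window. The names `dilated_self_attention`, `mean_pool_chunks`, `keys_per_query`, `window`, and `chunk` are hypothetical.

```python
import math
import numpy as np


def softmax(scores):
    # Numerically stable softmax over the last axis.
    z = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_pool_chunks(x, chunk):
    # Low-resolution summary (assumed method: mean-pooling): average each
    # non-overlapping block of `chunk` frames. The last block is shorter
    # when len(x) is not a multiple of `chunk`.
    return np.stack([x[s:s + chunk].mean(axis=0) for s in range(0, x.shape[0], chunk)])


def dilated_self_attention(q, k, v, window, chunk):
    # Hypothetical single-head sketch without learned projections.
    # Each query at frame t attends to
    #   (a) keys/values within +/- `window` frames at full resolution, and
    #   (b) mean-pooled summaries of every chunk of the sequence.
    T, d = q.shape
    k_sum = mean_pool_chunks(k, chunk)  # shape (ceil(T / chunk), d)
    v_sum = mean_pool_chunks(v, chunk)
    out = np.empty((T, v.shape[1]))
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        keys = np.concatenate([k[lo:hi], k_sum], axis=0)
        vals = np.concatenate([v[lo:hi], v_sum], axis=0)
        weights = softmax(keys @ q[t] / np.sqrt(d))  # scaled dot-product scores
        out[t] = weights @ vals
    return out


def keys_per_query(T, window, chunk):
    # Keys seen by one query away from the sequence edges,
    # compared with T keys for full-sequence self-attention.
    return 2 * window + 1 + math.ceil(T / chunk)


rng = np.random.default_rng(0)
x = rng.standard_normal((100, 8))
y = dilated_self_attention(x, x, x, window=4, chunk=10)
print(y.shape)                      # (100, 8)
print(keys_per_query(1000, 8, 16))  # 80, versus 1000 for full attention
```

Each query attends to at most 2·window + 1 + ⌈T/chunk⌉ keys instead of T, so the total cost is O(T·window + T²/chunk): still quadratic in T, but reduced by roughly a factor of `chunk` for long inputs. Replacing `mean_pool_chunks` with plain subsampling (`x[::chunk]`, keeping the first frame of each chunk) gives the subsampling variant the abstract mentions, and attention-based pooling would replace the plain average with an attention-weighted one.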
